Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more
Tutorial Instructions
Databases and computers go hand in hand. Grocery stores, banks, schools and even your favorite video games all use databases behind the computer application you see. Databases are collections of large amounts of data stored in tables. Tables look like grids with rows and columns. The central information used to look up rows are called keys. Tables usually have data related to each other, connected by those keys.
Postgres is a specific kind of database software for storing and retrieving data. What makes Postgres special is that it is open source software, which means it is written by a community of people and not one central company.
To talk to Postgres you will use two things. One, is called psql
and this is the terminal language used to talk to Postgres about itself.
The second thing is called SQL
and this is how you retreive or change the data inside the Postgres database. SQL stands for Structured Query Language. This is the coding language humans and computers can use to talk to a database. SQL is the language used to talk to Postgres or any other SQL based database.
You'll see next to lines of code, there's a COPY
command. Clicking that will copy that code block. Paste that into the window on the right where the cursor is and press Enter to run the command. These copy statements are a way to show you things that can be done. You can redo them or change things in the commands to make them your own.
Postgres only shows you part of a query to reduce the amount shown on a page. Sometimes when you run something, you’ll see -More--
at the very bottom. Use the arrow up key. Q will quit, space shows you a bit more, and R will show the rest.
We have loaded some data into this database for you. Let's try some simple psql commands.
This will list the tables we have
\d
You will see three tables:
List of relations
Schema | Name | Type | Owner
--------+----------------------+-------+----------
public | insect_names | table | postgres
public | order_descriptions | table | postgres
public | order_species_counts | table | postgres
(3 rows)
If we want to know more about what is inside one of the tables we could look at one more closely, here's we will look at the insect_names
table.
\d+ insect_names
Here we can see that we have many columns including:
common_name scientific_name Order family genus species author notes taxonomy_last_updated
Now if we want to look at the data inside insect_names, we will need some sql. If you don’t know how big your data is, you should never try to look at all of it. Here's a query to select data, but limit it to showing us 10 results. This is a good way to sample data without having to pull everything.
SELECT * FROM insect_names LIMIT 10;
If you wanted to see more or less data, you could change the LIMIT
operator and ask ask for any number of rows, say 2.
SELECT * FROM insect_names LIMIT 2;
You can also just select certain parts of the data, * will give you all the rows, but this statement will just show you the common_names
.
SELECT common_name FROM insect_names LIMIT 10;
One thing that makes sql really powerful is being able to use filters to find the data you want. The WHERE
filter is really easy to use, and you tell the database to give you information where certain information matches what you're looking for. Here in this example, we are selecting the whole row, from insect names, where the common name is like worm.
The ILIKE
is an operator that will help you find names no matter if they are in lowercase or upper case. The % around the worm make sure that worm could be in any part of the common name.
SELECT * FROM insect_names WHERE common_name ILIKE '%worm%';
Want to find a bug named after you? This sql query asks for a bug with the name like Franklin.
SELECT * FROM insect_names WHERE common_name ILIKE '%franklin%';
Now paste it again and see if you can find our name or someone you know.
Let's say for the sake of learning sql today that you want to update the name of a bug to be your name. I'm going to change the name of 'Franklin bumble bee' to 'Lizzie bumble bee'. This is something we do with an UPDATE
statement.
UPDATE insect_names SET common_name ='Lizzie bumble bee'
WHERE scientific_name = 'Bombus franklini Frison';
Ok, now check that your name is there, this is a select statement if my name is Lizzie:
SELECT * FROM insect_names WHERE common_name ILIKE '%lizzie%';
Another important thing you can do with data in SQL is create new things. You create a new row of data in your table with an INSERT statement. What if we create a new kind of bug named after our favorite food. Let's say we want to create a new stick bug. Stick bugs are part of the order 'Phasmatodea', family 'Diapheromeridae', and genus 'denticrus'. We'll create a row that creates:
INSERT INTO insect_names (common_name,
scientific_name,
"Order",
family,
genus,
species,
author,
notes)
VALUES ('pizza bug',
'Megaphasma pizzaii',
'Phasmatodea',
'Diapheromeridae',
'denticrus',
'pizzaii',
'Lizzie',
'this is a test insert statement - not a real insect');
Ok, now we'll check that our new bug is there:
SELECT * FROM insect_names WHERE common_name ILIKE '%pizza%'
You might have noticed when you pasted that query that your code went across several lines, like this
postgres=# INSERT INTO insect_names (common_name,
postgres(# scientific_name,
postgres(# "Order",
postgres(# family,
postgres(# genus,
postgres(# species,
postgres(# author,
postgres(# notes)
postgres-# VALUES ('pizza bug',
postgres(# 'Megaphasma pizzaii',
postgres(# 'Phasmatodea',
postgres(# 'Diapheromeridae',
postgres(# 'denticrus',
postgres(# 'pizzaii',
postgres(# 'Lizzie',
When you’re in a Postgres terminal the semicolon ;
is what ends a statement. If you forget it, you can add it on the next line.
One of the things that makes databases really powerful is that you can connect lots of different tables together. Using data from more than one table is called a join and you'll need something that's common in both tables so the database knows how to relate things to each other.
Let's look at the other tables in our database with:
\dt
We've been talking mostly about the insect_names
table but there's two more in there, order_descriptions
and order_species_counts.
Let's take a peek at them.
SELECT * from order_species_counts limit 2;
SELECT * from order_descriptions limit 2;
So I can see that order_descriptions is telling me the Order names and what kinds of insects are in each group. So let's try to find the Order for butterflies.
SELECT * FROM order_descriptions
WHERE examples ILIKE '%butterflies%';
And using that Order name, Lepidoptera, order_species_counts will tells me how many species there are.
SELECT extant_species_described
FROM order_species_counts
WHERE "Order" ='Lepidoptera';
To combine these two queries into one with a join it would look like this:
SELECT DISTINCT extant_species_described
FROM order_species_counts o
INNER JOIN order_descriptions d
ON (o."Order" = d."Order")
WHERE d.examples ILIKE '%butterflies%';
I am asking about the same count in the species number table. But since I joined it with the order_descriptions table, I can search and get the results all together.
If you look at the query, you’ll see that we give each table an abbreviation order_species_counts o
and order_descriptions d
and then when we talk about that data from that table, we use the abbreviation, d.examples
.
If I want to look up the common names, type_of_mouth_parts and metaphors type for all insects with a common name like butterfly, I can run this. You see how we're able to pull one field, common_name, and two other fields from a second table.
SELECT i.common_name, o.type_of_mouth_parts, o.metamorphosis
FROM order_descriptions o
INNER JOIN insect_names i
ON (o."Order" = i."Order")
WHERE i.common_name ilike '%butterfly%';
Side note about the word “Order”: You might be wondering why when I talked to sql about "Order" I have to put that in quotes? We’ll, order is also a special sql operation for ordering you data in a specific way. We have to quote it so Postgres knows we’re talking about the data and not the operation.
Counting data in sql usually means you are counting the number of rows. So if I want to see how many entries I have in my insect_names table (note, this is not all insects in the whole world, its just a subset) I would do:
SELECT COUNT (*) FROM insect_names;
If you want to use the database to add using mathematics, you will use the SUM function. You can use the SUM function to total up different data in rows. For example, if I want to see the total number of species across all the orders is my order_species_counts table, that would be something like this:
SELECT SUM(extant_species_described) AS total_species
FROM order_species_counts;
You can also combine the SUM with other things we have learned like filtering with WHERE and ILIKE and a join. Here's a query to find the number of species that are either dragonflies or butterflies.
SELECT SUM(s.extant_species_described) AS total_species
FROM order_species_counts s
INNER JOIN order_descriptions o
ON (s."Order" = o."Order")
WHERE o.examples iLIKE '%dragonflies%'
OR o.examples ilike '%butterflies%';
Thanks for sticking with us through this introductory tutorial! You're welcome to stay around and play with this data. It’s running for you, and only you, in your web browser. You can come back at anytime.
A fun next step to take in our playground would be working some of the Advent of Code Tutorials. They solve some puzzles related to the Christmas holidays. You can read about the problems they're trying to solve and follow along with the solutions. The first one starts here.
Loading terminal...
Loading terminal...