Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more
Tutorial Instructions
SQL is a simple language used for retrieving and updating data in a database. A SQL statement is constructed based on what we call “clauses”. We assemble SQL clauses together to return, manipulate, filter, and update data.
The simplest SQL statement is:
SELECT 1;
This tutorial is interactive, run the command in the Postgres shell to the right. You’ll see that result returns a single value of “1”. What if we changed it to:
SELECT 'two';
Now, you’ll see that it returns a string with the value ‘two’. This clause we are running is the SELECT
clause. Notice, all SQL statements are finalized by a semicolon: ;
.
We have populated this database with a table called “regions” that houses a few regions around the world. To return data from the database, then we will use a FROM
clause:
SELECT regions.* FROM regions;
Once we append the FROM regions
clause it means we now have access to the values in the regions table. The *
can be through of as “return all columns on this table.” If we only want to return a single column, we can be explicit about the values in the SELECT
clause:
SELECT regions.region_name FROM regions;
Next, we can use a WHERE
clause to restrict the filter to certain regions:
SELECT regions.* FROM regions WHERE regions.region_id = 1;
Experiment changing the values of the WHERE
conditional to see how the different values work. In the where clause, you have access to all types of comparison operators, include greater than, less than, equal, contained in a list, etc:
SELECT regions.* FROM regions WHERE regions.region_id < 2;
SELECT regions.* FROM regions WHERE regions.region_id > 2;
SELECT regions.* FROM regions WHERE regions.region_id >= 2;
SELECT regions.* FROM regions WHERE regions.region_name IN ('Asia', 'Americas');
Additionally, you can use AND
and OR
logic to create complex conditionals (this SQL statement is getting larger, so let’s break it into multiple lines — notice how we break it along the clauses):
SELECT
regions.*
FROM regions
WHERE regions.region_id >= 2
AND regions.region_name IN ('Asia', 'Americas');
SQL gives us the ability to append a sorting clause called ORDER BY
:
SELECT
regions.*
FROM regions
ORDER BY regions.region_name;
And, with that sorting clause, we can reverse order (the DESC
means descending, if not specified, ordering is assumed to be ASC
or ascending):
SELECT
regions.*
FROM regions
ORDER BY regions.region_name DESC;
Because SQL is built on clauses, we can assemble the ORDER BY
clause together with the prior WHERE
clause:
SELECT
regions.*
FROM regions
WHERE regions.region_id >= 2
AND regions.region_name IN ('Asia', 'Americas')
ORDER BY region_name;
See how these SQL clauses are used to assemble an entire statement? SQL is really just assembling clauses together. If you only want to return a specific number of rows, then we can use the LIMIT
clause:
SELECT
regions.*
FROM regions
LIMIT 1;
As with all clauses, the LIMIT
clause can be integrated with other clauses:
SELECT
regions.*
FROM regions
WHERE regions.region_id >= 2
AND regions.region_name IN ('Asia', 'Americas')
ORDER BY region_name
LIMIT 1;
Let me just take a second to say: SQL requires your clauses to be ordered properly. So far, we have been using a regions
table that has only an region_id
and a region_name
field. SQL databases are not static. Far from it, we can easily manipulate and update the tables. Guess how we manipulate the tables? We also have SQL clauses for that. Below, we will use the ALTER TABLE
and ADD COLUMN
clauses:
ALTER TABLE regions
ADD COLUMN region_population FLOAT DEFAULT NULL;
COMMENT ON COLUMN regions.region_population IS 'population in billions';
Excellent, run that command above, and it will add new column to the regions
table. We have also used the COMMENT
clause to add metadata about the column to the schema, this is optional but helpful when you have potentially ambiguous data. Now, if you run SELECT * FROM regions;
you’ll see this new column, but it will be empty. Next, let’s use an UPDATE
statement to update the population for each of the regions:
UPDATE regions
SET region_population = 1.002
WHERE regions.region_name = 'Americas';
Above, you’ll see that the UPDATE
statements use the SET
and WHERE
clauses. The SET
clause specifies the values to set on the updated table and the WHERE
clause restricts which rows the values are set for. UPDATE
statements are powerful, and can change the values for many rows all at once. For instance, what if we left off the WHERE
clause above and ran the following it would update all rows:
UPDATE regions
SET region_population = 1.002;
The response in the terminal is UPDATE 4
, which means we updated 4 ROWS! Uh oh … now if you run SELECT * FROM regions
, you’ll see the same value is set for all regions (that’s ok, we can fix that):
UPDATE regions
SET region_population = 1.002
WHERE regions.region_name = 'Americas';
UPDATE regions
SET region_population = 0.493
WHERE regions.region_name = 'Middle East and Africa';
UPDATE regions
SET region_population = 4.561
WHERE regions.region_name = 'Asia';
UPDATE regions
SET region_population = 0.7464
WHERE regions.region_name = 'Europe';
As you can see from usage of the ALTER TABLE
, COMMENT
, and UPDATE
statements we are assembling clauses together in fairly predictable ways. Now, let’s go back to the SELECT
statement and see what we can do with this new data that we’ve added.
Using SQL, we can manipulate values as well. For instance, we can SUM
all of the values returned by a column:
SELECT sum(regions.region_population)
FROM regions;
This will sum all populations for all regions, but we can use the WHERE
conditional to sum the values of Europe, Asia, Middle East, and Africa:
SELECT sum(regions.region_population)
FROM regions
WHERE regions.region_name IN ('Middle East and Aftrica', 'Asia', 'Europe');
I see that we are missing Australia from the list of regions! Let us add Australia using an INSERT
statement:
INSERT INTO regions (region_name, region_population)
VALUES ('Australia', 0.02569);
Now, when running SELECT * FROM regions;
you’ll see Australia added to the list.
SQL is just a statement with a series of clauses appended to the statement to modify behavior. The SQL statements that we covered are just a small number of the total number of statements available:
Within SQL statements, we have a list of clauses, and those clauses are sometimes shared across different statements. The clauses we used were:
Loading terminal...
Loading terminal...