Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more
Tutorial Instructions
Over and over when I look at applications for performance, the lesson I learn and re-learn is, do more things right inside the database.
Create, read, update, delete! All the things you do to a table or collection of tables to work with your ever-changing data.
Most CRUD examples, and most CRUD thinking, tend to focus on one table at a time. That's easy to understand. It's also unrealistic. Even the simplest application will be working with several interlinked normalized tables.
Here's our working example tables.
Get started by creating some tables.
DROP TABLE customers, invoices, items;
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
name TEXT NOT NULL UNIQUE
);
CREATE TABLE invoices (
invoice_id SERIAL PRIMARY KEY,
customer_id BIGINT REFERENCES customers (customer_id)
);
CREATE TABLE items (
item_id SERIAL PRIMARY KEY,
invoice_id BIGINT REFERENCES invoices (invoice_id),
name TEXT NOT NULL
);
Use of SQL for data management often begins and ends with SQL commands that work on only one table at a time. For example, populating our customers
table.
INSERT INTO customers (name) VALUES ('Ben');
Did you know you can populate multiple records from a single insert statement? This has a bunch of performance benefits:
INSERT INTO customers (name)
VALUES ('Peter'), ('Paul'), ('Mary');
For updates and deletes, again the common practice is to work a row at a time, a table at a time.
UPDATE customers
SET name = 'Jen'
WHERE name = 'Ben';
DELETE FROM customers
WHERE name = 'Jen';
One of the complexities/annoyances of filling out a normalized relational model is getting the keys to neatly match up across multiple tables.
Usually the model uses a serial
key to auto-populate the primary keys, but that means the fresh keys for new rows have to be read back somehow when creating other rows that reference the key.
Here's a query that both creates an invoice, and also adds in some items that refer to that invoice, in a single SQL call.
Note the use of the RETURNING clause, to get back a result set of, in this case, all the newly created invoice_id
keys.
WITH new_invoice AS (
/* Add a fresh invoice and return the newly created id */
INSERT INTO invoices (customer_id)
SELECT customer_id FROM customers WHERE name = 'Mary'
RETURNING invoice_id
)
/* Add the items, using the new invoice id */
INSERT INTO items (name, invoice_id)
SELECT n.name, new_invoice.invoice_id
FROM new_invoice
CROSS JOIN
(VALUES ('Purple Automobile'),
('Yellow Automobile')) AS n(name);
I'm going to leverage RETURNING
and arrays to quickly fill in data for all the customers.
Crazy SQL to Populate Tables
WITH i AS (
/* Insert three new invoices for each customer */
/* returning the invoice_id for each one */
INSERT INTO invoices (customer_id)
SELECT customer_id
FROM customers
CROSS JOIN generate_series(1,3)
RETURNING invoice_id
)
/* Insert three new items for each invoice */
/* Each items is a "colored vehicle", with a */
/* distinct color for each item on an invoice, */
/* and a single kind of vehicle for each invoice */
INSERT INTO items (invoice_id, name)
SELECT i.invoice_id,
Format('%s %s',
c,
(ARRAY['Train', 'Plane', 'Automobile'])[i.invoice_id % 3 + 1]) AS name
FROM unnest(ARRAY['Red', 'Blue', 'Green']) AS c
CROSS JOIN i;
Joining together the tables to read out results can be done really tersely with the PostgreSQL USING
clause for the joins. The USING
clause is available when the source and target join tables use the same column name for the join key.
SELECT *
FROM customers
JOIN invoices USING (customer_id)
JOIN items USING (invoice_id)
WHERE customers.name = 'Paul'
ORDER BY customers.name, invoice_id;
We can see from this query that Paul is really into automobiles, and that he buys one of each color, on each invoice.
invoice_id | customer_id | name | item_id | name
------------+-------------+------+---------+------------------
2 | 2 | Paul | 11 | Blue Automobile
2 | 2 | Paul | 2 | Red Automobile
2 | 2 | Paul | 20 | Green Automobile
5 | 2 | Paul | 14 | Blue Automobile
5 | 2 | Paul | 5 | Red Automobile
5 | 2 | Paul | 23 | Green Automobile
8 | 2 | Paul | 8 | Red Automobile
8 | 2 | Paul | 26 | Green Automobile
8 | 2 | Paul | 17 | Blue Automobile
In fact, Peter, Paul and Mary are all collecting different kinds of vehicles.
SELECT DISTINCT
customers.name,
split_part(items.name, ' ', 2) AS vehicle
FROM customers
JOIN invoices USING (customer_id)
JOIN items USING (invoice_id);
name | vehicle
-------+------------
Paul | Automobile
Peter | Plane
Mary | Train
Do you need to do joins to get these answers? No, you could pull all the records out into your client program and summarize them there with client logic, but it would be a lot slower.
Keeping the logic inside the database allows the system to plan the fastest execution of the query, and avoids a lot of network transport overhead and chatter, which is very expensive.
OK, now we want to change the color of all Mary's blue items to purple!
The "one table at a time" method might be:
customers
table to get Mary's customer_id
invoices
table to get all the invoice_id
associated with Maryitems
table to flip all the blue items to purple for Mary's invoices.But that's three queries! Just intuitively, it will be slower than one query, so what does a single query look like?
/* Target table to change */
UPDATE items
/* Change to apply to target rows */
SET name = replace(items.name, 'Blue', 'Purple')
/* Other relations to use in finding target rows */
FROM customers, invoices
/* Restriction on relations to find just target rows */
WHERE customers.customer_id = invoices.customer_id
AND invoices.invoice_id = items.invoice_id
AND customers.name = 'Mary'
AND items.name ~ '^Blue';
The main challenge is relating the items
to the customer "Mary" which is way over in the customers
table. But by adding the customers
and invoices
table, we can build a path from "Mary" to her items, and then restrict the target rows to just the blue items.
Peter has gone off the color red, so we are going to remove the red items from just his invoices.
Just as with the update problem, we could multi-step the problem: find Peter's customer_id
, find associated invoices, find associated items, delete the red ones. But that would be silly, it can all be done in one step.
DELETE FROM items
USING invoices, customers
WHERE items.invoice_id = invoices.invoice_id
AND invoices.invoice_id = customers.customer_id
AND customers.name = 'Peter'
AND items.name ~ 'Red';
The SQL keywords are a little different (DELETE
specifies non-target relations with the USING
keyword, while UPDATE
uses FROM
) but the principle is identical.
Identify the relations needed to connect the filters you want to apply (in this case, a customer of "Peter" and an item name that starts with "Red") and then join those relations together via their foreign keys in the WHERE
clause.
Loading terminal...
Loading terminal...