Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more
Tutorial Instructions
In the database world, a transaction is a set of commands to be executed at the same time, or rolled back. When people think of transactions, they typically think of bank-level settlement, and along with that, they think “I don’t need that level of complication.” Transactions are not just for bank level databases. To many times people get hung-up thinking about bank deposits and miss real-world use cases for transactions.
Using transactions, programmers save time and reduce the code required of the application that rely on databases. If the database did not manage the transaction state, then it would be required at the application level — which sounds easy, but gets complicated quickly for data with relationships.
For instance, consider a basic signup process creating a user record and an account record, then connecting the user to the account. If you did not have transactions, you would have to program each of the following:
But, with transactions, you can do:
Using transactions means you do not have to remove successful commands due to a failed command. Either they all work, or they all fail.
The simplest transaction is below. It starts the transaction with BEGIN
and finalizes the transaction with COMMIT
.
BEGIN;
INSERT INTO employees (first_name, last_name, start_date, salary, job_title, manager_id, department_id) VALUES ('Elizabeth', 'Christensen', current_date, 1, 'Master of the Highwire', 2, 2) RETURNING employee_id;
COMMIT;
This functionality is as-expected, right? It’s similar to running a typical command — we are using the RETURNING
command to prove a point later. To verify the data was committed, now run:
SELECT
*
FROM employees
WHERE
first_name = 'Elizabeth' AND
last_name = 'Christensen';
But, should you chose to ROLLBACK the transaction instead of COMMIT, you’ll find a different outcome:
BEGIN;
INSERT INTO employees (first_name, last_name, start_date, salary, job_title, manager_id, department_id) VALUES ('Chris', 'Winslett', current_date, 1, 'Jr Director of the Highwire', 2, 2) RETURNING employee_id;
ROLLBACK;
Using the ROLLBACK
command, the transaction is discarded and not committed to the data timeline of the database. But, it still returns an ID. Now, if you run a select statement to see if the record was saved, you will not find that record:
SELECT
*
FROM employees
WHERE
first_name = 'Chris' AND
last_name = 'Winslett';
But, why did it return an id
value if it was not committed? Why not discard it? The nextval
was triggered on a sequence, which returned a value that was never stored. This prevents two transactions from getting duplicate nextval
and having collisions within your data. Return the command with the rollback above, and you’ll see incrementing values.
The commands COMMIT
and ROLLBACK
are control commands for the transactions. ROLLBACK
always terminates the transaction and discards the changes. COMMIT
, however, is only successful if the transaction is in a not-errored state. Below, we intentionally have a constraint error, but call COMMIT
, which fails:
BEGIN;
INSERT INTO employees (first_name, last_name) VALUES ('Tom', 'Jones') RETURNING employee_id;
COMMIT;
After running the command above, it will show a ROLLBACK
command. So, the COMMIT
command failed because the latest state of the transaction is an error.
Above, we have used simple commands and with those commands transactions seem simple. In the real world, they are much more complex. For instance, building data relationships with transactions means the programmer does not have to unwind itself in the event of a failure. Something like the following:
BEGIN;
INSERT INTO employees (first_name, last_name, start_date, salary, job_title, manager_id, department_id) VALUES ('Chris', 'Winslett', current_date, 1, 'Jr Director of the Highwire', 2, 2);
INSERT INTO dependents (first_name, last_name, employee_id) VALUES ('oldest', 'kid', (SELECT employee_id FROM employees WHERE first_name = 'Chris' AND last_name = 'Winslett'));
INSERT INTO dependents (first_name, last_name, employee_id) VALUES ('youngest', 'kid', (SELECT employee_id FROM employees WHERE first_name = 'Chris' AND last_name = 'Winslett'));
COMMIT;
If you try to run the transaction of the code-block above a second time, it will fail. That is because there will be 2 values returned for the employee_id
on the dependent statements. And … the employee will not be duplicated. Yaaaa transactions!
Below, we are trying to insert a null value for first_name
on the second dependent, which is a required field on the table. This transaction will fail because a constraint will fail. Below, we leave off the salary, and it aborts the transaction prior to committing:
BEGIN;
INSERT INTO employees (first_name, last_name, start_date, salary, job_title, manager_id, department_id) VALUES ('Bob', 'Young', current_date, 1, 'Jr Director of the Highwire', 2, 2);
INSERT INTO dependents (first_name, last_name, employee_id) VALUES ('oldest', 'kid', (SELECT employee_id FROM employees WHERE first_name = 'Bob' AND last_name = 'Young'));
INSERT INTO dependents (first_name, last_name, employee_id) VALUES (null, 'kid', (SELECT employee_id FROM employees WHERE first_name = 'Bob' AND last_name = 'Young'));
COMMIT;
After running the above code block, you’ll see ROLLBACK
with the constraint error above. The COMMIT
at the end did not actually write the data to the timeline. Now, if you run the following query, you’ll not see Bob Young in your list of employees:
SELECT *
FROM employees
WHERE
first_name = 'Bob' AND
last_name = 'Young';
This is because the transaction as a whole was not committed. It was all-or-nothing.
What happens in a transaction stays in the transaction, until the transaction is committed. For instance, should you have multiple connections to a database, and one connection has an uncommitted transaction open, then the other connections cannot see the data of the transaction until it is committed.
Postgres is awesome because structure changes are not committed until the transaction is committed. This is unique in the database world. For instance, run the following within the transactions:
BEGIN;
ALTER TABLE employees ADD COLUMN middle_name VARCHAR(50) DEFAULT NULL;
COMMIT;
Now, if you run the following, you’ll see the addition of the middle_name
column.
\d employees
Now, let’s perform a complex structure change that fails because you do not set a default properly:
BEGIN;
ALTER TABLE employees ADD COLUMN address_line_1 VARCHAR(50) DEFAULT NULL;
ALTER TABLE employees ADD COLUMN address_line_2 VARCHAR(50) DEFAULT NULL;
ALTER TABLE employees ADD COLUMN city VARCHAR(50) DEFAULT NULL;
ALTER TABLE employees ADD COLUMN province VARCHAR(50) DEFAULT NULL;
ALTER TABLE employees ADD COLUMN postal_code VARCHAR(50) NOT NULL;
COMMIT;
Because it was wrapped in a transaction, and because a default value was not provided to the postal_code
alter table. None of the changes were committed, and you will not see the addition of the address columns:
\d employees
These are called Transactional DDL. It has saved me more times than you can count. Other databases which do not perform DDL can leave a migration state half-executed. With Postgres, the migration is either all succeeded or no change.
For certain use cases, people ask for nested transactions. Even better! Postgres has the ability to do SAVEPOINT
within the transaction. A SAVEPOINT
gives a point that can be referred to as part of the rollback. Earlier, when describing how COMMIT
fails if it is in an error state, the SAVEPOINT
functionality is the caveat for reverting the transaction to a healthy state. Let us take our example earlier:
BEGIN;
INSERT INTO employees (first_name, last_name, start_date, salary, job_title, manager_id, department_id) VALUES ('Bob', 'Young', current_date, 1, 'Jr Director of the Highwire', 2, 2);
SAVEPOINT saved_employee;
INSERT INTO dependents (first_name, last_name, employee_id) VALUES ('oldest', 'kid', (SELECT employee_id FROM employees WHERE first_name = 'Bob' AND last_name = 'Young'));
INSERT INTO dependents (first_name, last_name, employee_id) VALUES (null, 'kid', (SELECT employee_id FROM employees WHERE first_name = 'Bob' AND last_name = 'Young'));
ROLLBACK TO SAVEPOINT saved_employee;
COMMIT;
Now, if you run a query to search for Bob Young, you will find an example, but you will not find the dependents.
SELECT * FROM employees WHERE first_name = 'Bob' AND last_name = 'Young';
But, what happened? We were told transactions are all-or-nothing. Well, they are, unless you use SAVEPOINT
. This is a bit of a contrived example, but these use cases usually become complex and feature PL/PgSQL programming within the database. We wanted you to know SAVEPOINT
exists, but you’ll probably not use them for a while.
Below is a command to determine if you are in a transaction. The following command will show your current transaction id:
SELECT txid_current();
But, wait, we didn’t start a transaction? Ha! Jokes on us! We’ve been using transactions all along! Postgres runs all commands, even simple statements, within a transaction. Run it again, and you’ll see a number incremented by 1.
The next step for transaction is learning how they are used within your programming language, ORMs, or tools. When connecting to the database, different programming languages perform transactions differently, but knowing the pattern exists can reduce complexity of your own code for relationships and migrations.
Loading terminal...
Loading terminal...