In the database world, a transaction is a set of commands to be executed at the same time, or rolled back. When people think of transactions, they typically think of bank-level settlement, and along with that, they think “I don’t need that level of complication.” Transactions are not just for bank level databases. To many times people get hung-up thinking about bank deposits and miss real-world use cases for transactions.
Using transactions, programmers save time and reduce the code required of the application that rely on databases. If the database did not manage the transaction state, then it would be required at the application level — which sounds easy, but gets complicated quickly for data with relationships.
For instance, consider a basic signup process creating a user record and an account record, then connecting the user to the account. If you did not have transactions, you would have to program each of the following:
- Create user
- If not successful, exit and return error message
- Create account
- If not successful, remove user and exit and return error message
- Connect user to account
- If not successful, remove user, account, and exit and return error message
But, with transactions, you can do:
- Start transaction
- Create account
- Create user
- Connect user to account
- Transaction Successful?
- If yes, return success
- If no, return error message
Using transactions means you do not have to remove successful commands due to a failed command. Either they all work, or they all fail.
The Basics
The simplest transaction is below. It starts the transaction with BEGIN and
finalizes the transaction with COMMIT.
BEGIN;
INSERT INTO employees (first_name, last_name, start_date, salary, job_title, manager_id, department_id) VALUES ('Elizabeth', 'Christensen', current_date, 1, 'Master of the Highwire', 2, 2) RETURNING employee_id;
COMMIT;
This functionality is as-expected, right? It’s similar to running a typical
command — we are using the RETURNING command to prove a point later. To verify
the data was committed, now run:
SELECT
*
FROM employees
WHERE
first_name = 'Elizabeth' AND
last_name = 'Christensen';
But, should you chose to ROLLBACK the transaction instead of COMMIT, you’ll find a different outcome:
BEGIN;
INSERT INTO employees (first_name, last_name, start_date, salary, job_title, manager_id, department_id) VALUES ('Chris', 'Winslett', current_date, 1, 'Jr Director of the Highwire', 2, 2) RETURNING employee_id;
ROLLBACK;
Using the ROLLBACK command, the transaction is discarded and not committed to
the data timeline of the database. But, it still returns an ID. Now, if you run
a select statement to see if the record was saved, you will not find that
record:
SELECT
*
FROM employees
WHERE
first_name = 'Chris' AND
last_name = 'Winslett';
But, why did it return an id value if it was not committed? Why not discard
it? The nextval was triggered on a sequence, which returned a value that was
never stored. This prevents two transactions from getting duplicate nextval
and having collisions within your data. Return the command with the rollback
above, and you’ll see incrementing values.
The commands COMMIT and ROLLBACK are control commands for the transactions.
ROLLBACK always terminates the transaction and discards the changes. COMMIT
, however, is only successful if the transaction is in a not-errored state.
Below, we intentionally have a constraint error, but call COMMIT, which fails:
BEGIN;
INSERT INTO employees (first_name, last_name) VALUES ('Tom', 'Jones') RETURNING employee_id;
COMMIT;
After running the command above, it will show a ROLLBACK command. So, the
COMMIT command failed because the latest state of the transaction is an error.
Real-world Transaction Statement
Above, we have used simple commands and with those commands transactions seem simple. In the real world, they are much more complex. For instance, building data relationships with transactions means the programmer does not have to unwind itself in the event of a failure. Something like the following:
BEGIN;
INSERT INTO employees (first_name, last_name, start_date, salary, job_title, manager_id, department_id) VALUES ('Chris', 'Winslett', current_date, 1, 'Jr Director of the Highwire', 2, 2);
INSERT INTO dependents (first_name, last_name, employee_id) VALUES ('oldest', 'kid', (SELECT employee_id FROM employees WHERE first_name = 'Chris' AND last_name = 'Winslett'));
INSERT INTO dependents (first_name, last_name, employee_id) VALUES ('youngest', 'kid', (SELECT employee_id FROM employees WHERE first_name = 'Chris' AND last_name = 'Winslett'));
COMMIT;
If you try to run the transaction of the code-block above a second time, it
will fail. That is because there will be 2 values returned for the employee_id
on the dependent statements. And … the employee will not be duplicated. Yaaaa
transactions!
Below, we are trying to insert a null value for first_name on the second
dependent, which is a required field on the table. This transaction will fail
because a constraint will fail. Below, we leave off the salary, and it aborts
the transaction prior to committing:
BEGIN;
INSERT INTO employees (first_name, last_name, start_date, salary, job_title, manager_id, department_id) VALUES ('Bob', 'Young', current_date, 1, 'Jr Director of the Highwire', 2, 2);
INSERT INTO dependents (first_name, last_name, employee_id) VALUES ('oldest', 'kid', (SELECT employee_id FROM employees WHERE first_name = 'Bob' AND last_name = 'Young'));
INSERT INTO dependents (first_name, last_name, employee_id) VALUES (null, 'kid', (SELECT employee_id FROM employees WHERE first_name = 'Bob' AND last_name = 'Young'));
COMMIT;
After running the above code block, you’ll see ROLLBACK with the constraint
error above. The COMMIT at the end did not actually write the data to the
timeline. Now, if you run the following query, you’ll not see Bob Young in your
list of employees:
SELECT *
FROM employees
WHERE
first_name = 'Bob' AND
last_name = 'Young';
This is because the transaction as a whole was not committed. It was all-or-nothing.
Transaction Scoping
What happens in a transaction stays in the transaction, until the transaction is committed. For instance, should you have multiple connections to a database, and one connection has an uncommitted transaction open, then the other connections cannot see the data of the transaction until it is committed.
Structure Changes in Transactions
Postgres is awesome because structure changes are not committed until the transaction is committed. This is unique in the database world. For instance, run the following within the transactions:
BEGIN;
ALTER TABLE employees ADD COLUMN middle_name VARCHAR(50) DEFAULT NULL;
COMMIT;
Now, if you run the following, you’ll see the addition of the middle_name
column.
\d employees
Now, let’s perform a complex structure change that fails because you do not set a default properly:
BEGIN;
ALTER TABLE employees ADD COLUMN address_line_1 VARCHAR(50) DEFAULT NULL;
ALTER TABLE employees ADD COLUMN address_line_2 VARCHAR(50) DEFAULT NULL;
ALTER TABLE employees ADD COLUMN city VARCHAR(50) DEFAULT NULL;
ALTER TABLE employees ADD COLUMN province VARCHAR(50) DEFAULT NULL;
ALTER TABLE employees ADD COLUMN postal_code VARCHAR(50) NOT NULL;
COMMIT;
Because it was wrapped in a transaction, and because a default value was not
provided to the postal_code alter table. None of the changes were committed,
and you will not see the addition of the address columns:
\d employees
These are called Transactional DDL. It has saved me more times than you can count. Other databases which do not perform DDL can leave a migration state half-executed. With Postgres, the migration is either all succeeded or no change.
Advanced Transactions: SAVEPOINT
For certain use cases, people ask for nested transactions. Even better! Postgres
has the ability to do SAVEPOINT within the transaction. A SAVEPOINT gives a
point that can be referred to as part of the rollback. Earlier, when describing
how COMMIT fails if it is in an error state, the SAVEPOINT functionality is
the caveat for reverting the transaction to a healthy state. Let us take our
example earlier:
BEGIN;
INSERT INTO employees (first_name, last_name, start_date, salary, job_title, manager_id, department_id) VALUES ('Bob', 'Young', current_date, 1, 'Jr Director of the Highwire', 2, 2);
SAVEPOINT saved_employee;
INSERT INTO dependents (first_name, last_name, employee_id) VALUES ('oldest', 'kid', (SELECT employee_id FROM employees WHERE first_name = 'Bob' AND last_name = 'Young'));
INSERT INTO dependents (first_name, last_name, employee_id) VALUES (null, 'kid', (SELECT employee_id FROM employees WHERE first_name = 'Bob' AND last_name = 'Young'));
ROLLBACK TO SAVEPOINT saved_employee;
COMMIT;
Now, if you run a query to search for Bob Young, you will find an example, but you will not find the dependents.
SELECT * FROM employees WHERE first_name = 'Bob' AND last_name = 'Young';
But, what happened? We were told transactions are all-or-nothing. Well, they
are, unless you use SAVEPOINT. This is a bit of a contrived example, but these
use cases usually become complex and feature PL/PgSQL programming within the
database. We wanted you to know SAVEPOINT exists, but you’ll probably not use
them for a while.
Am I in a transaction?
Below is a command to determine if you are in a transaction. The following command will show your current transaction id:
SELECT txid_current();
But, wait, we didn’t start a transaction? Ha! Jokes on us! We’ve been using transactions all along! Postgres runs all commands, even simple statements, within a transaction.
Run it again, and you’ll see a number incremented by 1.
Next Steps
The next step for transaction is learning how they are used within your programming language, ORMs, or tools. When connecting to the database, different programming languages perform transactions differently, but knowing the pattern exists can reduce complexity of your own code for relationships and migrations.