Transactions

In the database world, a transaction is a set of commands to be executed at the same time, or rolled back. When people think of transactions, they typically think of bank-level settlement, and along with that, they think “I don’t need that level of complication.” Transactions are not just for bank level databases. To many times people get hung-up thinking about bank deposits and miss real-world use cases for transactions.

Using transactions, programmers save time and reduce the code required of the application that rely on databases. If the database did not manage the transaction state, then it would be required at the application level — which sounds easy, but gets complicated quickly for data with relationships.

For instance, consider a basic signup process creating a user record and an account record, then connecting the user to the account. If you did not have transactions, you would have to program each of the following:

Create user
If not successful, exit and return error message
Create account
If not successful, remove user and exit and return error message
Connect user to account
If not successful, remove user, account, and exit and return error message

But, with transactions, you can do:

Start transaction
- Create account
- Create user
- Connect user to account
Transaction Successful?
- If yes, return success
- If no, return error message

Using transactions means you do not have to remove successful commands due to a failed command. Either they all work, or they all fail.

The Basics

The simplest transaction is below. It starts the transaction with BEGIN and finalizes the transaction with COMMIT.


BEGIN;
INSERT INTO employees (first_name, last_name, start_date, salary, job_title, manager_id, department_id) VALUES ('Elizabeth', 'Christensen', current_date, 1, 'Master of the Highwire', 2, 2) RETURNING employee_id;
COMMIT;

This functionality is as-expected, right? It’s similar to running a typical command — we are using the RETURNING command to prove a point later. To verify the data was committed, now run:


SELECT 
	*
FROM employees 
WHERE 
	first_name = 'Elizabeth' AND 
	last_name = 'Christensen';

But, should you chose to ROLLBACK the transaction instead of COMMIT, you’ll find a different outcome:


BEGIN;
INSERT INTO employees (first_name, last_name, start_date, salary, job_title, manager_id, department_id) VALUES ('Chris', 'Winslett', current_date, 1, 'Jr Director of the Highwire', 2, 2) RETURNING employee_id;
ROLLBACK;

Using the ROLLBACK command, the transaction is discarded and not committed to the data timeline of the database. But, it still returns an ID. Now, if you run a select statement to see if the record was saved, you will not find that record:


SELECT 
	* 
FROM employees
WHERE 
	first_name = 'Chris' AND 
	last_name = 'Winslett';

But, why did it return an id value if it was not committed? Why not discard it? The nextval was triggered on a sequence, which returned a value that was never stored. This prevents two transactions from getting duplicate nextval and having collisions within your data. Return the command with the rollback above, and you’ll see incrementing values.

The commands COMMIT and ROLLBACK are control commands for the transactions. ROLLBACK always terminates the transaction and discards the changes. COMMIT , however, is only successful if the transaction is in a not-errored state. Below, we intentionally have a constraint error, but call COMMIT, which fails:


BEGIN;
	INSERT INTO employees (first_name, last_name) VALUES ('Tom', 'Jones') RETURNING employee_id;
COMMIT;

After running the command above, it will show a ROLLBACK command. So, the COMMIT command failed because the latest state of the transaction is an error.

Real-world Transaction Statement

Above, we have used simple commands and with those commands transactions seem simple. In the real world, they are much more complex. For instance, building data relationships with transactions means the programmer does not have to unwind itself in the event of a failure. Something like the following:


BEGIN;
  INSERT INTO employees (first_name, last_name, start_date, salary, job_title, manager_id, department_id) VALUES ('Chris', 'Winslett', current_date, 1, 'Jr Director of the Highwire', 2, 2);
  INSERT INTO dependents (first_name, last_name, employee_id)	VALUES ('oldest', 'kid', (SELECT employee_id FROM employees WHERE first_name = 'Chris' AND last_name = 'Winslett'));
  INSERT INTO dependents (first_name, last_name, employee_id) VALUES ('youngest', 'kid', (SELECT employee_id FROM employees WHERE first_name = 'Chris' AND last_name = 'Winslett'));
COMMIT;

If you try to run the transaction of the code-block above a second time, it will fail. That is because there will be 2 values returned for the employee_id on the dependent statements. And … the employee will not be duplicated. Yaaaa transactions!

Below, we are trying to insert a null value for first_name on the second dependent, which is a required field on the table. This transaction will fail because a constraint will fail. Below, we leave off the salary, and it aborts the transaction prior to committing:


BEGIN;
  INSERT INTO employees (first_name, last_name, start_date, salary, job_title, manager_id, department_id) VALUES ('Bob', 'Young', current_date, 1, 'Jr Director of the Highwire', 2, 2);
  INSERT INTO dependents (first_name, last_name, employee_id) VALUES ('oldest', 'kid', (SELECT employee_id FROM employees WHERE first_name = 'Bob' AND last_name = 'Young'));
  INSERT INTO dependents (first_name, last_name, employee_id) VALUES (null, 'kid', (SELECT employee_id FROM employees WHERE first_name = 'Bob' AND last_name = 'Young'));
COMMIT;

After running the above code block, you’ll see ROLLBACK with the constraint error above. The COMMIT at the end did not actually write the data to the timeline. Now, if you run the following query, you’ll not see Bob Young in your list of employees:


SELECT * 
FROM employees 
WHERE 
	first_name = 'Bob' AND 
	last_name = 'Young';

This is because the transaction as a whole was not committed. It was all-or-nothing.

Transaction Scoping

What happens in a transaction stays in the transaction, until the transaction is committed. For instance, should you have multiple connections to a database, and one connection has an uncommitted transaction open, then the other connections cannot see the data of the transaction until it is committed.

Structure Changes in Transactions

Postgres is awesome because structure changes are not committed until the transaction is committed. This is unique in the database world. For instance, run the following within the transactions:


BEGIN;
ALTER TABLE employees ADD COLUMN middle_name VARCHAR(50) DEFAULT NULL;
COMMIT;

Now, if you run the following, you’ll see the addition of the middle_name column.


\d employees

Now, let’s perform a complex structure change that fails because you do not set a default properly:


BEGIN;
ALTER TABLE employees ADD COLUMN address_line_1 VARCHAR(50) DEFAULT NULL;
ALTER TABLE employees ADD COLUMN address_line_2 VARCHAR(50) DEFAULT NULL;
ALTER TABLE employees ADD COLUMN city VARCHAR(50) DEFAULT NULL;
ALTER TABLE employees ADD COLUMN province VARCHAR(50) DEFAULT NULL;
ALTER TABLE employees ADD COLUMN postal_code VARCHAR(50) NOT NULL;
COMMIT;

Because it was wrapped in a transaction, and because a default value was not provided to the postal_code alter table. None of the changes were committed, and you will not see the addition of the address columns:


\d employees

These are called Transactional DDL. It has saved me more times than you can count. Other databases which do not perform DDL can leave a migration state half-executed. With Postgres, the migration is either all succeeded or no change.

Advanced Transactions: SAVEPOINT

For certain use cases, people ask for nested transactions. Even better! Postgres has the ability to do SAVEPOINT within the transaction. A SAVEPOINT gives a point that can be referred to as part of the rollback. Earlier, when describing how COMMIT fails if it is in an error state, the SAVEPOINT functionality is the caveat for reverting the transaction to a healthy state. Let us take our example earlier:


BEGIN;
  INSERT INTO employees (first_name, last_name, start_date, salary, job_title, manager_id, department_id) VALUES ('Bob', 'Young', current_date, 1, 'Jr Director of the Highwire', 2, 2);
  SAVEPOINT saved_employee;
  INSERT INTO dependents (first_name, last_name, employee_id) VALUES ('oldest', 'kid', (SELECT employee_id FROM employees WHERE first_name = 'Bob' AND last_name = 'Young'));
  INSERT INTO dependents (first_name, last_name, employee_id) VALUES (null, 'kid', (SELECT employee_id FROM employees WHERE first_name = 'Bob' AND last_name = 'Young'));
  ROLLBACK TO SAVEPOINT saved_employee;
COMMIT;

Now, if you run a query to search for Bob Young, you will find an example, but you will not find the dependents.


SELECT * FROM employees WHERE first_name = 'Bob' AND last_name = 'Young';

But, what happened? We were told transactions are all-or-nothing. Well, they are, unless you use SAVEPOINT. This is a bit of a contrived example, but these use cases usually become complex and feature PL/PgSQL programming within the database. We wanted you to know SAVEPOINT exists, but you’ll probably not use them for a while.

Am I in a transaction?

Below is a command to determine if you are in a transaction. The following command will show your current transaction id:


SELECT txid_current();

But, wait, we didn’t start a transaction? Ha! Jokes on us! We’ve been using transactions all along! Postgres runs all commands, even simple statements, within a transaction. Run it again, and you’ll see a number incremented by 1.

Next Steps

The next step for transaction is learning how they are used within your programming language, ORMs, or tools. When connecting to the database, different programming languages perform transactions differently, but knowing the pattern exists can reduce complexity of your own code for relationships and migrations.

Loading terminal...