Prepared Statements in Transaction Mode for PgBouncer
Version 1.21 of PgBouncer, the Postgres connection pooler, has added a long awaited feature: support for prepared statements inside of transaction mode. Prior to this, one had to choose between using prepared statements (a performance win), and using PgBouncer's transaction mode (also a large performance win). Now, we can have our cake and eat it too 🎂 🎉.
What are prepared statements?
In Postgres, every SQL command you send to the server is parsed, prepared, and then executed. If you are running the exact same query multiple times, or even a query that only varies by literal values, you can create a prepared statement. Once created, you can execute the prepared statement over and over, removing the parsing and preparing steps. We can demonstrate this in action by using direct SQL commands. Let's use a typical pgbench table, and find the top 3 largest accounts over $4000:
SELECT aid, abalance FROM pgbench_accounts WHERE abalance > 4000
ORDER BY abalance DESC FETCH FIRST 3 ROWS WITH TIES;
aid | abalance
--------+----------
993613 | 7974
18995 | 4992
885701 | 4985
(3 rows)
That $4000 dollar value may change, so we make that a parameter and use the PREPARE command to create a prepared statement named "foo". Then we can simply execute this over and over, by only passing the name of the prepared statement and its parameters:
PREPARE foo AS SELECT aid, abalance FROM pgbench_accounts
WHERE abalance > $1 ORDER BY abalance DESC FETCH FIRST 3 ROWS WITH TIES;
EXECUTE foo(4000);
aid | abalance
--------+----------
993613 | 7974
18995 | 4992
885701 | 4985
(3 rows)
EXECUTE foo(5000);
aid | abalance
--------+----------
993613 | 7974
(1 row)
You can always view your current prepared statements with the pg_prepared_statements view:
SELECT * FROM pg_prepared_statements;
name | foo
statement | PREPARE foo AS SELECT aid, abalance FROM pgbench_accounts
WHERE abalance > $1 ORDER BY abalance DESC FETCH FIRST 3 ROWS WITH TIES;
prepare_time | 2023-10-17 12:34:56.789-04
parameter_types | {integer}
result_types | {integer,integer}
from_sql | t
generic_plans | 0
custom_plans | 2
To reiterate the advantages of prepared statements:
- The query is only parsed once
- The query is only planned once
- A much smaller amount of information is sent over the network each time
- The parameters are automatically escaped
How much faster are prepared statements?
What sort of performance gain can you expect? It depends quite a bit on how complex your query is, and thus how much work Postgres has to spend in the parse/prepare stages. We can see a stark difference if we craft a query specially designed to make the planner work hard. We can simply do this by joining the same table to itself 15 times. Here's the query, along with the results of running it a few times via EXPLAIN (ANALYZE)
EXPLAIN (analyze)
SELECT c1.* FROM pg_class c1
JOIN pg_class c2 USING (oid)
JOIN pg_class c3 USING (oid)
JOIN pg_class c4 USING (oid)
JOIN pg_class c5 USING (oid)
JOIN pg_class c6 USING (oid)
JOIN pg_class c7 USING (oid)
JOIN pg_class c8 USING (oid)
JOIN pg_class c9 USING (oid)
JOIN pg_class c10 USING (oid)
JOIN pg_class c11 USING (oid)
JOIN pg_class c12 USING (oid)
JOIN pg_class c13 USING (oid)
JOIN pg_class c14 USING (oid)
JOIN pg_class c15 USING (oid);
## First run gives:
Planning Time: 174.896 ms
Execution Time: 31.923 ms
## Second run:
Planning Time: 172.350 ms
Execution Time: 12.500 ms
## Third run:
Planning Time: 167.417 ms
Execution Time: 12.599 ms
We can see here that the amount of time to plan this query far exceeds the actual time for it to run. Let's see what happens if we turn it into a prepare statement and execute that a few times:
PREPARE myps AS
SELECT c1.* FROM pg_class c1
JOIN pg_class c2 USING (oid)
JOIN pg_class c3 USING (oid)
JOIN pg_class c4 USING (oid)
JOIN pg_class c5 USING (oid)
JOIN pg_class c6 USING (oid)
JOIN pg_class c7 USING (oid)
JOIN pg_class c8 USING (oid)
JOIN pg_class c9 USING (oid)
JOIN pg_class c10 USING (oid)
JOIN pg_class c11 USING (oid)
JOIN pg_class c12 USING (oid)
JOIN pg_class c13 USING (oid)
JOIN pg_class c14 USING (oid)
JOIN pg_class c15 USING (oid);
EXPLAIN (analyze) EXECUTE myps;
EXPLAIN (analyze) EXECUTE myps;
EXPLAIN (analyze) EXECUTE myps;
## The PREPARE itself is very quick, as no actual parsing/preparing is done yet:
PREPARE
Time: 2.887 ms
## The first time we run it, we incur the planning cost:
Planning Time: 169.229 ms
Execution Time: 13.161 ms
## The second time, however, is blazingly fast:
Planning Time: 0.020 ms
Execution Time: 11.164 ms
## The third and all future runs will also be fast:
Planning Time: 0.019 ms
Execution Time: 11.854 ms
We can see from the above that prepared statements can definitely be a large performance win. This is not limited to test cases like the above: as your queries grow and get more complex, the more time will be spent planning them, and the more benefits the use of prepared statements will provide.
What is PgBouncer's transaction mode?
With most connection poolers, a connection to the database is opened once, and then an application maintains that pooled connection until it disconnects, at which point another application is free to use that already-opened connection to the database. The only advantage of this system is that it prevents Postgres from constantly creating and destroying backend processes. This is still a big advantage, and PgBouncer can support this in its "session mode". However, it also has a "transaction mode", in which the connection to a specific Postgres backend is only maintained as long as a transaction is active. So the connection is returned to the pool for other clients to use as soon as the client does a COMMIT or ROLLBACK. This allows for enormous scaling, in which a handful of PgBouncer connections to Postgres can support thousands of connections from PgBouncer client applications.
Why prepared statements can be a problem in transaction mode
Before version 1.21 of PgBouncer, it was not possible to use prepared statements with transaction mode. This was because prepared statements live at the session level, not the transaction level. In other words, if your client creates a prepared statement, then starts a new transaction, there is no guarantee that the client will be connected to the same Postgres backend as before. Because only the original Postgres backend contains the prepared statement, you would receive errors like this when you try and execute your prepared statement:
ERROR: prepared statement "foo" does not exist
We can demonstrate both the old and new behavior with a small Perl script, and making sure that this new feature is NOT enabled by setting max_prepared_statements=0 in the pgbouncer config file.
#!/usr/bin/env perl
use Modern::Perl;
use DBD::Pg 3.17.0;
## Connect to PgBouncer twice:
my $DSN = "DBI:Pg:dbname=postgres;port=6432";
my $dbh1 = DBI->connect($DSN, '', '', {RaiseError=>1,PrintError=>0});
my $dbh2 = DBI->connect($DSN, '', '', {RaiseError=>1,PrintError=>0});
## New feature in version 3.17.0 of DBD::Pg to prevent manual DEALLOCATE
$dbh1->{pg_skip_deallocate} = 1; $dbh2->{pg_skip_deallocate} = 1;
## A little bit about who we are:
my $sversion = $dbh1->{pg_server_version};
say "DBI version $DBI::VERSION, DBD::Pg version $DBD::Pg::VERSION, Postgres version $sversion";
## Start a transaction on each side, and report our Postgres backend PID:
$dbh1->begin_work(); $dbh2->begin_work();
my $PID = 'SELECT pg_backend_pid()';
printf "Connection 1: backend PID is %s\n", $dbh1->selectall_arrayref($PID)->[0][0];
printf "Connection 2: backend PID is %s\n", $dbh2->selectall_arrayref($PID)->[0][0];
## Prepare a statement only on connection one, then execute it 10 times:
my $sth1 = $dbh1->prepare('SELECT count(*) FROM pg_class WHERE reltuples > ?');
$sth1->execute($_) for 1..10;
## Show the list of prepared statements on each connection:
my $SQL = "SELECT name, statement FROM pg_prepared_statements WHERE statement ~ 'count'";
for my $row ($dbh1->selectall_arrayref($SQL)->@*) {
printf "Connection 1 prepared statement %s: %s\n", $row->[0], $row->[1];
}
for my $row ($dbh2->selectall_arrayref($SQL)->@*) {
printf "Connection 2 prepared statement %s: %s\n", $row->[0], $row->[1];
}
say 'Forcing client connection to switch their backend PIDs';
$dbh1->commit(); $dbh2->commit();
$dbh2->begin_work(); $dbh1->begin_work();
printf "Connection 1: backend PID is %s\n", $dbh1->selectall_arrayref($PID)->[0][0];
printf "Connection 2: backend PID is %s\n", $dbh2->selectall_arrayref($PID)->[0][0];
for my $row ($dbh1->selectall_arrayref($SQL)->@*) {
printf "Connection 1 prepared statement %s: %s\n", $row->[0], $row->[1];
}
for my $row ($dbh2->selectall_arrayref($SQL)->@*) {
printf "Connection 2 prepared statement %s: %s\n", $row->[0], $row->[1];
}
## This will fail unless prepared statements are supported in PgBouncer:
$sth1->execute(123);
say 'Success!';
for my $row ($dbh1->selectall_arrayref($SQL)->@*) {
printf "Connection 1 prepared statement %s: %s\n", $row->[0], $row->[1];
}
$sth1->finish(); $dbh1->disconnect(); $dbh2->disconnect();
This produces the following output. Look closely at the PIDs and the error statements:
DBI version 1.644, DBD::Pg version 3.17.0, Postgres version 160000
Connection 1: backend PID is 14689
Connection 2: backend PID is 14690
Connection 1 prepared statement dbdpg_p14671_1: SELECT count(*) FROM pg_class WHERE reltuples > $1
Forcing client connection to switch their backend PIDs
Connection 1: backend PID is 14690
Connection 2: backend PID is 14689
Connection 2 prepared statement dbdpg_p14671_1: SELECT count(*) FROM pg_class WHERE reltuples > $1
DBD::Pg::st execute failed: ERROR: prepared statement "dbdpg_p14671_1" does not exist
In our example above, DBD::Pg prepared a statement named dbdpg_p14671_1 and sent it to the server at the protocol level. When Connection 1 got connected to the other Postgres backend, that statement was no longer there, and we got an error. However, if PgBouncer is version 1.21 or newer, we can set max_prepared_statements = 10 in the pgbouncer.ini file, restart it, and get a very different outcome when we run the same script again:
DBI version 1.644, DBD::Pg version 3.17.0, Postgres version 160000
Connection 1: backend PID is 15017
Connection 2: backend PID is 15018
Connection 1 prepared statement PGBOUNCER_1: SELECT count(*) FROM pg_class WHERE reltuples > $1
Forcing client connection to switch their backend PIDs
Connection 1: backend PID is 15018
Connection 2: backend PID is 15017
Connection 2 prepared statement PGBOUNCER_1: SELECT count(*) FROM pg_class WHERE reltuples > $1
Connection 1 prepared statement PGBOUNCER_1: SELECT count(*) FROM pg_class WHERE reltuples > $1
Success!
No errors! Also note that PgBouncer has intercepted our prepared statements and renamed them. It will also automatically create them as needed. As far as the client is concerned, prepared statements are already there as expected. In reality, they may or may not exist on that backend when we call execute, but PgBouncer quickly creates them if they are not there.
Caveat one: only protocol level things
The first caveat is that this only applies to protocol-level prepared statements. For most languages, this means that we are using the libpq function PQprepare rather than issuing a SQL command such as PREPARE foo AS ...
This is an important distinction as the former sends specific protocol-level commands to Postgres that PgBouncer can intercept and act upon, while the latter simply sends a string of text over the wire, which PgBouncer knows nothing about.
Caveat two: beware of deallocation
Unfortunately, while the protocol supports destroying prepared statements, there is no libpq interface to do so until the PQclosePrepared function of Postgres 17 (sharp readers may realize that this version does not exist yet, but will around October of 2024). Hence, many drivers and programs will resort to doing a manual DEALLOCATE. The problem with mixing and matching protocol-level and direct commands is that PgBouncer is intercepting and modifying the names of our prepared statements, but the driver has no way to know this, so the allocation fails with an error like this:
(in cleanup) DBD::Pg::st DESTROY failed: ERROR: prepared statement "dbdpg_p15470_1" does not exist
I helped to review and test this PgBouncer patch, and as part of that released version DBD::Pg 3.17.0 to explicitly get around this problem with the pg_skip_deallocate attribute. Your favorite language may or may not have this deallocation problem: try it and see!
Conclusion
Many production clients across the Postgres landscape use the PgBouncer tool to assist with scaling connections. Adding prepared statements to your application can be a huge performance win and I'm super excited to see this work in conjunction with connection pooling. We're already running this version of PgBouncer in Crunchy Bridge and will soon have it released on our other product lines.
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read