Just Upgrade: How PostgreSQL 12 Can Improve Your Performance
PostgreSQL 12, the latest version of the "world's most advanced open source relational database," is being released in the next few weeks, barring any setbacks. This follows the project's cadence of providing a raft of new database features once a year, which is, quite frankly, amazing and one of the reasons why I wanted to be involved in the PostgreSQL community.
In my opinion, and this is a departure from previous years, PostgreSQL 12 does not contain one or two headline features that everyone can point to and say "this is the 'FEATURE' release" (partitioning and query parallelism are recent examples that spring to mind). I've half-joked that the theme of this release should be "PostgreSQL 12: Now More Stable," which of course is not a bad thing when you are managing mission-critical data for your business.
And yet, I believe this release is a lot more than that: many of the features and enhancements in PostgreSQL 12 will just make your applications run better without doing any work other than upgrading!
(...and maybe rebuild your indexes, which, thanks to this release, is not as painful as it used to be)!
It can be quite nice to upgrade PostgreSQL and see noticeable improvements without having to do anything other than the upgrade itself. A few years back when I was analyzing an upgrade of PostgreSQL 9.4 to PostgreSQL 10, I measured that my underlying application was performing much more quickly: it took advantage of the query parallelism improvements introduced in PostgreSQL 10. Getting these improvements took almost no effort on my part (in this case, I set the max_parallel_workers config parameter).
Having applications work better by simply upgrading is a delightful experience for users, and it's important that we keep our existing users happy as more and more people adopt PostgreSQL.
So, how can PostgreSQL 12 make your applications better just by upgrading? Read on!
Major Improvements to Indexing
Indexing is a crucial part of any database system: it facilitates the quick retrieval of information. The fundamental indexing system PostgreSQL uses is called a B-tree, which is a type of index that is optimized for storage systems.
It's very easy to take for granted the statement CREATE INDEX ON some_table (some_column); as PostgreSQL does a lot of work to keep the index up-to-date as the values it stores are continuously inserted, updated, and deleted. Typically, it just seems to work.
However, a problem with PostgreSQL indexes is that they can bloat and take up extra space on disk, which can also lead to performance penalties when both retrieving and updating data. In this case by "bloat," I mean inefficiencies in how the index structure is maintained, which may or may not be related to garbage tuples that are removed by VACUUM (and a hat tip to Peter Geoghegan for this fact). Index bloat can be very noticeable on workloads where an index is modified heavily.
PostgreSQL 12 makes significant improvements to how B-tree indexes work: experiments using TPC-C-like tests showed a 40% reduction in space utilization on average. This not only reduces the amount of time spent maintaining B-tree indexes (i.e. writes), but also provides benefits to how quickly data can be retrieved, given that indexes are overall a lot smaller.
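One way to see the space savings for yourself is to record index sizes before and after the upgrade. The following is a minimal sketch, assuming a hypothetical some_table with an integer some_column; the index name is the one PostgreSQL generates by default for the CREATE INDEX statement shown earlier.

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE IF NOT EXISTS some_table (some_column integer);
CREATE INDEX IF NOT EXISTS some_table_some_column_idx
    ON some_table (some_column);

-- On-disk size of a single index.
SELECT pg_size_pretty(pg_relation_size('some_table_some_column_idx')) AS index_size;

-- On-disk size of every index on the table, largest first.
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'some_table'
ORDER BY pg_relation_size(indexrelid) DESC;
```

Running the same queries against the same workload on the old and new versions gives a rough comparison; the numbers you see will depend heavily on your data and update patterns.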
Applications that make heavy updates to their tables, typically in the OLTP ("online transaction processing") family, should see noticeable improvements to their disk utilization as well as query performance. And less disk utilization means your database has more room to grow before you need to upgrade your infrastructure.
Depending on your upgrade strategy, you may need to rebuild your B-tree indexes to take advantage of these improvements (for example, pg_upgrade will not automatically rebuild your indexes). In prior versions of PostgreSQL, if you had large indexes on your tables, this could lead to a significant downtime event, as an index rebuild would block any modifications to the table. But here is another place where PostgreSQL 12 shines: you can now rebuild your indexes concurrently with the REINDEX CONCURRENTLY command, avoiding that potential downtime!
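As a rough sketch of what a concurrent rebuild looks like, assuming the same hypothetical some_table and its default-named index:

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE IF NOT EXISTS some_table (some_column integer);
CREATE INDEX IF NOT EXISTS some_table_some_column_idx
    ON some_table (some_column);

-- Rebuild one index without blocking writes to the table.
-- Note: REINDEX CONCURRENTLY cannot run inside a transaction block.
REINDEX INDEX CONCURRENTLY some_table_some_column_idx;

-- Or rebuild every index on the table.
REINDEX TABLE CONCURRENTLY some_table;

-- If a concurrent rebuild is interrupted, it can leave an invalid
-- index behind; this lists any that need to be dropped or rebuilt.
SELECT indexrelid::regclass AS invalid_index
FROM pg_index
WHERE NOT indisvalid;
```

A concurrent rebuild takes longer and does more total work than a plain REINDEX, so it is a trade of time for availability.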
There are also other parts of PostgreSQL's indexing infrastructure that received improvements in PostgreSQL 12. One of the things that falls into the "just works" category involves the write-ahead log, aka WAL. The write-ahead log serves an important function, as it records every transaction that occurs in PostgreSQL, which is fundamental to features such as crash safety and replication, and is used by applications for archival and point-in-time recovery. The write-ahead log also means that additional information needs to be written to disk, which can have performance ramifications.
PostgreSQL 12 reduces the overhead of WAL records generated by the GiST, GIN, and SP-GiST indexes while an index is being built. This has multiple noticeable benefits, including less space on disk required for these WAL records and faster replays of this data, such as during crash recovery or point-in-time recovery. If you use any of these types of indexes in your applications (for instance, geospatial applications powered by PostGIS make heavy use of the GiST index type), this is yet another feature that will make a noticeable impact without you having to lift a finger.
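If you want a rough feel for how much WAL an index build generates, one approach is to compare the WAL position before and after the build. This is only a sketch: the places table and index name are hypothetical, it uses the built-in point type rather than PostGIS, it relies on psql's \gset meta-command, and on a busy server the figure will include WAL from other sessions.

```sql
-- Hypothetical table with a built-in geometric type, for illustration only.
CREATE TABLE IF NOT EXISTS places (id bigint, location point);
INSERT INTO places
SELECT n, point(random() * 100, random() * 100)
FROM generate_series(1, 100000) AS g(n);

DROP INDEX IF EXISTS places_location_gist_idx;

-- Remember the current WAL position in a psql variable.
SELECT pg_current_wal_lsn() AS before_lsn \gset

CREATE INDEX places_location_gist_idx ON places USING gist (location);

-- Bytes of WAL written since the saved position.
SELECT pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), :'before_lsn')
       ) AS wal_generated;
```

Running the same script on an older version and on PostgreSQL 12 with identical data is one way to compare the two, ideally on an otherwise idle test instance.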
Partitioning is Bigger, Better, Faster
PostgreSQL 10 introduced declarative partitioning. PostgreSQL 11 made it much easier to use. PostgreSQL 12 lets you really scale your partitions.
PostgreSQL 12 received significant performance improvements to the partitioning system, notably around how it can process tables that have thousands of partitions. For example, a query that only affects a few partitions on a table with thousands of them will perform significantly faster. In addition to seeing performance improvements on those types of queries, you should also see an improvement in INSERT speed on tables with many partitions as well.
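To make the "query that only affects a few partitions" case concrete, here is a small sketch with a hypothetical time-series table split into monthly range partitions. The table and partition names are made up for illustration; a real deployment would have many more partitions.

```sql
-- Hypothetical partitioned table, for illustration only.
CREATE TABLE IF NOT EXISTS sensor_readings (
    recorded_at timestamptz NOT NULL,
    sensor_id   integer     NOT NULL,
    reading     numeric
) PARTITION BY RANGE (recorded_at);

CREATE TABLE IF NOT EXISTS sensor_readings_2019_09
    PARTITION OF sensor_readings
    FOR VALUES FROM ('2019-09-01') TO ('2019-10-01');

CREATE TABLE IF NOT EXISTS sensor_readings_2019_10
    PARTITION OF sensor_readings
    FOR VALUES FROM ('2019-10-01') TO ('2019-11-01');

-- Rows are routed to the matching partition on INSERT.
INSERT INTO sensor_readings VALUES ('2019-10-03 12:00+00', 1, 20.5);

-- tableoid shows which partition each row landed in.
SELECT tableoid::regclass AS partition, * FROM sensor_readings;

-- The plan should only touch the partition(s) that can contain
-- matching rows; the others are pruned.
EXPLAIN
SELECT avg(reading)
FROM sensor_readings
WHERE recorded_at >= '2019-09-15'
  AND recorded_at <  '2019-09-16';
```

The query text is the same regardless of how many partitions exist; what changes in PostgreSQL 12 is how much planning and execution overhead the untouched partitions add.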
Writing data with COPY, which is a great way to bulk load data (here's an example of JSON ingestion) to partitioned tables, also received a boost in PostgreSQL 12. Using COPY was already fast; PostgreSQL 12 has made it noticeably faster.
All of the above makes it possible to store even larger data sets in PostgreSQL while making it easier to retrieve the data and, even better, it should just work. Applications that tend to have a lot of partitions, e.g. ones that record time series data, should see noticeable performance improvements just with an upgrade.
And while it may not broadly fall under the "make better just by upgrading" category, PostgreSQL 12 allows you to create foreign keys that reference partitioned tables, eliminating a "gotcha" that you may have experienced with partitioning.
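A minimal sketch of a foreign key that references a partitioned table, using hypothetical customers and orders tables. Note that a primary key on a partitioned table has to include the partition key, so the foreign key here spans both columns.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE IF NOT EXISTS customers (
    customer_id bigint NOT NULL,
    region      text   NOT NULL,
    PRIMARY KEY (customer_id, region)
) PARTITION BY LIST (region);

CREATE TABLE IF NOT EXISTS customers_eu
    PARTITION OF customers FOR VALUES IN ('eu');

-- In PostgreSQL 12 the referenced table may be partitioned.
CREATE TABLE IF NOT EXISTS orders (
    order_id    bigint PRIMARY KEY,
    customer_id bigint NOT NULL,
    region      text   NOT NULL,
    FOREIGN KEY (customer_id, region)
        REFERENCES customers (customer_id, region)
);
```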
WITH Queries Get a Big Boost
When the patch for inlining common table expressions (aka CTEs, aka WITH queries) was committed, I could not wait to write an article on how big a deal this was for PostgreSQL application developers. This is one of those features where you can see your applications get faster, well, if you make use of CTEs.
I've often found that developers who are new to SQL like to make use of CTEs: if you write them in a certain way, it can feel like you're writing an imperative program. I also enjoyed rewriting those queries to not use CTEs and demonstrating a performance gain. Alas, those days are now gone.
PostgreSQL 12 now allows a certain kind of CTE to be inlined: one that has no side effects (i.e. a SELECT), is not recursive, and is referenced only once later in the query. If I had collected statistics on the number of queries using CTEs I would rewrite, the majority would fall into this group. This will help developers write code that feels more readable and is now performant as well.
What's better is that PostgreSQL 12 will optimize the execution of this SQL without you having to do any additional work. And while I may no longer have to optimize this type of query pattern, it's certainly better that PostgreSQL is continuing to improve its query optimizations.
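To see the difference, you can compare query plans with and without inlining. The sketch below assumes a hypothetical cte_demo table; the MATERIALIZED and NOT MATERIALIZED keywords are the PostgreSQL 12 syntax for overriding the default behavior.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE IF NOT EXISTS cte_demo (id bigint PRIMARY KEY, payload text);
INSERT INTO cte_demo
SELECT n, 'row ' || n::text
FROM generate_series(1, 100000) AS g(n)
ON CONFLICT DO NOTHING;
ANALYZE cte_demo;

-- Default in PostgreSQL 12: the CTE is referenced once and has no
-- side effects, so it can be inlined and the id filter can be
-- applied directly to the underlying table.
EXPLAIN
WITH all_rows AS (SELECT * FROM cte_demo)
SELECT * FROM all_rows WHERE id = 42;

-- Force the pre-12 behavior: the CTE is computed in full first
-- and the filter is applied to its result.
EXPLAIN
WITH all_rows AS MATERIALIZED (SELECT * FROM cte_demo)
SELECT * FROM all_rows WHERE id = 42;

-- A CTE referenced more than once is not inlined by default;
-- NOT MATERIALIZED asks the planner to inline it anyway.
EXPLAIN
WITH all_rows AS NOT MATERIALIZED (SELECT * FROM cte_demo)
SELECT a.id
FROM all_rows AS a
JOIN all_rows AS b ON b.id = a.id + 1
WHERE a.id = 42;
```

If you were relying on a CTE as an optimization fence in earlier versions, adding MATERIALIZED is the way to keep that behavior after upgrading.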
Just-in-Time (JIT) Is Now a Default
For PostgreSQL 12 systems that support LLVM, just-in-time compilation, aka "JIT," is enabled by default. In addition to providing JIT support to some internal operations, queries that have expressions (e.g. "x + y", which is a simple expression) in select lists (i.e. what you write after "SELECT"), use aggregates, or have expressions in WHERE clauses, among others, can utilize JIT for a performance boost.
As JIT is enabled by default in PostgreSQL 12, you can see performance boosts without doing anything, but I would recommend testing out your application on PostgreSQL 11, where JIT was introduced, to measure how your queries perform and see if you need to do any tuning.
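When testing, it helps to check whether JIT is actually being used for a given query. The following is a sketch using a synthetic query over generate_series; lowering jit_above_cost to 0 is only a way to force JIT for the demonstration, not a recommended production setting.

```sql
-- Current settings for this session.
SHOW jit;
SHOW jit_above_cost;

-- Force JIT to be considered even for cheap queries (demo only).
SET jit = on;
SET jit_above_cost = 0;

-- On a server built with LLVM support, the output should include
-- a "JIT:" section with compilation timings.
EXPLAIN (ANALYZE)
SELECT sum(n * 2 + 1)
FROM generate_series(1, 1000000) AS g(n)
WHERE n % 3 = 0;

-- Same query with JIT disabled, for comparison.
SET jit = off;
EXPLAIN (ANALYZE)
SELECT sum(n * 2 + 1)
FROM generate_series(1, 1000000) AS g(n)
WHERE n % 3 = 0;

-- Return to the server defaults.
RESET jit;
RESET jit_above_cost;
```

For short queries the compilation time can outweigh the savings, which is why the cost thresholds exist; comparing the two plans on your own queries is the most reliable guide to whether any tuning is needed.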
What about the other new features in PostgreSQL 12?
There are a lot of new features in PostgreSQL 12 that I am really excited about, from the ability to introspect JSON data using the standard SQL/JSON path expressions, to a type of multifactor authentication available using the clientcert=verify-full setting, to generated columns, and many more. These are for a different blog post.
Much like my experience going to PostgreSQL 10, I believe PostgreSQL 12 provides a similar ability to improve your overall experience just by upgrading. Of course, your mileage may vary: as I did with my PostgreSQL 10 upgrade, test your application under production-like conditions before making the switch. Even if PostgreSQL 12 is "now more stable" as I suggested, you should always extensively test your applications before moving them into production.