Five Great Features of the PostgreSQL Partition Manager
After much testing and work the PostgreSQL Partition Manager, pg_partman, version 5 is now available for public release. Thanks to everyone involved for helping me get here!
My recent post discusses many of the big changes, so please see that post or the CHANGELOG for a full summary of version 5.
What I'd like to do today is take a step back and review five notable features that make pg_partman an important tool for managing large tables in PostgreSQL:
- Retention
- Background Worker
- Additional Constraint Exclusion
- Epoch Partitioning
- Template Table
Retention
One of the primary reasons to partition tables in PostgreSQL is to avoid the potentially very heavy overhead associated with deleting many rows in large batches. PostgreSQL's MVCC system means that when you delete rows, they're not actually gone until the VACUUM system comes through to mark them as reusable space for new rows. Even then that does not immediately (and may likely never) return the space to the file system. Vacuuming such large volumes of old rows can put a heavy strain on an already stressed system, which is what you may have been trying to alleviate by removing those rows.
Instead of running delete statements to remove your old data, partitioning let's you slice the table up and then simply drop the old partitions that have data you don't need in the database anymore. This avoids the need to vacuum to clean up old rows and, more importantly, immediately returns that space to the file system with very little overhead. Another thing to note here is that if retention is your reason for partitioning, you don't have to make your partition interval very small. It just has to be enough to cover your retention policy for how much data you need to keep. If your policy is 30 days, you can try setting the partition interval to monthly. That doesn't necessarily mean you'll only have one month of data around (Ex. February). But it does meet your retention policy of at least 30 days, so you might just be keeping 2 months of data around to meet that and the 3rd oldest table gets dropped. It's not exact, but it meets your needs and can vastly reduce the resource requirements of keeping years of data in a single table.
To configure this in partman, simply set the retention
column in the part_config
table to either an interval or integer value. For time-based partitioning, this is as easy as setting an expected interval value: 30 days, 3 weeks, 1 year, etc. If a given older partition contains ONLY data that is older than that interval, then it will be dropped.
UPDATE partman.part_config SET retention = '30 days' WHERE parent_table = 'public.testing_plans';
Numeric-based partitioning is not quite as straight forward but a simple example hopefully helps: the integer value will set that any partitions with a value less than the current maximum value minus the retention value will be dropped. For example, if the current max value is 100 and the retention value is 30, any partitions with values less than 70 will be dropped. The current maximum value at the time the drop function is run is always used.
UPDATE partman.part_config SET retention = '10000', retention_keep_table = false WHERE parent_table = 'public.testing_plans';
Some other features involving retention available in partman are that you can configure the old tables to either be detached (the default) or actually dropped, which the above example shows. The default is to detach (retention_keep_table = true
) to help avoid losing data accidentally. Many of partman's features that involve any sort of data removal or rejection typically default to a method that does not lose the data as a safety precaution. We'd like you to have to make extra effort to tell partman that yes, I actually want to ignore/remove data. You can also tell partman to move the old tables to a different schema as part of the retention maintenance option. And lastly, there is a Python script available to dump out any tables in a given schema to compressed, checksummed files using pg_dump. This, combined with the schema migration feature, gives you the option of storing old tables out-of-bounds of the database but still available in a much-reduced size to be restored if needed.
UPDATE partman.part_config SET retention = '10 days', retention_schema = 'plan_archives' WHERE parent_table = 'public.testing_plans';
python3 dump_partition.py --connection="host=localhost user=admin dbname=testingdb" --schema="plan_archives" --output="/opt/testing/archive_dumps/092023_testing_plans_archive.pgr.tar.gz" --dump_database="testingdb" --dump_host="localhost" --dump_username="admin"
Example output using the table schema given below in the Epoch Partitioning section given the above retention on Sept 27, 2023.
$ pwd
/opt/testing/archive_dumps
$ ls -l
total 48
-rw-rw-r-- 1 keith keith 174 Sep 27 11:04 plan_archives.testing_plans_p20230731.hash
-rw-rw-r-- 1 keith keith 1874 Sep 27 11:04 plan_archives.testing_plans_p20230731.pgdump
-rw-rw-r-- 1 keith keith 174 Sep 27 11:04 plan_archives.testing_plans_p20230807.hash
-rw-rw-r-- 1 keith keith 1874 Sep 27 11:04 plan_archives.testing_plans_p20230807.pgdump
-rw-rw-r-- 1 keith keith 174 Sep 27 11:04 plan_archives.testing_plans_p20230814.hash
-rw-rw-r-- 1 keith keith 1874 Sep 27 11:04 plan_archives.testing_plans_p20230814.pgdump
-rw-rw-r-- 1 keith keith 174 Sep 27 11:04 plan_archives.testing_plans_p20230821.hash
-rw-rw-r-- 1 keith keith 1874 Sep 27 11:04 plan_archives.testing_plans_p20230821.pgdump
-rw-rw-r-- 1 keith keith 174 Sep 27 11:04 plan_archives.testing_plans_p20230828.hash
-rw-rw-r-- 1 keith keith 1874 Sep 27 11:04 plan_archives.testing_plans_p20230828.pgdump
-rw-rw-r-- 1 keith keith 174 Sep 27 11:04 plan_archives.testing_plans_p20230904.hash
-rw-rw-r-- 1 keith keith 1874 Sep 27 11:04 plan_archives.testing_plans_p20230904.pgdump
Background Worker
Partitioning schemes based on time or numbers generally are not a once-and-done set up. They need continual work to create new partitions for new data and possibly remove old data (see Retention). For many, this means using some sort of external scheduler like cron to periodically call some function to perform maintenance and add/remove child tables. pg_partman takes advantage of a feature in PostgreSQL called a background worker. These have been around since PostgreSQL 9.3, but many people are still unaware of them. Anyone can write a background worker for PostgreSQL and it will run automatically when the database runs and basically do whatever you tell it to do. In the case of pg_partman, this can be the maintenance process to manage the partition sets.
Enabling and configuring this is very easy. Install pg_partman however your environment requires (package, manual build, etc) and enable its shared library in the postgresql.conf
shared_preload_libraries = 'pg_partman_bgw'
You'll also need to tell it which databases pg_partman is installed and how often you'd like it to run. The default is once per hour (3600 seconds), but can be any valid interval value. You'll want to make sure it's running at least often enough for your smallest partition set's interval, usually at least twice within that interval. So if you're doing daily partitioning at the smallest, 12 hours is usually a good value. If no partitions need to be made when it's called, it won't do anything, but you don't want it needlessly using up max_worker_processes.
pg_partman_bgw.dbname = 'alphadb, betadb'
pg_partman_bgw.interval = '12 hours'
See the documentation for other BGW settings that are available.
Note that the time the actual BGW process runs can drift due to the nature of calling a process on an interval vs at a specific time. If you need partition maintenance to run at specific times, then you will need some sort of scheduler instead of partman's BGW. There is another helpful extension that also makes use of BGWs called pg_cron that can help you with this. Also, you can run the maintenance for all partition sets at once:
SELECT partman.run_maintenance();
Or you can run maintenance for specific partition sets. Note that if you also call the above command but don't want these specific partition sets to run when that is done, you'll need to set the automatic_maintenance
flag for those partition sets to false.
UPDATE partman.part_config SET automatic_maintenance = false WHERE parent_table = 'public.testing_plans';
Then set your schedule to call this set specifically
SELECT partman.run_maintenance('public.testing_plans');
Additional Constraint Exclusion
One of the first advanced features I added to pg_partman was the ability to create additional constraints on columns other than the partition control column. Why would you do this? Well, for one of the same reasons you use partitioning in the first place in PostgreSQL: constraint exclusion.
What if you could tell PostgreSQL to ignore all the tables in the partition set that don't have any data that you want as part of your query conditions? Well, that's what partitioning allows PostgreSQL to automatically do. Now, in the case of the partition column itself it uses something more advanced called partition pruning (PG 12+). Normally, a constraint exclusion has to at least visit each child table to see if the constraint covers the requested data range. Partition pruning is even smarter than that and uses the partition definitions instead, but we're not going to get much deeper than that for now. Even so, simply evaluating a constraint could be tremendously more efficient than scanning an index or performing a filter. So how could we take advantage of that on columns other than the partition key?
pg_partman's feature uses the constraint_cols
and optimize_constraint
columns from the part_config
table to set this up. constraint_cols
is an array list of one or more columns that you want to consider for constraint exclusion. optimize_constraint
is a count of how many partitions back from the "current" one to go to set things up for constraint exclusion using the given columns. So let's say we're partitioning monthly and after 3 months, that data is never changed anymore on the evaluation_date
column in our public.testing_plans
table. We'd update the part_config
table like so
UPDATE partman.part_config SET constraint_cols = ARRAY['evaluation_date'], optimize_constraint = '3' WHERE parent_table = 'public.testing_plans';
What this tells partman to do is to go back and look at the tables before the 3rd oldest one relative to the current and create a constraint based on the existing minimum and maximum values in those given columns, no matter the data type as long as that is a valid aggregation. Now, if you run a query with conditionals on these columns, PostgreSQL may be able to take advantage of constraint exclusion for all your tables older than 3 months and exclude them from query plans. Note this does mean that this data most likely can never change unless the new value keeps it within that constraint. So again, this is a feature mainly for older, static data. However, this could potentially have huge query planning benefits depending on your data patterns.
Epoch Partitioning
Many tools and applications out there need a simpler value to store and evaluate time. What could be simpler than an integer? But what integer? Enter the epoch into computer science. Most are familiar with the UNIX/POSIX epoch of the number of seconds since January 1, 1970. So how can we partition on this value but still consider it a time in all but its literal value?
pg_partman can do this with a simple flag when creating your partition set:
SELECT partman.create_parent(
'public.testing_plans'
, 'created_at'
, '1 week'
, p_epoch := 'seconds'
, p_start_partition := to_char(date_trunc('week',CURRENT_TIMESTAMP)-'8 weeks'::interval, 'YYYY-MM-DD HH24:MI:SS')
);
This tells pg_partman that the column itself is allowed to be any integer type, but for all maintenance purposes it is considered a time value. Normally the epoch is the number of seconds, but I've had requests for millisecond and nanosecond support over the years, so that is why the flag column isn't just a simple boolean and can be one of these 3 values. We've also used another handy feature here in partman to tell it that we actually want the "first" partition in this set to be eight weeks in the past from now. So, as of the time of this writing the above would configure a partition set that looks like this. Note that I did run this on a Monday, so the first day of the week here is a Monday. I'll be doing a followup blog post on how to ensure which specific day a given partition interval starts on for things like weekly, quarterly, or any other interval that needs to be told to start on a specific timeframe.
\d+ public.testing_plans
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
-------------------+--------------------------+-----------+----------+--------------------------------------------------------------------------+----------+-------------+--------------+-------------
testid | integer | | | | plain | | |
plan | text | | | | extended | | |
created_at | integer | | not null | EXTRACT(epoch FROM date_trunc('week'::text, CURRENT_TIMESTAMP))::integer | plain | | |
evaluation_date | timestamp with time zone | | | | plain | | |
evaluation_result | boolean | | | | plain | | |
Partition key: RANGE (created_at)
Partitions: testing_plans_p20230731 FOR VALUES FROM (1690776000) TO (1691380800),
testing_plans_p20230807 FOR VALUES FROM (1691380800) TO (1691985600),
testing_plans_p20230814 FOR VALUES FROM (1691985600) TO (1692590400),
testing_plans_p20230821 FOR VALUES FROM (1692590400) TO (1693195200),
testing_plans_p20230828 FOR VALUES FROM (1693195200) TO (1693800000),
testing_plans_p20230904 FOR VALUES FROM (1693800000) TO (1694404800),
testing_plans_p20230911 FOR VALUES FROM (1694404800) TO (1695009600),
testing_plans_p20230918 FOR VALUES FROM (1695009600) TO (1695614400),
testing_plans_p20230925 FOR VALUES FROM (1695614400) TO (1696219200),
testing_plans_p20231002 FOR VALUES FROM (1696219200) TO (1696824000),
testing_plans_p20231009 FOR VALUES FROM (1696824000) TO (1697428800),
testing_plans_p20231016 FOR VALUES FROM (1697428800) TO (1698033600),
testing_plans_p20231023 FOR VALUES FROM (1698033600) TO (1698638400),
testing_plans_default DEFAULT
Template Table
The final feature I'll be discussing is actually a feature I hope will eventually disappear! Early on in PG10 and 11, there were many features that had been possible with the old, trigger-based partitioning that hadn't made it to the built-in declarative system (Ex. index and foreign key inheritance). At that time, I implemented a template table system to provide a way to have these features with declarative partitioning: apply the feature to a separate template table and pg_partman would handle applying that feature to the child tables. Since then, many features have been built into declarative partitioning and I've migrated them away from something that is taken from the template table. However, some features still remain on the template table:
- Unique indexes that do not include the partition key
- UNLOGGED tables
- Table properties set with ``ALTER TABLE ... SET``` commands (autovacuum settings, storage properties, etc)
And there are some additional ones that are currently in open issues that I'll be investigating for future versions. However, as I said, I hope these issues are addressed by the core team so that hopefully this feature is no longer required in pg_partman. And who knows, maybe core could potentially make this entire extension obsolete some day and I'd be 100% fine with that!
Conclusion
Hopefully this overview of features in pg_partman shows you the power that the extension system can bring to PostgreSQL. The core database system can sometimes be a slow moving beast and there's nothing wrong with that since that keeps it the stable and reliable database that PostgreSQL is known to be. But thankfully users like us can contribute by making new features available a little sooner and help drive that core development forward!
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read