Top 10 Postgres Management Tasks
1. Add a statement timeout
Postgres databases are very compliant, they do what you tell them until you tell them to stop. It is really common for a runaway process, query, or even something a co-worker runs to accidentally start a never ending transaction in your database. This potentially uses up memory, i/o, or other resources.
Postgres has no preset default for this. To find out your current setting:
SHOW statement_timeout;
A good rule of thumb can be a minute or a couple minutes.
ALTER DATABASE mydatabase
SET statement_timeout = '60s';
This is a connection-specific setting, so you’ll need to reconnect and have your application reconnect, for this to take effect on an ongoing basis.
2. Confirm you have enough memory
For application workloads you want your most frequently accessed Postgres data to be accessible in memory/cache. You can check your cache hit ratio to see how often Postgres is using the cache. Ideally, you have 98-99% of data in the cache. If you see your cache hit ratio below that, you probably need to look at your memory configuration or move to an instance with larger memory.
SELECT
sum(heap_blks_read) as heap_read,
sum(heap_blks_hit) as heap_hit,
sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
FROM
pg_statio_user_tables;
Note: For warehouse or analytical workloads, you will probably have a much lower cache hit ratio.
3. Check shared buffers
Shared buffers is another key memory check. The default shared_buffers is 128MB. Find your current setting with:
SHOW shared_buffers;
The value should be set to 15% to 25% of the machine’s total RAM. So if you have an 8GB machine, a quarter of that would be 2GB.
SET shared_buffers='2GB';
Shared buffers is a parameter that requires a restart to take effect.
4. Use SSL/TLS for data in transit
To find out if you’re currently using ssl:
SHOW ssl;
Hopefully you’ll see ssl | on
.
If you’re not, you’ll need to do some work on the database and application servers to make sure connections are encrypted. See more docs here.
5. Set up backups
Backups are a must have in database management. There’s a few ways to get backup data from Postgres but here’s the essential info:
- pg_dump generates backup files but it shouldn’t be used as a real backup, it is more of a data manipulation tool
- pg_basebackup generates a full binary copy of the database including WAL files, but by itself it is not a complete backup system
- pgBackRest is a complete WAL archive and backup tool which can be used for disaster recovery and point-in-time recovery
You should be using a full disaster recovery data backup tool or working with a vendor that does it for you.
6. Stay on top of Postgres releases and upgrade frequently
The PostgreSQL development community releases about 4 minor versions a year and 1 major version a year.
You should be planning to patch your database in some alignment to this schedule. Staying on top of security patches and the most recent versions will make sure you’re running on the most up to date and most efficient software. Here’s a graphic of where we are now and what is coming later this year. Make sure you have plans to upgrade frequency and to major versions annually.
7. Use pg_stat_statements
pg_stat_statements has to be the most valuable Postgres tool that’s not part of the out of the box software. I mentioned to some committers at a conference recently that we should get it in core Postgres and they assured me I could have a patch in and rejected before the day was over. To be fair it is a contrib module that generally ships with Postgres so you don’t have to go searching for it.
Since pg_stat_statements comes with the Postgres contrib libraries, its really easy to add with CREATE EXTENSION pg_stat_statements
. You also have to add it to shared preloaded libraries since it shares some memory. Adding it also requires a restart.
Here’s a quick query for checking on your 10 slowest queries. Always a good idea to peek in on these and see if there’s any easy fixes to make things work a little faster.
SELECT
(total_exec_time / 1000 / 60) as total_min,
mean_exec_time as avg_ms,
calls,
query
FROM pg_stat_statements
ORDER BY 1 DESC
LIMIT 10;
8. Add indexes
Indexes are really the foundational key to query performance for Postgres. Without indexes, your database is doing full sequential scans each time you query data which uses up a lot of memory and precious query time. Adding indexes gives Postgres an easy way to find and sort your data. Using that handy pg_stat_statements above, you already know what queries are the slowest.
The pg_indexes view will show you what you’ve got at the moment:
SELECT * FROM pg_indexes;
Check out Postgres Indexes for Newbies if you’re just getting started.
9. Check for unused indexes
Indexes are incredibly helpful but sometimes folks go too far adding indexes for everything. Indexes can take up a fair amount of storage space, and all new writes have to be written to them, so keeping them around if they’re not being used can be bad for performance. The pg_stat_user_indexes table has all the information for you on this, so you can look at index usage with a select * from pg_stat_user_indexes
. A more sophisticated query that removes unique indexes and primary keys, showing you unused indexes ordered by size is.
SELECT schemaname || '.' || relname AS table,
indexrelname AS index,
pg_size_pretty(pg_relation_size(i.indexrelid)) AS "index size",
idx_scan as "index scans"
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
WHERE NOT indisunique
AND idx_scan < 50
AND pg_relation_size(relid) > 5 * 8192
ORDER BY
pg_relation_size(i.indexrelid) / nullif(idx_scan, 0) DESC NULLS FIRST,
pg_relation_size(i.indexrelid) DESC;
If you’re using read replicas, don’t forget to check those too before you delete unused indexes. An unused index on the primary might be used on the replica.
10. Review your connection settings
Postgres has a max_connections setting that defaults at 100. This will show you how many connections your instance is currently configured for:
SHOW max_connections;
For tuning the max_connections setting in Postgres, you’ll need to know how your application is connecting and how many connections are allowed. You’ll also want to leave a little headroom, like 10% for other processes, or people, to connect to the database as well. For example if you have 4 servers that can use 50 connections each, plus 10%, you’d want to set max connections to 220.
You may also want to look at a connection pooler. You can check for idle and active connections in your database with the below query.
SELECT count(*), state
FROM pg_stat_activity
GROUP BY 2;
If you're in the high 10s or if you have more idle than active connections, pooling might be a good option.
Need more Postgres tips?
We have an awesome tips page we’ve been building out. We also just started a new Discord channel to chat about Postgres, stop by, say hi, and let me know what your Top 10 list for Postgres is.
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read