When applications store and retrieve data from databases, it is important to monitor which queries are performing well and which are not. Because data is constantly being added and changed, and applications evolve as needs change, this monitoring should be frequent and ongoing.
pg_stat_statements is a PostgreSQL extension that tracks query statistics. It is extremely useful for finding tuning opportunities in your database schema or queries.
To use it, you first need to add
shared_preload_libraries = 'pg_stat_statements'
to your postgresql.conf configuration file and restart the server, then create the extension in your database (no need to do either in the PostgreSQL terminal here):
CREATE EXTENSION pg_stat_statements;
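To confirm the extension is installed in the current database, one quick check is the pg_extension catalog:
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_stat_statements';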
pg_stat_statements carries a lot of information. If you're running these queries in psql, PostgreSQL's standard command-line client, first turn on extended display of results:
\x on
The simplest query you can run, which returns a lot of information, is available from every database where the extension has been enabled:
SELECT * FROM pg_stat_statements;
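The statistics accumulate until they are explicitly cleared. To start measuring from a clean slate, for example before a benchmark run, you can reset the counters (this requires suitable privileges):
SELECT pg_stat_statements_reset();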
When you're trying to identify poorly performing queries, you're often interested in queries run against a single database, so let's join against pg_database so we can filter by database name. The following query selects only a few columns, including the database name, and shows the top 5 queries with the highest total execution time across all databases:
SELECT
d.datname, s.total_exec_time, s.calls, s.rows,
s.query
FROM pg_stat_statements s JOIN pg_database d ON (s.dbid = d.oid)
ORDER BY total_exec_time DESC
LIMIT 5;
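To narrow the same report to just the database you're currently connected to, a small variation is to filter on current_database():
SELECT s.total_exec_time, s.calls, s.rows, s.query
FROM pg_stat_statements s JOIN pg_database d ON (s.dbid = d.oid)
WHERE d.datname = current_database()
ORDER BY s.total_exec_time DESC
LIMIT 5;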
What if we need to find the queries that are using the most of the database's time? The total_exec_time column tells us the total time spent executing the statement, in milliseconds, with high precision. Let's round that to 2 decimal places. In this query we ask for the total execution time and the number of calls and rows, and calculate the average time per call and the "percentage CPU" usage (really each query's share of total execution time):
SELECT
d.datname, round(s.total_exec_time::numeric, 2) AS total_exec_time, s.calls, s.rows,
round(s.total_exec_time::numeric / calls, 2) AS avg_time,
round((100 * s.total_exec_time / sum(s.total_exec_time::numeric) OVER ())::numeric, 2) AS percentage_cpu,
substring(s.query, 1, 50) AS short_query
FROM pg_stat_statements s JOIN pg_database d ON (s.dbid = d.oid)
ORDER BY percentage_cpu DESC
LIMIT 5;
With a list of the worst-performing queries in hand, we can dig deeper into each of them by running them individually through EXPLAIN ANALYZE. That will show us the execution plan Postgres used for the query, so we can analyze where the query could use improvements.
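For example, for one of the queries surfaced above (the employees table comes from the sample data here, but the salary filter is a hypothetical stand-in; substitute a query from your own list):
EXPLAIN ANALYZE SELECT * FROM employees WHERE salary > 10000;
The output shows whether Postgres chose a sequential scan or an index scan, along with estimated and actual row counts and timings.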
What's the average execution time of ALL queries in ALL databases?
SELECT (sum(total_exec_time) / sum(calls))::numeric(6,3) AS avg_execution_time
FROM pg_stat_statements;
Let's find the top 5 queries that write the most to shared_buffers, the memory area PostgreSQL uses to cache table and index data. A high shared_blks_dirtied count means these queries modify a lot of shared memory blocks, and could perhaps use some improvements.
SELECT query, shared_blks_dirtied
FROM pg_stat_statements
WHERE shared_blks_dirtied > 0
ORDER BY 2 DESC
LIMIT 5;
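Along the same lines, the shared_blks_hit and shared_blks_read columns show how much data each query found in cache versus read from disk. A rough sketch of a per-query cache hit ratio:
SELECT substring(query, 1, 50) AS short_query,
shared_blks_hit, shared_blks_read,
round(100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0), 2) AS hit_ratio
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 5;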
pg_stat_statements can tell us about queries that are executed many times and take a long time, among other things. If we combine that information with statistics from other views, such as pg_stat_user_tables and pg_stat_user_indexes, we can find even more information to help us improve performance.
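For instance, pg_stat_user_indexes can reveal indexes that are rarely or never used, and therefore slow down writes without speeding up any reads. A minimal sketch:
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC
LIMIT 5;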
Often, poorly performing queries are missing an index that would help the database find the right rows to return. This query will tell us which tables might be missing an index:
-- COALESCE handles tables with no indexes at all, where idx_scan is NULL
SELECT relname, seq_scan - COALESCE(idx_scan, 0) AS too_much_seq,
CASE WHEN seq_scan - COALESCE(idx_scan, 0) > 0 THEN 'Missing Index?' ELSE 'OK' END,
pg_relation_size(relid) AS rel_size, seq_scan, idx_scan
FROM pg_stat_user_tables
WHERE schemaname <> 'information_schema' AND schemaname NOT LIKE 'pg%'
ORDER BY too_much_seq DESC;
relname | too_much_seq | case | rel_size | seq_scan | idx_scan
-------------+--------------+----------------+----------+----------+----------
dependents | 41 | Missing Index? | 139264 | 41 | 0
countries | -3 | OK | 8192 | 1 | 4
locations | -3 | OK | 8192 | 1 | 4
regions | -24 | OK | 8192 | 1 | 25
departments | -927 | OK | 8192 | 81 | 1008
employees | -3469 | OK | 106496 | 161 | 3630
The dependents table has had 41 more sequential scans than index scans, and may need an index, based on the queries that have been recently executed.
It is important to understand that indexes do have a cost: they slow down writes to a table, because the database has to update the index whenever data is added or changed. So one should NOT just add indexes to every column of a table.
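As a hypothetical example (the employee_id column here is an assumption; check which columns your actual queries filter on before adding anything), an index on dependents could look like:
CREATE INDEX idx_dependents_employee_id ON dependents (employee_id);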
pg_stat_statements is a powerful way to find useful information about queries and their effect on database performance. Combined with EXPLAIN and data from another view, pg_stat_activity, which we'll cover in another tutorial, you can find important information to help you better manage your applications.
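As a quick preview, pg_stat_activity shows what is running right now, rather than accumulated history:
SELECT pid, state, query
FROM pg_stat_activity
WHERE state <> 'idle';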
Read some more on this blog post. Full documentation is at https://www.postgresql.org/docs/current/pgstatstatements.html.