Performance Tips for Postgres FDW
We have a lot of Postgres foreign data wrapper (FDW) users here at Crunchy. The postgres_fdw lets you connect to remote servers and in some cases can be an alternative for traditional ETL/ELT processes. As I see use of the Postgres foreign data wrapper expanding, I wanted to make some recommendations on how to approach performance. When you query from a single Postgres cluster, the database cleverly uses internal statistics to auto-tune performance. However, when querying from the Postgres FDW, the querying host does not have access to statistics, and is flying blind with query optimization.
So how do we help a remote Postgres make smart querying choices? Well the answer is to think like an application developer and help your queries plan ahead on how data will be gathered on the remote server. These solutions use some of my favorite Postgres power functions like CTEs, sub-queries, and materialized views.
Background on postgres_fdw queries
Below is an illustration of the querying/local database using Postgres FDW. This is easily optimizable with indexing.
We send a simple join query
The querying database effectively runs what it can locally
After it knows the filters, it sends the ids to the remote database
The remote database returns a cursor
(and 6 and 7) The querying database iterates over the cursor until it returns all rows necessary to fulfill the process
(and 9) The querying database then joins and returns the results
To start debugging performance on the remote database, I recommend you start with the normal tools like reviewing logs or using pg_stat_statements
on the remote host. Checkout our blog post for a how-to on pg_stat_statements
.
Complex and inefficient queries
As queries get more complex, the querying database loses the ability to send efficient queries to the remote database. For instance, look at this query:
SELECT
local_table.name,
COUNT(remote.l_id)
FROM remote_table
INNER JOIN local_table ON remote_table.l_id = local_table.id
WHERE local_table.a_column = 'some_value'
GROUP BY 1
ORDER BY 2 DESC
This will join a remote table to a local table, filter on the local table, and then join it to the remote table. However, if you watch the logs, this is the query that is sent to the remote database:
SELECT * FROM remote_table; -- query sent to remote database
Even with an index on the remote_table.id
field, the query to the remote database requests all records. And, because it is requesting a number of queries, then you will see a number of fetches like below:
FETCH 100 FROM c1;
FETCH 100 FROM c1;
FETCH 100 FROM c1;
FETCH 100 FROM c1;
...(and it continues as many rows as required to fulfill the request)
But wait, what if we query from the local_table
in the FROM
clause, and move the remote_table
to a join? Nothing changes. The same query is sent to the remote database. To get around the large select all fetches, I’m going to dig more into joins, CTEs, subqueries, and more.
Local joins, remote joins, and cross-server joins
Knowing the location of the table is critical for optimizing your queries. It comes down to the following three scenarios:
Local joins: These are joins that occur completely on the local/querying server. These work as expected.
Remote joins: A remote join is any join that happens between tables on a foreign server. Joins for tables located on the same foreign server are executed on the foreign server and any sort optimization that can be inferred is run there as well. Additionally, as of Postgres 11, any foreign servers can execute aggregates locally as well.
Cross-server joins: Cross-server joins are joins on data across multiple servers. These are the least efficient queries with FDW because optimization would require information that the local server does not have access to. Because local joins are fast, and remote joins optimized, the typical performance problems with Postgres FDW come down to cross-server joins.
CTEs for your Postgres FDW queries
The easiest way to predict the query sent to the remote database is to containerize it in a Common Table Expression (CTE):
WITH filtered_remote_table AS (
SELECT * FROM remote_table WHERE created_at > now() - '1 day'::interval
)
SELECT
*
FROM local_table
LEFT JOIN filtered_remote_table ON local_table.id = filtered_remote_table.local_table_id
The above query is de-optimized for a single cluster, but optimized for a remote cluster. By containerizing the Postgres FDW query into the CTE, the matching rows are reduced to those needed on the remote host, then returned to the querying host. By doing this, we limit the returned row count.
If we did not use the CTE, then the LEFT JOIN would be larger and more unfiltered like I showed above.
Using Subqueries for your Postgres FDW
The next step in your journey to make queries more efficient for a remote host would be to try using a subquery to filter a Postgres FDW query. However, there’s a trick to it. If you nest a subquery into an IN
statement, all rows will be returned from the remote database. If we evaluate the subquery using and use ANY
, we can get Postgres to send an optimized query like this:
WITH filtered_local_table AS (
SELECT
local_table.id,
local_table.name
FROM local_table
WHERE local_table.a_column = 'some_value'
), filtered_remote_table AS (
SELECT
remote_table.l_id,
remote_table.created_at
FROM remote_table
WHERE remote_table.l_id = ANY ((SELECT ARRAY_AGG(DISTINCT filtered_local_table.id) FROM filtered_local_table)::integer[])
)
SELECT
filtered_local_table.name,
COUNT(DISTINCT filtered_remote_table.created_at)
FROM filtered_local_table
INNER JOIN filtered_remote_table ON filtered_local_table.id = filtered_remote_table.l_id
GROUP BY 1
ORDER BY 2 DESC;
When running this query on the querying host, the query sent to the remote host looks like this:
SELECT l_id, created_at FROM public.remote_table
WHERE ((l_id = ANY ($1::integer[])))
So what did we just run?
We filtered the
local_table
to a limited set of records.We filtered the
remote_table
to a limited set of records using theANY
operator. By using theARRAY_AGG
operator and casting it as anARRAY
of integers, we cause the query running to evaluate the values before sending the query.Then, we join the two CTEs together in a way that we would have done in the beginning.
The key to evaluating the request is:
(SELECT ARRAY_AGG(DISTINCT filtered_local_table.id)
FROM filtered_local_table)::integer[]
For a full review of the ANY
operator, checkout Postgres Query Boost: Using ANY Instead of IN.
Increasing FETCH Count
Postgres FDW uses a FETCH
command to fetch the next rows from the cursor. By default, the fetch is 100 rows, which is probably wrong for nearly every use case. What is the appropriate size for FETCH
? That depends on the use case.
Below is what lots look like when running a high number of fetches:
The FETCH
size can be increased on the fly for a remote server. To do so, run the following:
-- if the fetch_size has never been set:
ALTER SERVER remote_server OPTIONS (fetch_size '10000');
-- if the fetch_size has been set, use:
ALTER SERVER remote_server OPTIONS (SET fetch_size '10000');
After increasing the number of lines returned by FETCH
, you may continue to see multiple commands in the log, but there will be fewer of them. If you continue to see performance issues due to FETCH
, it may be time to filter the request more, or cache the values locally.
Caching locally
At larger data sizes, most FDW scenarios gravitate toward some form of caching locally. But, like most things Postgres, we have more than one option for caching. Below are some simple examples, but if I was mostly running a few transformations, depending on the use case, I would probably store the transformed data in cache instead of the raw data.
Materialized Views
Materialized views are the easiest way to cache locally. Run something like the following:
CREATE MATERIALIZED VIEW cached_remote_table AS (
SELECT * FROM remote_table
WHERE created_at > now() - '7 day'::interval
)
With materialized views, the results of the query are stored locally, and we can create indexes on the materialized views. However, the data will be as it was at the last query time, and we’ll need to run the following as often as we need updated data:
REFRESH MATERIALIZED VIEW cached_remote_table;
Cache Table
In Postgres 15, we gained the ability to run upserts using the MERGE
capabilities. When using MERGE
, we would create a local table, then regularly replicate the values from the remote table into the local cache table. The advantage of a cached table is that we can incrementally update large datasets.
Because of this, we can create a native SQL ETL tool like the following:
MERGE INTO cached_remote_table
USING (
WITH last_sync AS ( -- 1 - described below
SELECT
max(GREATEST(updated_at, created_at)) AS last_at
FROM cached_remote_table
)
SELECT -- 2 - described below
received_at, `
customer_id,
action
FROM remote_table
WHERE remote_table.received_at >= COALESCE((SELECT last_at FROM last_sync), '0001-01-01'::timestamp)
OR remote_table.updated_at >= COALESCE((SELECT last_at FROM last_sync), '0001-01-01'::timestamp)
) AS remote_table_to_sync
ON cached_remote_table.id = remote_table_to_sync.id
WHEN NOT MATCHED THEN -- 3 - described below
INSERT VALUES (remote_table_to_sync.a_column, remote_table_to_sync.created_at, remote_table_to_sync.updated_at)
WHEN MATCHED THEN -- 4 - described below
UPDATE SET a_column = remote_table_to_sync.a_column, updated_at = remote_table_to_sync.updated_at
;
So here we 1) find the latest sync, 2) find all values from the remote table that were created or updated since last since, 3) if the row does not exist, insert, 4) if the row exists, update the record.
For FDW performance, think like an application developer
To think like an application developer, means that you think about the query being sent to the database. You can’t just let Postgres make the decisions for you, you need to take a hands on approach. The more information you can give to your remote server at the beginning, the more efficient your queries are going to be. Working with postgres_fdw think about:
- Use CTEs to send queries that have more information to limit full scans on the remote host
- Use sub-queries for that same reason
- If your foreign server has a lot of data, try to cache data if you can
- Materialized views is great for creating a local copy of data
- A cache table with merge can be a way to create your own internal mini-ETL
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read