Logical replication from Postgres to Iceberg
Marco Slot
Operational and analytical workloads have historically been handled by separate database systems, though they are starting to converge. We built Crunchy Data Warehouse to put PostgreSQL at the frontier of analytics systems, using modern technologies like Iceberg and a hybrid query engine.
Combining operational and analytical capabilities is extremely useful, but that does not mean all your workloads should run in a single system. In most organizations, application developers and analysts work in different teams with different requirements around data modeling, resource management, operational practices, and various other aspects.
What will always be needed is a way to bring data and the stream of changes from an operational database into a separate analytics system. As it turns out, if both sides are PostgreSQL, magical things can happen…
Today, we are announcing the availability of native logical replication from Postgres tables in any Postgres server to Iceberg tables managed by Crunchy Data Warehouse.
The latest release of Crunchy Data Warehouse includes full support for:
- Insert, update, delete, and truncate replication into Iceberg
- High transaction rates
- Low apply lag (under 60 seconds)
- Preservation of transaction boundaries, so foreign key constraints still hold
- Automatic table creation and data copy
- Automatic compaction
- Advanced replication protocol features such as row filters, streaming (v4 protocol), and failover slots (see the sketch after this list)
- Automatic handling of TOAST columns
- Ability to rebuild tables while old data remains readable
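For example, row filters and the streaming and failover options use standard Postgres publication and subscription syntax. Here is a minimal sketch: the filter condition is made up, and combining these options with create_tables_using in a single WITH clause is shown only as an illustration.

-- on the source: a publication with a row filter (Postgres 15+ syntax)
create publication filtered_pub for table chats where (created_at >= '2025-01-01');

-- on Crunchy Data Warehouse: stream large in-progress transactions (protocol v4)
-- and request a failover-enabled slot (requires Postgres 17 on the source)
create subscription filtered_sub
    connection '...'
    publication filtered_pub
    with (streaming = parallel, failover = true, create_tables_using = 'iceberg');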
While it sounds like something from the future, logical replication to Iceberg is available right now on Crunchy Bridge, and will be available for self-managed users in the next release of Crunchy Postgres for Kubernetes.
Setting up logical replication into Iceberg
Getting started with logical replication to Iceberg is simple: you can set up everything with just two commands.
On the source:
create publication pub for table chats, users;
On Crunchy Data Warehouse, after ensuring connectivity to the source:
create subscription sub connection '...' publication pub with (create_tables_using = 'iceberg');
The create subscription command will create Iceberg tables for all tables in the publication, then copy the initial data in the background, and then replicate changes. You can also set up the Iceberg tables manually before creating the subscription.
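For instance, setting up a table by hand might look roughly like the sketch below. The column list is hypothetical and simply mirrors the source table, and the using iceberg clause follows the Crunchy Data Warehouse table syntax.

-- define the target Iceberg table yourself, mirroring the source schema
create table chats (
    id bigint,
    user_id bigint,
    message text,
    created_at timestamptz
) using iceberg;

-- then subscribe without asking the warehouse to create tables
create subscription sub connection '...' publication pub;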
You can run high-performance analytical queries and data transformations directly on the Iceberg tables in Crunchy Data Warehouse once the initial data copy completes, or use other query engines via the SQL/JDBC Iceberg catalog driver.
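On the warehouse side, the standard pg_subscription_rel catalog shows when each table has finished its initial copy, after which the Iceberg tables can be queried like any other Postgres table. The analytical query below uses made-up column names.

-- 'r' (ready) means the initial copy finished and changes are being applied
select srrelid::regclass as table_name, srsubstate
from pg_subscription_rel;

-- an analytical query directly on the replicated Iceberg table
select date_trunc('day', created_at) as day, count(*) as chat_count
from chats
group by 1
order by 1;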
How Postgres-to-Iceberg replication works
Conventional tools for applying a stream of changes to a data warehouse take large batches and apply them using merge commands. While effective, the computational cost of running these commands is relatively high, and increases significantly as the table grows.
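To picture that conventional approach: changes land in a staging table and are periodically folded into the target with a single merge statement, roughly like the sketch below (tables and columns are illustrative), and its cost grows as the target table grows.

-- batch apply from a staging table of captured changes (op is 'insert', 'update', or 'delete')
merge into chats as t
using chats_changes as c on t.id = c.id
when matched and c.op = 'delete' then
    delete
when matched then
    update set message = c.message, created_at = c.created_at
when not matched and c.op <> 'delete' then
    insert (id, user_id, message, created_at)
    values (c.id, c.user_id, c.message, c.created_at);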
We invented several new techniques to apply insertions and deletions to Iceberg in micro batches by taking advantage of Postgres’ transactional capabilities. Queries use an efficient merge-on-read method to apply deletions. Insertion and deletion files are later merged during automatic compaction, and compaction only accesses files that were (significantly) modified.
What that means is that replication can be sustained with relatively low lag and low overhead. The main cost is that the replication requires some disk space, though usually much less than the source data.
Get started with replication to your Postgres Data Warehouse
Our goal is to bring all PostgreSQL features and extensions to Iceberg with high performance analytics. Logical replication is a useful Postgres feature that becomes essential in the context of a data warehouse, given the need to synchronize data from operational databases.
Of course, PostgreSQL isn't perfect. Where possible we try to go the extra mile to build a seamless experience, for instance by enabling automatic Iceberg table creation in CREATE SUBSCRIPTION. There are many other ways in which we think the logical replication experience can be improved, especially for Iceberg, so this is the start of a journey.
If you want to get started with this seamless Postgres -> Iceberg replication experience, we encourage you to reach out to us or check out the documentation.