A Look at Postgres 15: MERGE Command with Examples
Jean-Paul Argudo
4 min readMore by this author
With PostgreSQL 15 comes a new SQL Command called MERGE. MERGE
has been in the SQL standard for quite a while, but it just made its way into the PostgreSQL codebase. Prior to MERGE
, if you wanted to refresh a target table from a source table, prior to Postgres 15, you could use the "upsert" method with the ON CONFLICT
clause.
Now, MERGE
can be used instead! Some situations where MERGE
makes a lot of sense are:
- data loading from external sources, thru foreign data wrappers
- staged and batched process jobs
About MERGE
Let's look at the synopsis in the documentation:
[[ WITH with_query [, ...] ]
MERGE INTO target_table_name [ [ AS ] target_alias ]
USING data_source ON join_condition
when_clause [...]
where data_source is:
{ source_table_name | ( source_query ) } [ [ AS ] source_alias ]
and when_clause is:
{ WHEN MATCHED [ AND condition ] THEN { merge_update | merge_delete | DO NOTHING } |
WHEN NOT MATCHED [ AND condition ] THEN { merge_insert | DO NOTHING } }
and merge_insert is:
INSERT [( column_name [, ...] )]
[ OVERRIDING { SYSTEM | USER } VALUE ]
{ VALUES ( { expression | DEFAULT } [, ...] ) | DEFAULT VALUES }
and merge_update is:
UPDATE SET { column_name = { expression | DEFAULT } |
( column_name [, ...] ) = ( { expression | DEFAULT } [, ...] ) } [, ...]
and merge_delete is:
DELETE
We can read that the source of the data_source
can be either a table or some data brought to the command thanks to the prior WITH
with_query
, or inside the MERGE
after USING
source_query
. When the data matches, i.e. there's a matching record in the target_table_name
, we can specify what to do:
- do an
UPDATE
as defined in themerge_update
(seeUPDATE SET...
) part or - do a
DELETE
as defined in themerge_delete
or - do nothing
When the data doesn't matches, i.e. there's no matching record in the target_table_name
, then we do an INSERT
(see merge_update
).
MERGE
example - remote sensors & batches
In working with clients, I have seen a need for data loading with remote sensors or stations. Anything “sending regularly data” and loading that data into a dataset is a common need. I’m particularly excited about using MERGE
to solve these issues for Postgres users.
In order to explain this data loading use case of MERGE
, I’m going to set up an example. We’ll have a database of stations that’s a remote measurement tool with data coming in intermittently.
- Streaming data, data arriving continuously
- Stations emit periodic and intermittent data measures
- There's a batch collecting this data
station_data_new
is a temp table for common storage- Data is stored long term
station_data_actual
with the last possible values - We want to keep track of when the station has been
created
Let's create some tables for testing purpose. Beware that the temporary table station_data_new
will only exist in the context of a given session.
create temporary table station_data_new (
station_id integer
, a integer
, b integer
);
create table station_data_actual (
station_id integer primary key
, a integer
, b integer
, created timestamp default current_timestamp
, updated timestamp default current_timestamp
);
Let's create some sample data into station_data_new
: our 1st 5 stations are up and sent the 1st batch of data:
with measures as (
select *
from generate_series(1,5)
)
insert into station_data_new (
station_id
, a
, b
)
select
generate_series
, round(random()*100+1)
, round(random()*100+1)
from
measures;
Basic MERGE
At this point, we could just do a plain INSERT
with SELECT
from station_data_new
to station_data_actual
. Instead we’ll use MERGE
because we’re planning for cases when the data already exists, an UPDATE
will be issued and not an INSERT
.
merge into station_data_actual sda
using station_data_new sdn
on sda.station_id = sdn.station_id
when matched then
update set a = sdn.a, b = sdn.b, updated = default
when not matched then
insert (station_id, a, b)
values (sdn.station_id, sdn.a, sdn.b);
If you execute it once, you'll have in return MERGE 5
, and the data is inserted into station_data_actual
, where the timestamps in created
and updated
have the same value.
To do even more testing you could:
truncate table station_data_new;
- Re-create data in it, by changing
generate_series(1,5)
withgenerate_series(1,10)
- Issue again the very same
MERGE
as we ran before
You'll see the data in station_data_actual
updated. Stations 1 to 5 will have updated
status and stations 6 to 10 will be created
and updated
.
Conclusion
MERGE
was a long awaited feature for PostgreSQL fans! Now we have it. I encourage you to review your processes when it's about merging existing data in your database. In this example, new data entering with existing data can be a good place to use this.
MERGE
opens new usages of aggregating and/or merging data from many databases. In distributed models, where all data exists across different locations, using foreign data wrappers with MERGE
could be a really elegant solution. MERGE
may simplify your processing of data all along the database's life. With strong and fast SQL statements as opposed to functions or other complex operations.