Loading the World! OpenStreetMap Import In Under 4 Hours
The OpenStreetMap (OSM) database builds almost 750GB of location data from a single file download. OSM notoriously takes a full day to run. A fresh open street map load involves both a massive write process and large index builds. It is a great performance stress-test bulk load for any Postgres system. I use it to stress the latest PostgreSQL versions and state-of-the-art hardware. The stress test validates new tuning tricks and identifies performance regressions.
Two years ago, I presented (video / slides) at PostGIS Day on challenges of this workload. In honor of this week’s PostGIS Day 2024, I’ve run the same benchmark on Postgres 17 and the very latest hardware. The findings:
- PostgreSQL keeps getting better! Core improvements sped up index building in particular.
- The osm2pgsql loader got better too! New takes on indexing speed things up.
- Hardware keeps getting better! It has been two years since my last report and the state-of-the-art has advanced.
Tune Your Instrument
First, we are using bare metal hardware—a server with 128GB RAM—so so let’s tune Postgres for loading and to match that server:
max_wal_size = 256GB
shared_buffers = 48GB
effective_cache_size = 64GB
maintenance_work_mem = 20GB
work_mem = 1GB
Second, let’s prioritize bulk load. The following settings do not make sense for a live system under read/write load, but they will improve performance for this bulk load scenario:
checkpoint_timeout = 60min
synchronous_commit = off
# if you don't have replication:
wal_level = minimal
max_wal_senders = 0
# if you believe my testing these make things
# faster too
fsync = off
autovacuum = off
full_page_writes = off
It’s also possible to tweak the background writer for the particular case of massive data ingestion, but for bulk loads without concurrency it doesn’t make a large difference.
How PostgreSQL has Improved
In 2022, testing that year's new AMD AM5 hardware loaded the data in just under 8 hours with Postgres 14. Today the amount of data in the OSM Planet files has grown another 14%. Testing with Postgres 17 still halves the load time, with the biggest drops coming from software improvements in the PG14-16 time-frame.
The benchmark orchestration and metrics framework here is my pgbench-tools. Full hardware details are published to GeekBench.
GIST Index Building in PostgreSQL 15
The biggest PostgreSQL speed gains are from improvements in the GIST index building code.
The new code pre-sorts index pages before merging them, and for large GIST index builds the performance speed-up can be substantial, as reported by the author of osm2pgsql.
My tests showed going from PostgreSQL 14 to 15 delivered:
- 16% speedup
- 15% size reduction
- 86% GIST index build speedup!
There have been further improvements in PostgreSQL 16 and 17 in B-Tree index building, but this osm2pgsql benchmark does not really show them. The GIST index time build times wash out the other index builds.
How osm2pgsql has improved
In Q3 2022, osm2pgsql 1.7 made a technique called the Middle Way Node Index ID Shift the new default.
Middle Way Node Index ID Shift is a clever design approach that compresses the database's largest index, trading off lookup and update performance for a smaller footprint. It uses a Partial Index to merge nearby values together into less fine grained sections. When an index is used frequently, this would waste too many CPU cycles. Similar to hash bucket collision, partial indexes have to constantly exclude non-matched items. That chews through extra CPU on every read. In addition, because individual blocks hold so many more values, the locking footprint for updates increases proportionately. However, for large but infrequently used indexes like this one, those are satisfactory trade-offs.
Applying that improvement dropped my loading times by 37% and plummeted the database size from 1000GB to under 650GB. Total time at the terabyte size had crept upward to near 10 hours. The speed-up drove it back below 6 hours.
The osm2pgsql manual shows the details in its Update for Expert Users. I highly recommend that section and its Improving the middle blog entry. It's a great study of how PG's skinnable indexing system lets applications optimize for their exact workload.
How hardware has improved
SSD Write Speed
During data import, the osm2pgsql workload writes heavily at medium queue depths for hours. The best results come from SSDs with oversized SLC caches that also balance cleanup compaction of that cache. The later CREATE TABLE AS (CTAS) sections of the build reach its peak read/write speeds.
I saw 11GB/s from a Crucial T705 PCIe 5.0 drive the week (foreshadowing!) I was running that with an Intel i9-14900K:
osm2pgsql has a tuning parameter named --number-processes
that guides how many parallel operations the code tries to spawn.
For the server and memory I used in this benchmark, increasing--number-processes
from my earlier 2 to 5 worked well. However, be careful: you can easily go too far! Bumping up this parameter increases memory usage too. Going wild on the concurrent work will run you out of memory and put you into the hands of the Linux Out of Memory (OOM) killer.
Processor advances
Obviously, every year processors get a little better, but they do so in different ways and at different rates.
For later 2023 and testing against PostgreSQL 15 and 16, an Intel i7-13600K overtook the earlier AMD R5 7700X. There was another small bump in 2024 upgrading to an i9-14900K.
But this is a demanding regression test workload, and it only took a few weeks of running the OSM workload to trigger the i9-14900K’s voltage bugs to the point where my damaged CPU could not even finish the test.
Thankfully I was able to step away from those issues when AMD's 9600X launched. Here's the latest results from PG17 on an AMD 9600X, with the same SK41 2TB drive as I tested in 2022 for my PostGIS Day talk.
My best OSM import results to date
2024-10-15 10:03:41 [00] Reading input files done in 7851s (2h 10m 51s).
2024-10-15 10:03:41 [00] Processed 9335778934 nodes in 490s (8m 10s) - 19053k/s
2024-10-15 10:03:41 [00] Processed 1044011263 ways in 4301s (1h 11m 41s) - 243k/s
2024-10-15 10:03:41 [00] Processed 12435485 relations in 3060s (51m 0s) - 4k/s
2024-10-15 10:03:41 [00] Overall memory usage: peak=158292MByte current=157746MByte...
2024-10-15 11:32:13 [00] osm2pgsql took 13162s (3h 39m 22s) overall. f
Completed in less than 4 hours!
PostgreSQL 17 is about 3% better on this benchmark than PostgreSQL 16 when replication is used, thanks to improvements in the WAL infrastructure in PostgreSQL 17.
I look forward to following up on this benchmark in more detail, after my scorched Intel system is fully running again! Like the speed of the Postgres ecosystem, the pile of hardware I've benchmarked to death grows every year.
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read