Querying Spatial Data with PostGIS and ogr_fdw
In my last post, I did a simple intro to foreign data wrappers in PostgreSQL. postgres_fdw is an extension available in Postgres core that allows you to issue queries against another Postgres database. It's just one of many foreign data wrappers that you can use in Postgres, so for today's post we'll look at another that works especially well with spatial data formats: ogr_fdw.
I had also previously talked about some different ways to get spatial data into a Postgres/PostGIS database, but the cool thing about foreign data wrappers is that they're an alternative to needing to have everything in the same data store. With spatial data being stored and shared in so many different formats, imagine being able to abstract that conversion away and just focus on analysis. Read on for a couple of quick demos.
Get started with ogr_fdw
Ogr_fdw is a Postgres extension. In order to install it, you have to download your Linux distro package, or use StackBuilder to add it to your Postgres install on Windows. This is what I ran in Ubuntu 20.04:
sudo apt update
sudo apt install postgresql-13-ogr-fdw
You'll also want to make sure you've enabled the PostGIS extension to take advantage of spatial functions and filtering. Otherwise, spatial data will be in bytea format (representing Well Known Binary [WKB] geometries).
Like most extensions, you can enable ogr_fdw
with the CREATE EXTENSION
command:
test=# CREATE EXTENSION ogr_fdw;
Then, you can add the foreign server, then create the foreign schema. Let's follow along with the instructions from the ogr_fdw GitHub repo, then look at a more involved example afterwards.
Example: Shapefile saved locally
test=# CREATE SERVER myserver FOREIGN DATA WRAPPER ogr_fdw OPTIONS (datasource '/tmp/data', format 'ESRI Shapefile');
test=# IMPORT FOREIGN SCHEMA ogr_all
LIMIT TO (pt_two)
FROM SERVER myserver INTO public;
First, we set our data source to the directory that contains the shapefile to query. ogr_fdw works with the full list of GDAL/OGR formats.
IMPORT FOREIGN SCHEMA is available starting from Postgres 9.5 and allows you to automatically create foreign tables for the foreign server. ogr_all
is a "special" schema - ogr_fdw will by default create foreign tables for all layers detected in the data source. Here, we're saying that we only want the layer named pt_two to be created as a foreign table. You can use the LIMIT TO
/ EXCLUDE
Postgres clauses if you want only specific layers. Or, you can use quotes around the remote schema string like so:
IMPORT FOREIGN SCHEMA "pt"
FROM SERVER myserver INTO public;
This will create tables for matching layers only (ie names prefixed with pt).
Alternatively, you could use CREATE FOREIGN TABLE
to manually create individual tables:
test=# CREATE FOREIGN TABLE pt_two (
fid int,
geom geometry(Point, 4326),
name varchar,
age int,
height real,
birthdate date) SERVER myserver OPTIONS (layer 'pt_two');
The OPTIONS
clause accepts a layer parameter for the name of the layer to query.
Wait, what if we don't know the table definition? Not a problem! ogr_fdw comes with the ogr_fdw_info
utility - you can use it like so:
ogr_fdw_info -s /tmp/data/ -l pt_two
And it'll generate the CREATE SERVER
and CREATE FOREIGN TABLE
statements for that layer. You might prefer to go this route instead.
And now, you can use SQL to query the shapefile!
test=# SELECT * FROM pt_two;
fid | geom | name | age | height | birthdate
-----+----------------------------------------------------+-------+-----+--------+------------
0 | 0101000020E6100000C00497D1162CB93F8CBAEF08A080E63F | Peter | 45 | 5.6 | 1965-04-12
1 | 0101000020E610000054E943ACD697E2BFC0895EE54A46CF3F | Paul | 33 | 5.84 | 1971-03-25
Neat. Now let's try a different example. What if we want to connect to a non-local file?
Example: Zipped shapefile hosted on a private S3 bucket
In this example, I'll query a zip file that contains a shapefile of neighborhood association boundaries within the City of Tampa (Neighborhoods.zip
), hosted in a private S3 bucket named kbatu-example.
If you take a look at the GDAL docs for AWS S3, you'll see that the second authentication method requires a couple of config options to be set. How do you set config options with ogr_fdw? That's done by passing in a config_options
parameter when creating the foreign server. Here's how I've set up the foreign server and table:
tampa=# CREATE SERVER s3_tampa
FOREIGN DATA WRAPPER ogr_fdw
OPTIONS (
datasource '/vsizip/vsis3/kbatu-example/Neighborhoods.zip',
format 'ESRI Shapefile',
-- Separate multiple config options with space
config_options 'AWS_ACCESS_KEY_ID=access_key_id AWS_SECRET_ACCESS_KEY=my_access_key');
tampa=# IMPORT FOREIGN SCHEMA ogr_all FROM SERVER s3_tampa INTO public;
In psql, I can use these backslash commands to do a quick check on my foreign server and tables:
tampa=# \des
List of foreign servers
Name | Owner | Foreign-data wrapper
------------------+----------+----------------------
s3_tampa | postgres | ogr_fdw
(1 row)
tampa=# \det
List of foreign tables
Schema | Table | Server
--------+---------------+----------
public | neighborhoods | s3_tampa
(1 row)
Query the shapefile: spatial joins and SQL filters
Some months ago, I had started to play around with open data from the City of Tampa. I thought it would be a great way to become more comfortable with PostGIS while getting to know my city a little better. I have Public Art data stored in my database, so why not try a spatial query on it with my new Neighborhoods table?
For instance, in the time I've lived in Tampa, I've seen that many of the public art installations are found in the central downtown area, which itself consists of a few different neighborhoods, depending on whom you ask. Which art installations are located outside of that central downtown area? (I'm using the neighborhood associations dataset, and the AssocLabel field in particular as a proxy.)
Here's my version of a query that answers that question:
tampa=# SELECT
a.title AS artwork_name,
n.assoclabel AS neighborhood_name,
n.zipcode AS neighborhood_zip
FROM tampa_public_art a
JOIN neighborhoods n
ON ST_Contains(n.geom, ST_Transform(a.geom, 4326))
WHERE n.assoclabel NOT IN ('Tampa Downtown Partnership', 'Channel District', 'Downtown River Arts Neighborhood Association', 'Port of Tampa');
I've used a spatial join to form a table of geometries where the public artwork points are found within the neighborhood polygons. Note that I apply ST_Transform to reproject the artwork geometries since the dataset uses the Pseudo-Mercator/EPSG:3857 projection, whereas the neighborhoods data uses WGS84/EPSG:4326. Pseudo-Mercator is more appropriate for web maps or visualizing spatial data, which isn't the crux of what I'm trying to do so I prefer to use WGS84 for this analysis. I'm also using a NOT IN operator to exclude my own (arbitrary) list of neighborhoods that count for the downtown area.
The following image is taken from DBeaver's Spatial Viewer (adding in the neighborhood and artwork geometries to the SELECT
statement):
There are 39 (out of 89) works that are a little bit more dispersed throughout the city, in neighborhoods such as Macfarlane Park in West Tampa, and Sulphur Springs between the zoo and the University of South Florida campus. Now I know to keep an eye out for them if I'm in those areas.
I will also point out that this is a relatively simple query involving small tables, and since we're wrapping around a file and not a database, my query doesn't use any indexes to gain efficiencies. So, ogr_fdw may not be the first tool I reach for when it comes to complex queries on larger datasets. With that said, I'd be happy to have it as an option for more adhoc analyses. The OGR SQL documentation also mentions some query limitations and I'd recommend looking those over as well.
Tell me you're using Postgres, without telling me you're using Postgres
In the above example, we're able to query data in a shapefile without having to:
- Download the zip file
- Extract its contents
- Then load the spatial data into Postgres/PostGIS with
ogr2ogr
orshp2pgsql
Foreign data wrappers really can open up some new possibilities for data analysis with Postgres. I've only scratched the surface here with ogr_fdw, but don't forget to browse the project README for even more examples and tips (such as how to view the logic pushed down from OGR to the source data). And, if you enjoyed this post, you'll probably want to check out some of my colleague Paul Ramsey's ogr_fdw musings as well as appearances on the Crunchy Data blog.
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read