Geocoding with Web APIs in Postgres
Geocoding is the process of taking addresses or location information and getting the coordinates for that location. Anytime you route a new location or look up a zip code, the back end is geocoding the location and then using the geocode inside other PostGIS functions to give you the routes, locations, and other data you asked for.
PostGIS comes equipped with an easy way to use the US Census data with the Tiger geocoder. Using the Tiger geocoder requires downloading large amounts of census data and in space-limited databases, this may not be ideal. Using a geocoding web API service can be a space saving solution in these cases.
I am going to show you how to set up a really quick function using plpython3u
to hit a web service geocoder every time that we get a new row in the database.
Installing plpython3u
The plpython3u extension comes with Crunchy Bridge or you can add it to your database. To get started run the following:
CREATE EXTENSION plpython3u;
Creating a function to geocode addresses
In this example, I'll use the US census geocoding API as our web service, and build a function to geocode addresses based on that.
The function puts together parameters to hit the census geocoding API and then parses the resulting object, and returns a geometry:
CREATE OR REPLACE FUNCTION geocode(address text)
RETURNS geometry
AS $$
import requests
try:
payload = {'address' : address , 'benchmark' : 2020, 'format' : 'json'}
base_geocode = 'https://geocoding.geo.census.gov/geocoder/locations/onelineaddress'
r = requests.get(base_geocode, params = payload)
coords = r.json()['result']['addressMatches'][0]['coordinates']
lon = coords['x']
lat = coords['y']
geom = f'SRID=4326;POINT({lon} {lat})'
except Exception as e:
plpy.notice(f'address failed: {address}')
plpy.notice(f'error: {e.message}')
geom = None
return geom
$$
LANGUAGE 'plpython3u';
Using this function to geocode Crunchy Data's headquarters:
SELECT ST_AsText(geocode('162 Seven Farms Drive Charleston, SC 29492'));
Deploying this function for new data
But what if we want to automatically run this every time an address is inserted into a table? Let's say we have a table with a field ID, an address, and a point that we want to auto-populate on inserts.
CREATE TABLE addresses (
fid SERIAL PRIMARY KEY,
address VARCHAR,
geom GEOMETRY(POINT, 4326)
);
We can make use of a Postgres trigger to add the geocode before every insert! Triggers are a very powerful way to leverage built in functions to automatically transform your data as it enters the database, and this particular case is a great demo for them!
CREATE OR REPLACE FUNCTION add_geocode()
RETURNS trigger AS
$$
DECLARE
loc geometry;
BEGIN
loc := geocode(NEW.address);
NEW.geom = loc;
RETURN NEW;
END;
$$
LANGUAGE plpgsql;
CREATE TRIGGER update_geocode BEFORE INSERT ON addresses
FOR EACH ROW EXECUTE FUNCTION add_geocode();
Now when running an insert, the value is automatically geocoded!
INSERT INTO addresses(address) VALUES ('415 Mission St, San Francisco, CA 94105');
postgres=# SELECT fid, address, ST_AsText(geom) FROM addresses;
fid | address | geom
-----+-----------------------------------------+----------------------------------------------------
1 | 415 Mission St, San Francisco, CA 94105 | 0101000020E610000097CD0E2B66995EC0BB004B2729E54240
Summary
If you’re space limited, using a web API based geocoder might be the way to go. Using a geocoder function with triggers on new row inserts will get you geocoded addresses in a snap.
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read