Better JSON in Postgres with PostgreSQL 14
Postgres has had "JSON" support for nearly 10 years now. I put JSON
in quotes because well, 10 years ago when we announced JSON support we kinda cheated. We validated JSON was valid and then put it into a standard text field. Two years later in 2014 with Postgres 9.4 we got more proper JSON support with the JSONB
datatype. My colleague @will likes to state that the B stands for better. In Postgres 14, the JSONB support is indeed getting way better.
I'll get into this small but pretty incredible change in Postgres 14 in just a minute, first though it is worth some summary disclaimer on the difference between JSON and JSONB. JSON still exists within Postgres and if you do: CREATE TABLE foo (id serial, mycolumn JSON);
You'll get a JSON datatype. This datatype will ensure you insert valid JSON into it, but will store it as text. This is quite useful if you don't want to index most of the JSON and want to just quickly insert a ton of it (a great example use case for this is recording API/log input where you may want to play requests).
JSONB unlike JSON compresses the data down and does not preserve whitespace. JSONB also comes with some better indexing ability in GIN indexes. While you can index JSON you have to index each path. From here on I'll be using JSON interchangeably, but please in your app mostly use JSONB unless explicitly meaning the more simplistic JSON text format.
Whats new with JSON in 14
I work around a lot of Ruby developers, but personally align more with Python. The clear to read/easy to intuit form of language I appreciate, perhaps because I don't often get to write code so it's easy for me to jump back in. It's largely for that reason I've had a bit of a nagging feeling any time I used JSON in Postgres.
Let's say for example I'm an e-commerce website and I want to have my product catalog in Postgres. In the case of Crunchy My schema might look something like:
CREATE TABLE products (
id serial,
name text,
price numeric(10, 2),
created_at timestamptz,
updated_at timestamptz,
deleted_at timestamptz,
details jsonb
);
I often see JSON datatypes mixed in with other standard datatypes (we do quite a bit of this for internal data structures ourselves for Crunchy Bridge). Within a product catalog is a great example. You may have two very different products with very different, instead of building a table for shirts
and a table for couches
you can have differing details for each. For a shirt you have a size, color, type (meaning short sleeve/long sleeve). For a couch you likely have a width, a height, a type (love seat vs. couch).
In Postgres 13 and early if you wanted to find all medium shirts that are neon yellow you write some query:
SELECT *
FROM shirts
WHERE details->'attributes'->>'color' = 'neon yellow'
AND details->'attributes'->>'size' = 'medium';
That use of ->
and ->>
while I'm familiar with it and recall, also felt painful. You essentially had two different operators, one to traverse the JSON document ->
and then you'd add an extra >
to extract the value as text.
In Postgres 14 though:
SELECT *
FROM shirts
WHERE details['attributes']['color'] = '"neon yellow"'
AND details['attributes']['size'] = '"medium"'
Note the use of the quotation marks: Postgres expects for you to use a JSON-style string when using it as a comparison using the subscripting method.
Want to update a specific record? Just as easy:
UPDATE shirts
SET details['attributes']['color'] = '"neon blue"'
WHERE id = 123;
Sure you can still use ->
and ->>
, but the new subscripting syntax supported in 14 is likely to be your new go to.
Indexing your JSON
In both JSON and JSONB you can index you JSON for faster read times. In the non JSONB format you need to index specific keys to be able to query against them. If you wanted to index on color:
CREATE INDEX idx_products_details ON products ((details->'attributes'->'color'));
And if you wanted to index on size:
CREATE INDEX idx_products_details ON products ((details->'attributes'->'size'));
That can become extremely painful if you have a full product catalog with different attributes. Here is where JSONB
especially shines:
CREATE index idx_json_details ON json_test using gin (details);
Now GIN
indexing will help quite a bit from having to index every various key within your JSON document, but we still have a few things we need to know on how to best search for the time being. At the moment the GIN indexing requires you to still use different operators when querying. So in order to say find all neon yellow shirts you'd have to structure you're query with the @>
operator for a contains:
SELECT *
FROM products
WHERE details @> '{"color": "neon yellow"}';
And with an EXPLAIN ANALYZE
we can see that it uses an index:
EXPLAIN ANALYZE SELECT * FROM products WHERE details @> '{"color": "neon yellow"}';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on products (cost=36.25..159.79 rows=32 width=279) (actual time=0.090..0.090 rows=1 loops=1)
Recheck Cond: (details @> '{"color": "neon yellow"}'::jsonb)
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_json_details (cost=0.00..36.24 rows=32 width=0) (actual time=0.085..0.085 rows=1 loops=1)
Index Cond: (details @> '{"color": "neon yellow"}'::jsonb)
Planning Time: 0.140 ms
Execution Time: 0.112 ms
(7 rows)
Postgres in JSON, just keeps getting better
Postgres 14 makes JSON even more user friendly than before. While I wouldn't recommend simply using the subscript format everywhere in your application due to it not always leveraging indexes, for casual querying it proves to be a big win. In time there is a good chance we have improvements that help subscripting leverage existing index types. The reality is the future is bright for better JSON in PostgreSQL.
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read