JSON Updates in Postgres 16
Postgres has been steadily building on the JSON functionality initially released more than 10 years ago. With Postgres 16, working with JSON has gotten a couple nice improvements. Primarily, this release added features that ease the manipulation of data into JSON and improve the standard SQL functionality using JSON.
TL;DR:
- A SQL/JSON data-type check. For instance, this lets you ask with SQL if something
value IS JSON OBJECT
- Addition of SQL-standard JSON functions:
JSON_ARRAY()
,JSON_ARRAYAGG()
,JSON_OBJECT()
, andJSON_OBJECTAGG()
Data for this post
For a quick tutorial, use the following dataset:
CREATE TABLE user_profiles ( id serial PRIMARY KEY, name text NOT NULL, details jsonb );
INSERT INTO
user_profiles (name, details)
VALUES
(
'John Doe', '{"age": 30, "address": {"city": "New York", "state": "NY"}, "hobbies": ["reading", "hiking"]}'::jsonb
)
,
(
'Jane Smith', '{"age": 25, "address": {"city": "Los Angeles", "state": "CA"}, "hobbies": ["painting", "cooking"]}'::jsonb
)
,
(
'Emily Davis', '{"age": 29, "address": {"city": "Chicago", "state": "IL"}, "hobbies": ["dancing", "writing"]}'::jsonb
)
;
IS JSON
Previously, to test if a field was JSON, your options were to run pg_typeof
:
SELECT
details,
pg_typeof(details),
pg_typeof(details ->> 'address')
FROM
user_profiles LIMIT 1;
Which would return jsonb
for that second column, and text
for the 3rd column. The problem with this is that it didn’t give you any inspection into the nested values within JSON, so running pg_typeof(details->>'address')
would just tell you it is text
. Now, we can do something like:
SELECT
details,
details IS JSON,
details ->> 'address' IS JSON OBJECT
FROM
user_profiles LIMIT 1;
Using this new functionality, the IS JSON
returns true and the IS JSON OBJECT
returns true as well. Previously, when building queries to inspect JSON, you were at risk of failed queries if the data’s JSON format did not match expected values. Imagine you are migrating your JSON structure, or you previously didn’t have a well defined schema, so let’s throw in some bad details data that mismatches our current structure:
INSERT INTO
user_profiles (name, details)
VALUES
(
'Bruce Wayne', '"Gotham City, NY"'
)
;
INSERT INTO
user_profiles (name, details)
VALUES
(
'Clark J. Kent', '{"age": 32, "address": [{"city": "New York", "state": "AL"}]}'
)
;
Both of the values provided above are valid JSON, yet the existing structure was of the format {"addresses": []}
. Now, when extracting the first address for a field, it would look like this:
SELECT
CASE
WHEN
details -> 'address' IS JSON ARRAY
THEN
(details -> 'address')[0]
WHEN
details -> 'address' IS JSON OBJECT
THEN
details -> 'address'
WHEN
details IS JSON SCALAR
THEN
details
END
AS primary_address
FROM
user_profiles;
Additionally, because JSON is loosely validated, you can inspect UNIQUE keys or WITHOUT UNIQUE keys:
SELECT
'{"batman": "robin", "batman": "robin"}' IS JSON WITH UNIQUE KEYS,
'{"batman": "robin", "batman": "robin"}' IS JSON WITHOUT UNIQUE KEYS;
JSON_ARRAY
& JSON_ARRAYAGG
Now, we can interact with JSON in a more-standards compliant way. The json
and jsonb
constructs are a Postgres implementation, but not a SQL implementation. Thus, the existing Postgres functions were prefixed with jsonb_
or json
. These new functions get away from those prefixes, and use keywords to specify output.
Let’s look at json_array
. If you have previously done this, you used json_build_array
/ jsonb_build_array
, or the aggregate functions json_agg
/ jsonb_agg
. Now, you can combine values using json_array
:
SELECT
json_array(name, details ->> 'age')
FROM
user_profiles;
Or, use it to build an array on a GROUP BY
using json_arrayagg
. Below, we write a SQL query that returns a JSON array of each person within each age group.
SELECT
((details ->> 'age')::integer / 10) * 10 AS age_group,
json_arrayagg(name)
FROM
user_profiles
GROUP BY
1;
We can also use json_arrayagg
in window functions. Below, we use a window function to find the other group members for each age group.
SELECT
name,
(
(details ->> 'age')::integer / 10
)
* 10 AS age_group,
json_arrayagg(name) OVER (PARTITION BY (details ->> 'age')::integer / 10 * 10) AS other_group_members
FROM
user_profiles;
Disclaimer: I would write this differently using a dedicated CTE and GROUP BY
, but this is an example of using json_arrayagg
as a window function using the small dataset.
JSON_OBJECTAGG
Just as we can aggregate values into an array, we can also aggregate values into an object. Previously, when using json(b)?_build_object
or json(b)?_object_agg
to construct objects. With Postgres 16, we have a standard SQL function called json_object
with a syntax that moved away from tuples, and uses either value
or :
to separate key-value pairs.
SELECT
json_object('name' value name, 'age': details ->> 'age')
FROM
user_profiles;
The aggregations form is as follows:
SELECT
((details ->> 'age')::integer / 10) * 10 AS age_group,
json_objectagg(name value details ->> 'age')
FROM
user_profiles
GROUP BY
1;
As with the json_arrayagg
, the json_objectagg
can be used in window functions:
SELECT
name,
(
(details ->> 'age')::integer / 10
)
* 10 AS age_group,
json_objectagg(name value details ->> 'age') OVER (PARTITION BY (details ->> 'age')::integer / 10 * 10) AS other_group_members
FROM
user_profiles;
Modifying behavior with keywords
For both json_array
and json_object
, they come with modifying keywords. json_array
gets ABSENT
and RETURNING
. json_object
gets ABSENT
, UNIQUE
, and RETURNING
.
UNIQUE
Unique constraint on json_object
will throw errors on duplicate keys, but by default, it will not error. Try the following:
SELECT
json_object('key_1' value 'value_1', 'key_1' value 'value_2' WITH UNIQUE);
SELECT
json_object('key_1' value 'value_1', 'key_1' value 'value_2');
ABSENT ON NULL
v. NULL ON NULL
Absent, or more precisely ABSENT ON NULL
constraint instructs json_object
and json_array
to omit null values:
SELECT
json_object('key_1' value 'value_1', 'key_2' value NULL, 'key_3' value 'value_3' ABSENT
ON NULL);
SELECT
json_object('key_1' value 'value_1', 'key_2' value NULL, 'key_3' value 'value_3' NULL
ON NULL);
Absent also works on json_array
:
SELECT
json_array('value_1', NULL, 'value_3' ABSENT
ON NULL);
SELECT
json_array('value_1', NULL, 'value_3' NULL
ON NULL);
The opposing functionality of ABSENT ON NULL
is the NULL ON NULL
as above. This means the following is a valid SQL statement in Postgres 16:
SELECT
json_array(NULL NULL
ON NULL);
RETURNING
Have you wondered why we didn’t have a jsonb
variant of this json_object
function? It’s because the output is dictated by a keyword instead of the function name. Using RETURNING
to specify the returned object type:
SELECT
json_object('key_1' value 'value_1', 'key_2' value NULL, 'key_3' value 'value_3' RETURNING jsonb);
SELECT
json_object('key_1' value 'value_1', 'key_2' value NULL, 'key_3' value 'value_3' RETURNING json);
SELECT
json_array('value_1', NULL, 'value_3' RETURNING jsonb);
SELECT
json_array('value_1', NULL, 'value_3' RETURNING json);
Using the pg_typeof
function, you can see the different data types that are returned by each statement above.
SELECT
pg_typeof(json_array('value_1', NULL, 'value_3' RETURNING jsonb));
Below is an example of using all 3 of the keywords together:
SELECT json_object(
'key_1' value 'value_1',
'key_2' value NULL,
'key_3' value 'value_3'
ABSENT ON NULL WITH UNIQUE RETURNING jsonb);
This functionality is added to the existing functions as suffixes. The function json_object_agg
received a strict
, unique
, and a unique_strict
suffix as follows:
json_object_agg_strict
/jsonb_object_agg_strict
json_object_agg_unique
/jsonb_object_agg_unique
json_object_agg_unique_strict
/jsonb_object_agg_unique_strict
For instance, using jsonb_object_agg_unique_strict
would look like the following:
SELECT
json_object_agg_unique_strict(key, value)
FROM (VALUES
('key_1', 'value_1'),
('key_2', NULL),
('key_3', 'value_3')
) AS t(key, value);
The unique
suffix specifies to throw an error on duplicate keys, and strict
suffix skips null values. Of course, combining unique_strict
suffixes specifies both. Given that I appreciate Postgres’ pursuit of the SQL standards, I would lean toward using new JSON_OBJECT
and JSON_ARRAY
functions + their keywords.
Postgres is a first-class JSON database
As we’ve said before, Postgres 16 is a first-class database for JSON (read Craig’s post on the evolution of JSON and JSONB). Given JSON was first introduced in 9.4 and now we are on version 16, the continued rollout of JSON improvement means the capabilities are thriving in the Postgres ecosystem. We know lots of folks are using JSON in Postgres and we’re happy to see several hands on and under the hood improvements.
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read