Postgres Query Boost: Using ANY Instead of IN
The IN
operator has to be in the top-3 SQL-specific operators that people learn, the close competitors are LIKE
and BETWEEN
. Thus, IN
feels familiar when crafting SQL: you can use it with nested SQL statements, or create your own list. But, it does have limitations — what if you wanted to send a list of unknown length? You can’t really use SQL placeholders without also modifying the SQL statement.
What if I told you there was a less ubiquitous operator that offered all of the power and less of the drawbacks. In this article we'll talk about using = ANY(array)
in instead of IN(list)
to improve code and query quality.
We don't give Java enough love in our tutorials, so these examples are written for JDK with the Vert.x reactive Postgres library. Don’t worry, you’ll be able to read it. We chose a direct Postgres client because ORMs typically default to IN(list)
, and I’m making an argument they should convert to = ANY(array)
where applicable.
Unforeseenn Consequences
It starts simply enough: I have a collection of items that I want to filter by id
. Perhaps, this is a user-selectable form that permits users to multi-select values. The most common query looks like this:
SELECT i.*
FROM item AS i
JOIN shipment AS s ON s.item_id=i.id
WHERE s.status <> 'shipped' -- Grab only Unshipped items
AND i.id IN (1,2,3,4,5,6);
Seems perfectly reasonable. I'll just copy that query into our code and replace the literal list with a placeholder right?
Introduce Complexity
Back in the code window, I drop it into some code approximately like this.
// Prepare Tuple of IDs
Tuple t_list = Tuple.of(1, 2, 3, 4, 5, 6);
// Prepare a parameterized query
pool.preparedQuery("SELECT i.*\\n" +
"FROM item AS i\\n" +
"JOIN shipment AS s ON s.item_id=i.id\\n" +
"WHERE s.status <> 'shipped'\\n" +
" AND i.id IN ($1)")
// Attempt to run it
.execute(t_list)
// Log the IDs returned
.onSuccess(rows -> rows.forEach(
row -> logger.info("id: " + row.getInteger("id"))))
// Failure: Log the complaint
.onFailure(t -> logger.error("QUERY FAIL", t));
But this doesn’t work. It will throw an exception about the wrong number of parameters. The $1
placeholder expects a single value, but it receives 6 values. Aha, that Tuple
must be packed wrong for the IN
parameter. Maybe sending an array is better.
// Prepare Tuple of array of IDs
Integer[] items = {1, 2, 3, 4, 5, 6};
Tuple t_list = Tuple.of(items);
// Prepare a parameterized query
pool.preparedQuery("SELECT i.*\\n" +
"FROM item AS i\\n" +
"JOIN shipment AS s ON s.item_id=i.id\\n" +
"WHERE s.status <> 'shipped'\\n" +
" AND i.id IN ($1)")
// Attempt to run it
.execute(t_list)
// Log the IDs returned
.onSuccess(rows -> rows.forEach(
row -> logger.info("id: " + row.getInteger("id"))))
// Failure: Log the complaint
.onFailure(t -> logger.error("QUERY FAIL", t));
This doesn’t work either. It throws a wrong type error. Because Postgres is strongly typed, when comparing id
SQL expects a Number
, but gets Integer[]
.
Time to Read the Fine Manual
The exceptions coming back from these attempts are not getting better. Let's head over to the PostgreSQL 15 documentation to see if anything leaps out, regarding this.
9.24.1
IN
The right-hand side is a parenthesized list of scalar expressions.
How do I send a parenthesized list of scalars? Is that even possible?
Well, I tried sending an array and I tried sending multiple scalars, but none of those matched a parenthesized list of scalar expressions. As it turns out, you can't prepare a variable list of scalar expressions. If you mean to send 5 scalars, you must write IN ($1, $2, $3, $4, $5)
. This operator is finicky.
The case for ANY
(or SOME
)
Fortunately the solution is easily found on the same PostgreSQL 15 documentation page if I continue scrolling.
9.24.3.
ANY
/SOME
(array)expression operator ANY (array expression) expression operator SOME (array expression)
The right-hand side is a parenthesized expression, which must yield an array value.
Possible match? From the definitions, expression IN (...)
is equivalent to expression = ANY (...)
but for the parameter type!
Endgame
Let's give it a shot.
// Prepare Tuple of array of IDs
Integer[] items = {1, 2, 3, 4, 5, 6};
Tuple t_list = Tuple.of(items);
// Prepare a parameterized query
// this time replacing IN with =ANY
pool.preparedQuery("SELECT i.*\\n" +
"FROM item AS i\\n" +
"JOIN shipment AS s ON s.item_id=i.id\\n" +
"WHERE s.status <> 'shipped'\\n" +
" AND i.id = ANY($1)")
// Attempt to run it
.execute(t_list)
// Log the IDs returned
.onSuccess(rows -> rows.forEach(
row -> logger.info("id: " + row.getInteger("id") + " name: " + row.getString("name"))))
// Failure: Log the complaint
.onFailure(t -> logger.error("QUERY FAIL", t));
Success! My array binds as a single parameter to ANY
.
Conclusions
Given the constraints on IN
and the need to send parameters securely, it makes sense use = ANY
where you might otherwise use IN
.
In addition to parameter binding, it's worth mentioning that ANY
works with all boolean comparison operators:
* foo LIKE ANY ('{"%bar", "%baz"}')
* foo ILIKE ANY ('{"%bar", "%baz"}')
* id <> ANY ('{1, 2, 3}')
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read