How To Correct and Identify Indexes Affected by the GNU C 2.28 Update
Version 2.28 (release notes) of the GNU C library introduces many changes to the collations it provides. Collations determine how strings are compared and by default, PostgreSQL uses the operating system’s collations which on Linux means glibC. When your operating system updates to this version of glibc and you aren't using the “C” or “POSIX” collation, you may encounter some differently ordered indexes. This unexpected change in the order of indexes will lead to incorrectly ordered query results and possible data corruption. Currently, the following distributions are affected:
- Ubuntu 18.10 (cosmic)
- RHEL/CentOS 8
- Debian 10 (buster)
How can a system update cause index corruption?
By default, PostgreSQL uses the operating systems collations (as provided by the GNU C library). Since collations determine how strings are compared, changing them can affect the result of the ORDER BY clause in SELECT statements and the order of the keys in B-tree indexes. Changes to the behavior of ORDER BY can be annoying, but changing the order of an index can lead to incorrect query results and duplicate entries.
You may also see rows being stored in the wrong partition if using columns of type text, varchar, char, or citext in the partition key on range-partitioned tables.
What indexes are affected?
Only indexes involving columns of type text, varchar, char, or citex are affected. Databases or table columns using the “C” or “POSIX” locales and table columns using collations with the ICU provider are not affected.
You can identify affected indexes in all databases with the following query:
SELECT indrelid::regclass::text, indexrelid::regclass::text, collname, pg_get_indexdef(indexrelid)
FROM (SELECT indexrelid, indrelid, indcollation[i] coll FROM pg_index, generate_subscripts(indcollation, 1) g(i)) s
JOIN pg_collation c ON coll=c.oid
WHERE collprovider IN ('d', 'c') AND collname NOT IN ('C', 'POSIX');
How can I correct this?
The simplest method is to rebuild the indexes by running REINDEX index_name
for all affected indexes. However, this will block all access to the index and prevent DDL on the table during the process. Starting with PostgreSQL version 12 (to be released this fall), you can use REINDEX CONCURRENTLY to avoid these limitations. For older versions of PostgreSQL you can work around these limitations by creating a copy of the index using CREATE INDEX CONCURRENTLY
, then dropping the original index as shown below:
CREATE INDEX CONCURRENTLY myindex2 ON mytable (textcol);
DROP INDEX myindex;
ALTER INDEX myindex2 RENAME TO myindex;
It should be noted that the above will not work for primary keys. Additionally sufficient storage for both indexes will be needed for the duration of the process described above.
You can find more information about the locale data changes here: https://wiki.postgresql.org/wiki/Locale_data_changes
and the Postgres documentation on managing collations here: https://www.postgresql.org/docs/10/collation.html#COLLATION-MANAGING
Related Articles
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read
- Smarter Postgres LLM with Retrieval Augmented Generation
6 min read
- Postgres Partitioning with a Default Partition
16 min read