Demystifying Schemas & search_path through Examples
On March 1, 2018, the PostgreSQL community released version 10.3 and other supported versions of PostgreSQL. The release centered around a disclosed security vulnerability designated CVE-2018-1058, which is related to how a user can accidentally or maliciously "create like-named objects in different schemas that can change the behavior of other users' queries."
The PostgreSQL community released a guide around what exactly CVE-2018-1058 is and how to protect your databases. However, we thought it would also be helpful to look into what schemas are in PostgreSQL, how they are used under normal operations, and how to investigate your schemas to look for and eliminate suspicious functions.
The examples below are all via the command-line. You will need to have at least two terminal windows available to make connections to your database. Certain operations, such as inspecting what schemas and functions are in your database, are much easier to perform with a graphical tool like pgAdmin4.
Setup
In this example, there are three user accounts:
- A database superuser. In my database, the superuser is named “jkatz” and already exists
- alice
- bob
You can create alice and bob using the “createuser” command which comes with a PostgreSQL installation:
createuser alice
createuser bob
If you create alice and bob some other way, please ensure that they do not have any special status, i.e. they:
- Cannot create other databases
- Cannot create other users
- Are not superusers themselves
You may opt to set a password, but the examples below do not require one.
Also in the examples below, we will be distinguishing between who is executing the queries.
- Queries highlighted in blue are executed by alice
- Queries highlighted in orange are executed by bob
- Queries highlighted in red are executed by the superuser. In the examples below, the superuser is named "jkatz"
If you ever get lost as to which user is logged into which connection, you can run the following code to determine who you are currently logged in as:
SELECT SESSION_USER;
We will be using PostgreSQL 10 in this example. If you are on an older version of PostgreSQL, if you see a phrase like:
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY
you can replace it with:
id serial PRIMARY KEY
Finally, to complete the setup, create a new database called theworld
that is owned by alice:
createdb -O alice theworld
In two separate terminal windows, connect to theworld
as alice and bob respectively:
psql -U alice theworld
psql -U bob theworld
What are Schemas? Before defining what a schema is, we need to define what an object is in PostgreSQL. PostgreSQL is more than just a relational database: it was designed to be an object-relational database, and within PostgreSQL there exist many kinds of objects:
- Data types
- Tables
- Functions
- Operators
- Views
- Triggers
and more.
Schemas allow users to namespace and group objects together within a database. For instance, with schemas you can do the following:
- Group specific parts of your application together, e.g.
accounts
in one part,payments
in another - Keep 3rd-party applications or foreign data tables separate from your primary application to avoid naming conflicts
- Isolate users to specific schemas so their work does not conflict with each other
By default, a schema named public
is created in every new PostgreSQL database. To see this, as alice run the following command, which lists all schemas in a database:
\dn
You should see output like this:
List of schemas
Name | Owner
--------+-------
public | jkatz
Also by default in PostgreSQL, any user who can connect to a database can create new objects to the public schema. This will be explored in depth in the following examples.
Exploring the “public” schema
In many applications, you are working with the public schema implicitly without realizing it. For instance, as alice, try creating a table:
CREATE TABLE vegetables (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
vegetable_name varchar(255) NOT NULL UNIQUE
);
Now get a list of all tables in the database by using the following command:
\dt
which should return:
List of relations
Schema | Name | Type | Owner
--------+------------+-------+-------
public | vegetables | table | alice
The result: alice created a table named vegetables
in the public
schema of theworld
database. We could have also executed:
CREATE TABLE public.vegetables (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
vegetable_name varchar(255) NOT NULL UNIQUE
);
and generated the same result, but we did not have to. Why?
The search path
PostgreSQL knows what schema it is currently in based on the schema search path. When PostgreSQL is given an unqualified object name (more on that in a second), it uses the search path to determine what order it should search the schemas to find objects. You can use the command SHOW search_path;
to view your current search path, which if you run that as alice you will see:
search_path
-----------------
"$user", public
which is the default. "$user"
is a special option that says if there is a schema that matches the current user (vis-à-vis SELECT SESSION_USER;
), then search within that schema. If there are entries in the search path for nonexistent schemas, then they are ignored.
You can set the "search_path
" per-transaction, per-session, or as a database configuration setting.
Using the default settings, if you execute this query as alice:
INSERT INTO vegetables (vegetable_name) VALUES ('asparagus'), ('broccoli'), ('eggplant');
PostgreSQL searches each schema to see where a table called vegetables
exists. As there is no schema named alice, PostgreSQL then searches the public
schema and performs the INSERT
.
Quick Digression: Qualified Object Names
A qualified object name is considered to be of the format schema.table
, or in the case of the above example:
public.vegetables
You can also write a qualified object name in the database.schema.table
format, or in the case of the above example:
theworld.public.vegetables
This might lead you to think: does PostgreSQL support cross-database queries outside of the box? The short answer is: no. Try it out! Open a new terminal window and run the following:
createdb -O alice newworld
psql -U alice newworld -c "SELECT * FROM theworld.public.vegetables;"
ERROR: cross-database references are not implemented: "theworld.public.vegetables"
*What about "bob?" Multiple users and the public
schema
Recall that bob also has access to the database theworld
and decides that he wants to see all of the tables in the database. As bob run:
\dt
and you should see:
List of relations
Schema | Name | Type | Owner
--------+------------+-------+-------
public | vegetables | table | alice
So bob is able to see that alice owns a table called vegetables
within the public
schema. As bob try looking at the contents inside of vegetables
:
SELECT * FROM vegetables;
You should receive this error:
ERROR: permission denied for relation vegetables
Even though bob has access to the public
schema, it does not mean he can interact with every object that has been created in it. In this case, the owner of the vegetables
table, alice, did not GRANT any permissions to bob and thus he cannot view the content.
bob can create his own objects within the public
schema. As bob run the following query:
CREATE TABLE fruits (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
fruit_name varchar(255) NOT NULL UNIQUE
);
First, as bob, look at the updated list of tables:
\dt
List of relations
Schema | Name | Type | Owner
--------+------------+-------+-------
public | fruits | table | bob
public | vegetables | table | alice
As bob, try inserting and retrieving some last names:
INSERT INTO fruits (fruit_name) VALUES ('apple'), ('banana'), ('pear');
SELECT * FROM fruits;
id | fruit_name
----+------------
1 | apple
2 | banana
3 | pear
What does alice see? As alice, first look at what tables are available:
\dt
List of relations
Schema | Name | Type | Owner
--------+------------+-------+-------
public | fruits | table | bob
public | vegetables | table | alice
Now as alice, try to retrieve a list of fruits from bob's table:
SELECT * FROM fruits;
ERROR: permission denied for relation fruits
bob did not give permission to alice to view his table, and thus alice cannot run a SELECT
query on his table.
Functions and the public
schema
PostgreSQL allows users to create functions, so it's a good idea to see how these work within the public
schema. As alice, let's create a simple function that cubes integers:
CREATE OR REPLACE FUNCTION cube(int)
RETURNS int
AS $$
SELECT $1 * $1 * $1;
$$ LANGUAGE SQL IMMUTABLE;
Because our search path has not been updated, this function is created in the public
schema. You can view all the functions in the theworld
database with the following command:
\df
As alice, if you run the following query:
SELECT cube(3);
you will see the following result:
cube
------
27
Now, as bob run the same query:
SELECT cube(3);
You will see the following result:
cube
------
27
By default, when a function is created, the function is executable by all users that have "USAGE" permission in that schema. However, this does not necessarily mean that the functions will execute properly due to the underlying permissions of objects being referenced in said functions. For example, alice decides to create a set returning function that will return a list that reverse all the names in the vegetables
table:
CREATE OR REPLACE FUNCTION reverse_all()
RETURNS TABLE(vegetable_name text)
AS $$
SELECT reverse(vegetable_name)
FROM vegetables;
$$ LANGUAGE SQL STABLE;
When alice executes:
SELECT * FROM reverse_all();
She will see:
vegetable_name
----------------
sugarapsa
iloccorb
tnalpgge
When bob tries to execute the same function:
SELECT * FROM reverse_all();
He will see:
ERROR: permission denied for relation vegetables
Even though bob can execute reverse_all
, unless he has permission to interact with the underlying table in the function, he cannot view or manipulate the results.
You will also notice that the reverse_all
function contains a function named reverse but it does not have a qualified object-name. reverse
and other built-in PostgreSQL functions exist in the pg_catalog
schema. By default, PostgreSQL prepends the pg_catalog
schema implicitly to the search_path
and looks for all objects in there first. To see a list of functions that are in the pg_catalog, you can use the following command:
\df pg_catalog.*
Creating & Managing Schemas
Creating a $user
Schema
alice decides she wants to create her own schema:
CREATE SCHEMA alice;
The default search_path
rule means that all new objects with unqualified names that alice creates will now be created in the alice schema.
As alice, let's create a table called fruits
:
CREATE TABLE fruits (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
fruit_name varchar(255) NOT NULL UNIQUE
);
Now as alice, retrieve a list of tables:
\dt
List of relations
Schema | Name | Type | Owner
--------+------------+-------+-------
alice | fruits | table | alice
public | vegetables | table | alice
You will notice that the table named fruits
in public
is no longer showing. Based on the search path, because the alice schema takes precedence, the similarly-named fruits
table in the alice schema will appear.
If you want to see all of the tables in the public
schema, you can run:
\dt public.*
Recall earlier that when alice ran SELECT * FROM fruits;
she received an error indicating that she did not have access to the fruits
table. Now, when you run the below as alice:
INSERT INTO fruits (fruit_name) VALUES ('grape'), ('kiwi'), ('orange');
SELECT * FROM fruits;
You will see
id | fruit_name
----+------------
1 | grape
2 | kiwi
3 | orange
as based on alice's search path, alice.fruits
is found before public.fruits
How does this affect bob?
As bob first get a list of schemas:
\dn
which returns something similar to:
List of schemas
Name | Owner
--------+-------
alice | alice
public | jkatz
So bob can see that alice has her own schema now. By default, bob can also see what tables alice has in her schema:
\dt alice.*
which returns:
List of relations
Schema | Name | Type | Owner
--------+--------+-------+-------
alice | fruits | table | alice
Also by default, bob can also inspect the fruits table structure:
\d alice.fruits
which returns:
Table alice.fruits
Column | Type | Collation | Nullable | Default
------------+------------------------+-----------+----------+----------------------------------
id | integer | | not null | generated by default as identity
fruit_name | character varying(255) | | not null |
Indexes:
"fruits_pkey" PRIMARY KEY, btree (id)
When bob tries to execute the below query:
SELECT * FROM fruits;
he will still see the results from the public.fruits
table:
id | fruit_name
----+------------
1 | apple
2 | banana
3 | pear
Schemas Outside of the Search Path
alice decides that she wants to create a new schema called animals
:
CREATE SCHEMA animals;
alice also decides she does not want to add this schema to the search path.
alice then creates a table in animals
called fish
and populates it like so:
CREATE TABLE animals.fish (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
fish_name varchar(255)
);
INSERT INTO animals.fish (fish_name) VALUES ('trout'), ('grouper'), ('salmon');
If alice tries to run the following query:
SELECT * FROM fish;
She will receive the following error:
ERROR: relation `fish` does not exist
As animals.fish
is outside of the search path.
Instead, alice needs to run the query with the qualified object name:
SELECT * FROM animals.fish;
which produces:
id | fish_name
----+-----------
1 | trout
2 | grouper
3 | salmon
If bob tries to run the above query:
SELECT * FROM animals.fish;
He will see this error:
ERROR: permission denied for schema animals
alice decides she wants to share the tables of the animals
schema with bob, so she grants him access to read data from all the tables in the schema:
GRANT SELECT ON ALL TABLES IN SCHEMA animals TO bob;
However, when bob tries to run the above query:
SELECT * FROM animals.fish;
He will see this error:
ERROR: permission denied for schema animals
That is because alice needs to give bob the schema USAGE
permission on animals
, which she can do with the following command:
GRANT USAGE ON SCHEMA animals TO bob;
bob can then run:
SELECT * FROM animals.fish;
and see:
id | fish_name
----+-----------
1 | trout
2 | grouper
3 | salmon
Functions in Schemas Outside of the Search Path
Let's have bob create a function called animals.reverse_all
that is similar to the function defined earlier in this post. As bob try running:
CREATE FUNCTION animals.reverse_all()
RETURNS TABLE(fish_name text)
AS $$
SELECT reverse(fish_name) FROM fish;
$$ LANGUAGE SQL STABLE;
You should receive the following error:
ERROR: permission denied for schema animals
While alice granted permission for bob to use the animals
schema, she did not give him permission to CREATE
objects in the animals
schema. As alice execute the following to allow bob to create objects in animals
:
GRANT CREATE ON SCHEMA animals TO bob;
Now, when bob tries to create the aforementioned function:
CREATE FUNCTION animals.reverse_all()
RETURNS TABLE(fish_name text)
AS $$
SELECT reverse(fish_name) FROM fish;
$$ LANGUAGE SQL STABLE;
he receives the following error:
ERROR: relation `fish` does not exist
Recall that the animals
is not in the search path. There are two ways bob can address this:
Method #1: SET the search path for the function
We can explicitly set the search path for the function like so:
CREATE FUNCTION animals.reverse_all()
RETURNS TABLE(fish_name text)
AS $$
SELECT reverse(fish_name) FROM fish;
$$ LANGUAGE SQL STABLE
SET search_path = 'animals';
Method #2: Use the Qualified Object Name
CREATE FUNCTION animals.reverse_all()
RETURNS TABLE(fish_name text)
AS $$
SELECT reverse(fish_name) FROM animals.fish;
$$ LANGUAGE SQL STABLE;
You can now verify that both bob and alice can execute SELECT * FROM animals.reverse_all();
Explaining CVE-2018-1058: What it is, HOW TO AUDIT & PROTECT YOUR SCHEMA
Recall that in our current schema setup, bob is able to see that alice has a table named fruits
in her schema, but is not able to see the contents of alice.fruits
. However, bob becomes a little too curious, and decides to create a function to try and capture what alice has in alice.fruits
.
bob creates a function in public
called reverse
that looks just like the pg_catalog.reverse
function: it will accept a string of type varchar
and return it in reverse order. However, he adds a couple of lines of code to secretly capture the value into his own table.
GRANT SELECT, INSERT ON public.fruits TO alice;
CREATE OR REPLACE FUNCTION public.reverse(varchar)
RETURNS text
AS $$
WITH pwn AS (
INSERT INTO public.fruits (fruit_name) VALUES ($1)
ON CONFLICT DO NOTHING
RETURNING fruit_name
)
SELECT pg_catalog.reverse($1)
$$ LANGUAGE SQL;
If alice calls the reverse
without using name qualification (i.e. pg_catalog.reverse
), this will give bob his desired results because:
- bob made sure that if alice executes this query, it will not fail, and it will returns the result she expects.
- bob gamed the search path: even though the
pg_catalog
will be searched for matching objects before all other schemas, bob realized that if he made a function that matches slightly better than thepg_catalog
version, then he would be able to get this function to execute.
The next time alice executes:
SELECT reverse(fruit_name) FROM fruits;
she will see:
reverse
---------
eparg
iwik
egnaro
which is the expected output. However, she will have executed public.reverse
instead of pg_catalog.reverse
.
Meanwhile, the next time bob runs the below query after alice calls the reverse
function:
SELECT * FROM fruits;
he will see:
id | fruit_name
----+------------
1 | apple
2 | banana
3 | pear
4 | grape
5 | kiwi
6 | orange
and thus will have successfully caused alice to leak data to him.
Protecting the Search Path
alice decides she wants to investigate what is going on in her database, especially after reading through the CVE-2018-1058 guide. First, as alice let's see what functions are in the public
schema:
\df public.*
which yields:
Schema | Name | Result data type | Argument data types | Type
--------+-------------+----------------------------+---------------------+--------
public | cube | integer | integer | normal
public | reverse | text | character varying | normal
public | reverse_all | TABLE(vegetable_name text) | | normal
The reverse
function should immediately raise a flag for alice as that appears to match a pg_catalog
function. alice can inspect the contents of the public.reverse
function with the following command:
\sf public.reverse
which yields:
CREATE OR REPLACE FUNCTION public.reverse(character varying)
RETURNS text
LANGUAGE sql
AS $function$
WITH pwn AS (
INSERT INTO public.fruits (fruit_name) VALUES ($1)
ON CONFLICT DO NOTHING
RETURNING fruit_name
)
SELECT pg_catalog.reverse($1)
$function$
Thus alice can see the trap code that bob has placed. To remove the trap, alice tries to run the following:
DROP FUNCTION public.reverse(varchar);
but receives the following message:
ERROR: must be owner of function public.reverse
Even though alice is the owner of the theworld
database, she is not the owner of the public
schema: by default the owner of the public
schema is the user who created the table, in this case jkatz.
However, as a superuser, you can run the following code to remove the function:
DROP FUNCTION public.reverse(varchar);
Note that because the superuser owns the schema, if alice attempts to revoke all object creation privileges on public
it will fail, as:
REVOKE CREATE ON SCHEMA public FROM PUBLIC;
yields:
WARNING: no privileges could be revoked for `public`
As a superuser connected to the theworld
database, you can run:
REVOKE CREATE ON SCHEMA public FROM PUBLIC;
and then nobody other than superusers can create objects in the public
schema.
Note that the second PUBLIC
is a special directive that refers to all users.
Life After "REVOKE CREATE ON SCHEMA public FROM PUBLIC
"
After running the above command, users who have access to theworld
will no longer be able to create objects in the public
schema. If as alice you try to run:
CREATE OR REPLACE FUNCTION public.square(int)
RETURNS int
AS $$
SELECT $1 * $1;
$$ LANGUAGE SQL IMMUTABLE;
you will receive the following error:
ERROR: permission denied for schema public
If alice tries to updated the public.cube
function:
CREATE OR REPLACE FUNCTION public.cube(int)
RETURNS int
AS $$
SELECT $1 * ($1 * $1);
$$ LANGUAGE SQL IMMUTABLE;
even though she is the function owner, she will receive the following error:
ERROR: permission denied for schema public
However, alice can still drop the public.cube
function:
DROP FUNCTION public.cube(int);
Conclusion
Schemas are very powerful and can help you with a variety of purposes in your application. It is a good idea to play around with PostgreSQL schema functionality in your test environment as if you were deploying that environment to a production system.
If you have a multi-user application that requires strong security practices, we encourage you to take a look at the PostgreSQL Secure Technical Implementation Guide (STIG) as well as the automated STIG compliance tools.
Related Articles
- Crunchy Data Warehouse: Postgres with Iceberg for High Performance Analytics
8 min read
- Loading the World! OpenStreetMap Import In Under 4 Hours
6 min read
- Easy Totals and Subtotals in Postgres with Rollup and Cube
5 min read
- A change to ResultRelInfo - A Near Miss with Postgres 17.1
8 min read
- Accessing Large Language Models from PostgreSQL
5 min read