Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more

Demystifying Schemas & search_path through Examples

Avatar for Jonathan S. Katz

Jonathan S. Katz

16 min read

On March 1, 2018, the PostgreSQL community released version 10.3 and other supported versions of PostgreSQL.  The release centered around a disclosed security vulnerability designated CVE-2018-1058, which is related to how a user can accidentally or maliciously "create like-named objects in different schemas that can change the behavior of other users' queries."

The PostgreSQL community released a guide around what exactly CVE-2018-1058 is and how to protect your databases. However, we thought it would also be helpful to look into what schemas are in PostgreSQL, how they are used under normal operations, and how to investigate your schemas to look for and eliminate suspicious functions.

The examples below are all via the command-line. You will need to have at least two terminal windows available to make connections to your database. Certain operations, such as inspecting what schemas and functions are in your database, are much easier to perform with a graphical tool like pgAdmin4.

Setup

In this example, there are three user accounts:

  • A database superuser. In my database, the superuser is named “jkatz” and already exists
  • alice
  • bob

You can create alice and bob using the “createuser” command which comes with a PostgreSQL installation:

createuser alice
createuser bob

If you create alice and bob some other way, please ensure that they do not have any special status, i.e. they:

  • Cannot create other databases
  • Cannot create other users
  • Are not superusers themselves

You may opt to set a password, but the examples below do not require one.

Also in the examples below, we will be distinguishing between who is executing the queries.

  • Queries highlighted in blue are executed by alice
  • Queries highlighted in orange are executed by bob
  • Queries highlighted in red are executed by the superuser. In the examples below, the superuser is named "jkatz"

If you ever get lost as to which user is logged into which connection, you can run the following code to determine who you are currently logged in as:

SELECT SESSION_USER;

We will be using PostgreSQL 10 in this example. If you are on an older version of PostgreSQL, if you see a phrase like:

id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY

you can replace it with:

id serial PRIMARY KEY

Finally, to complete the setup, create a new database called theworld that is owned by alice:

createdb -O alice theworld

In two separate terminal windows, connect to theworld as alice and bob respectively:

psql -U alice theworld
psql -U bob theworld

What are Schemas? Before defining what a schema is, we need to define what an object is in PostgreSQL. PostgreSQL is more than just a relational database: it was designed to be an object-relational database, and within PostgreSQL there exist many kinds of objects:

  • Data types
  • Tables
  • Functions
  • Operators
  • Views
  • Triggers

and more.

Schemas allow users to namespace and group objects together within a database. For instance, with schemas you can do the following:

  • Group specific parts of your application together, e.g. accounts in one part, payments in another
  • Keep 3rd-party applications or foreign data tables separate from your primary application to avoid naming conflicts
  • Isolate users to specific schemas so their work does not conflict with each other

By default, a schema named public is created in every new PostgreSQL database. To see this, as alice run the following command, which lists all schemas in a database:

\dn

You should see output like this:

List of schemas

 Name   | Owner
--------+-------
 public | jkatz

Also by default in PostgreSQL, any user who can connect to a database can create new objects to the public schema. This will be explored in depth in the following examples.

Exploring the “public” schema

In many applications, you are working with the public schema implicitly without realizing it. For instance, as alice, try creating a table:

CREATE TABLE vegetables (
    id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    vegetable_name varchar(255) NOT NULL UNIQUE
);

Now get a list of all tables in the database by using the following command:

\dt

which should return:

List of relations

 Schema | Name       | Type  | Owner
--------+------------+-------+-------
 public | vegetables | table | alice

The result: alice created a table named vegetables in the public schema of theworld database. We could have also executed:

CREATE TABLE public.vegetables (
    id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    vegetable_name varchar(255) NOT NULL UNIQUE
);

and generated the same result, but we did not have to. Why?

The search path

PostgreSQL knows what schema it is currently in based on the schema search path. When PostgreSQL is given an unqualified object name (more on that in a second), it uses the search path to determine what order it should search the schemas to find objects. You can use the command SHOW search_path; to view your current search path, which if you run that as alice you will see:

 search_path
-----------------
 "$user", public

which is the default. "$user" is a special option that says if there is a schema that matches the current user (vis-à-vis SELECT SESSION_USER;), then search within that schema. If there are entries in the search path for nonexistent schemas, then they are ignored.

You can set the "search_path" per-transaction, per-session, or as a database configuration setting.

Using the default settings, if you execute this query as alice:

INSERT INTO vegetables (vegetable_name) VALUES ('asparagus'), ('broccoli'), ('eggplant');

PostgreSQL searches each schema to see where a table called vegetables exists. As there is no schema named alice, PostgreSQL then searches the public schema and performs the INSERT.

Quick Digression: Qualified Object Names

A qualified object name is considered to be of the format schema.table, or in the case of the above example:

public.vegetables

You can also write a qualified object name in the database.schema.table format, or in the case of the above example:

theworld.public.vegetables

This might lead you to think: does PostgreSQL support cross-database queries outside of the box? The short answer is: no. Try it out!  Open a new terminal window and run the following:

createdb -O alice newworld

psql -U alice newworld -c "SELECT * FROM theworld.public.vegetables;"

ERROR: cross-database references are not implemented: "theworld.public.vegetables"

*What about "bob?" Multiple users and the public schema

Recall that bob also has access to the database theworld and decides that he wants to see all of the tables in the database. As bob run:

\dt

and you should see:

List of relations

 Schema | Name       | Type  | Owner

--------+------------+-------+-------

 public | vegetables | table | alice

So bob is able to see that alice owns a table called vegetables within the public schema. As bob try looking at the contents inside of vegetables:

SELECT * FROM vegetables;

You should receive this error:

ERROR: permission denied for relation vegetables

Even though bob has access to the public schema, it does not mean he can interact with every object that has been created in it. In this case, the owner of the vegetables table, alice, did not GRANT any permissions to bob and thus he cannot view the content.

bob can create his own objects within the public schema. As bob run the following query:

CREATE TABLE fruits (
    id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    fruit_name varchar(255) NOT NULL UNIQUE
);

First, as bob, look at the updated list of tables:

\dt

List of relations

 Schema | Name       | Type  | Owner
--------+------------+-------+-------
 public | fruits     | table | bob
 public | vegetables | table | alice

As bob, try inserting and retrieving some last names:

INSERT INTO fruits (fruit_name) VALUES ('apple'), ('banana'), ('pear');
SELECT * FROM fruits;

 id | fruit_name
----+------------
 1  | apple
 2  | banana
 3  | pear

What does alice see? As alice, first look at what tables are available:

\dt

List of relations

 Schema | Name       | Type  | Owner
--------+------------+-------+-------
 public | fruits     | table | bob
 public | vegetables | table | alice

Now as alice, try to retrieve a list of fruits from bob's table:

SELECT * FROM fruits;

ERROR: permission denied for relation fruits

bob did not give permission to alice to view his table, and thus alice cannot run a SELECT query on his table.

Functions and the public schema

PostgreSQL allows users to create functions, so it's a good idea to see how these work within the public schema. As alice, let's create a simple function that cubes integers:

CREATE OR REPLACE FUNCTION cube(int)
RETURNS int
AS $$
    SELECT $1 * $1 * $1;
$$ LANGUAGE SQL IMMUTABLE;

Because our search path has not been updated, this function is created in the public schema. You can view all the functions in the theworld database with the following command:

\df

As alice, if you run the following query:

SELECT cube(3);

you will see the following result:

 cube
------
 27

Now, as bob run the same query:

SELECT cube(3);

You will see the following result:

 cube
------
 27

By default, when a function is created, the function is executable by all users that have "USAGE" permission in that schema. However, this does not necessarily mean that the functions will execute properly due to the underlying permissions of objects being referenced in said functions. For example, alice decides to create a set returning function that will return a list that reverse all the names in the vegetables table:

CREATE OR REPLACE FUNCTION reverse_all()
RETURNS TABLE(vegetable_name text)
AS $$
    SELECT reverse(vegetable_name)
    FROM vegetables;
$$ LANGUAGE SQL STABLE;

When alice executes:

SELECT * FROM reverse_all();

She will see:

 vegetable_name
----------------
 sugarapsa
 iloccorb
 tnalpgge

When bob tries to execute the same function:

SELECT * FROM reverse_all();

He will see:

ERROR: permission denied for relation vegetables

Even though bob can execute reverse_all, unless he has permission to interact with the underlying table in the function, he cannot view or manipulate the results.

You will also notice that the reverse_all function contains a function named reverse but it does not have a qualified object-name. reverse and other built-in PostgreSQL functions exist in the pg_catalog schema. By default, PostgreSQL prepends the pg_catalog schema implicitly to the search_path and looks for all objects in there first. To see a list of functions that are in the pg_catalog, you can use the following command:

\df pg_catalog.*

Creating & Managing Schemas

Creating a $user Schema

alice decides she wants to create her own schema:

CREATE SCHEMA alice;

The default search_path rule means that all new objects with unqualified names that alice creates will now be created in the alice schema.

As alice, let's create a table called fruits:

CREATE TABLE fruits (
    id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    fruit_name varchar(255) NOT NULL UNIQUE
);

Now as alice, retrieve a list of tables:

\dt
List of relations

 Schema | Name       | Type  | Owner
--------+------------+-------+-------
 alice  | fruits     | table | alice
 public | vegetables | table | alice

You will notice that the table named fruits in public is no longer showing. Based on the search path, because the alice schema takes precedence, the similarly-named fruits table in the alice schema will appear.

If you want to see all of the tables in the public schema, you can run:

\dt public.*

Recall earlier that when alice ran SELECT * FROM fruits; she received an error indicating that she did not have access to the fruits table. Now, when you run the below as alice:

INSERT INTO fruits (fruit_name) VALUES ('grape'), ('kiwi'), ('orange');
SELECT * FROM fruits;

You will see

 id | fruit_name
----+------------
  1 | grape
  2 | kiwi
  3 | orange

as based on alice's search path, alice.fruits is found before public.fruits

How does this affect bob?

As bob first get a list of schemas:

\dn

which returns something similar to:

List of schemas

 Name   | Owner
--------+-------
 alice  | alice
 public | jkatz

So bob can see that alice has her own schema now. By default, bob can also see what tables alice has in her schema:

\dt alice.*

which returns:

List of relations

 Schema | Name   | Type  | Owner
--------+--------+-------+-------
 alice  | fruits | table | alice

Also by default, bob can also inspect the fruits table structure:

\d alice.fruits

which returns:

Table alice.fruits

 Column     | Type                   | Collation | Nullable | Default
------------+------------------------+-----------+----------+----------------------------------
 id         | integer                |           | not null | generated by default as identity
 fruit_name | character varying(255) |           | not null |

Indexes:

"fruits_pkey" PRIMARY KEY, btree (id)

When bob tries to execute the below query:

SELECT * FROM fruits;

he will still see the results from the public.fruits table:

 id | fruit_name
----+------------
  1 | apple
  2 | banana
  3 | pear

Schemas Outside of the Search Path

alice decides that she wants to create a new schema called animals:

CREATE SCHEMA animals;

alice also decides she does not want to add this schema to the search path.

alice then creates a table in animals called fish and populates it like so:

CREATE TABLE animals.fish (
    id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    fish_name varchar(255)
);
INSERT INTO animals.fish (fish_name) VALUES ('trout'), ('grouper'), ('salmon');

If alice tries to run the following query:

SELECT * FROM fish;

She will receive the following error:

ERROR: relation `fish` does not exist

As animals.fish is outside of the search path.

Instead, alice needs to run the query with the qualified object name:

SELECT * FROM animals.fish;

which produces:

 id | fish_name
----+-----------
  1 | trout
  2 | grouper
  3 | salmon

If bob tries to run the above query:

SELECT * FROM animals.fish;

He will see this error:

ERROR: permission denied for schema animals

alice decides she wants to share the tables of the animals schema with bob, so she grants him access to read data from all the tables in the schema:

GRANT SELECT ON ALL TABLES IN SCHEMA animals TO bob;

However, when bob tries to run the above query:

SELECT * FROM animals.fish;

He will see this error:

ERROR: permission denied for schema animals

That is because alice needs to give bob the schema USAGE permission on animals, which she can do with the following command:

GRANT USAGE ON SCHEMA animals TO bob;

bob can then run:

SELECT * FROM animals.fish;

and see:

 id | fish_name
----+-----------
  1 | trout
  2 | grouper
  3 | salmon

Functions in Schemas Outside of the Search Path

Let's have bob create a function called animals.reverse_all that is similar to the function defined earlier in this post. As bob try running:

CREATE FUNCTION animals.reverse_all()
RETURNS TABLE(fish_name text)
AS $$
    SELECT reverse(fish_name) FROM fish;
$$ LANGUAGE SQL STABLE;

You should receive the following error:

ERROR: permission denied for schema animals

While alice granted permission for bob to use the animals schema, she did not give him permission to CREATE objects in the animals schema. As alice execute the following to allow bob to create objects in animals:

GRANT CREATE ON SCHEMA animals TO bob;

Now, when bob tries to create the aforementioned function:

CREATE FUNCTION animals.reverse_all()
RETURNS TABLE(fish_name text)
AS $$
    SELECT reverse(fish_name) FROM fish;
$$ LANGUAGE SQL STABLE;

he receives the following error:

ERROR: relation `fish` does not exist

Recall that the animals is not in the search path. There are two ways bob can address this:

Method #1: SET the search path for the function

We can explicitly set the search path for the function like so:

CREATE FUNCTION animals.reverse_all()
RETURNS TABLE(fish_name text)
AS $$
    SELECT reverse(fish_name) FROM fish;
$$ LANGUAGE SQL STABLE
SET search_path = 'animals';

Method #2: Use the Qualified Object Name

CREATE FUNCTION animals.reverse_all()
RETURNS TABLE(fish_name text)
AS $$
    SELECT reverse(fish_name) FROM animals.fish;
$$ LANGUAGE SQL STABLE;

You can now verify that both bob and alice can execute SELECT * FROM animals.reverse_all();

Explaining CVE-2018-1058: What it is, HOW TO AUDIT & PROTECT YOUR SCHEMA

Recall that in our current schema setup, bob is able to see that alice has a table named fruits in her schema, but is not able to see the contents of alice.fruits. However, bob becomes a little too curious, and decides to create a function to try and capture what alice has in alice.fruits.

bob creates a function in public called reverse that looks just like the pg_catalog.reverse function: it will accept a string of type varchar and return it in reverse order. However, he adds a couple of lines of code to secretly capture the value into his own table.

GRANT SELECT, INSERT ON public.fruits TO alice;

CREATE OR REPLACE FUNCTION public.reverse(varchar)
RETURNS text
AS $$
    WITH pwn AS (
        INSERT INTO public.fruits (fruit_name) VALUES ($1)
        ON CONFLICT DO NOTHING
        RETURNING fruit_name
    )
    SELECT pg_catalog.reverse($1)
$$ LANGUAGE SQL;

If alice calls the reverse without using name qualification (i.e. pg_catalog.reverse), this will give bob his desired results because:

  1. bob made sure that if alice executes this query, it will not fail, and it will returns the result she expects.
  2. bob gamed the search path: even though the pg_catalog will be searched for matching objects before all other schemas, bob realized that if he made a function that matches slightly better than the pg_catalog version, then he would be able to get this function to execute.

The next time alice executes:

SELECT reverse(fruit_name) FROM fruits;

she will see:

 reverse
---------
 eparg
 iwik
 egnaro

which is the expected output.  However, she will have executed public.reverse instead of pg_catalog.reverse.

Meanwhile, the next time bob runs the below query after alice calls the reverse function:

SELECT * FROM fruits;

he will see:

 id | fruit_name
----+------------
  1 | apple
  2 | banana
  3 | pear
  4 | grape
  5 | kiwi
  6 | orange

and thus will have successfully caused alice to leak data to him.

Protecting the Search Path

alice decides she wants to investigate what is going on in her database, especially after reading through the CVE-2018-1058 guide. First, as alice let's see what functions are in the public schema:

\df public.*

which yields:

 Schema | Name        | Result data type           | Argument data types | Type
--------+-------------+----------------------------+---------------------+--------
 public | cube        | integer                    | integer             | normal
 public | reverse     | text                       | character varying   | normal
 public | reverse_all | TABLE(vegetable_name text) |                     | normal

The reverse function should immediately raise a flag for alice as that appears to match a pg_catalog function. alice can inspect the contents of the public.reverse function with the following command:

\sf public.reverse

which yields:

CREATE OR REPLACE FUNCTION public.reverse(character varying)
RETURNS text
LANGUAGE sql
AS $function$
    WITH pwn AS (
        INSERT INTO public.fruits (fruit_name) VALUES ($1)
        ON CONFLICT DO NOTHING
        RETURNING fruit_name
    )
    SELECT pg_catalog.reverse($1)
$function$

Thus alice can see the trap code that bob has placed. To remove the trap, alice tries to run the following:

DROP FUNCTION public.reverse(varchar);

but receives the following message:

ERROR: must be owner of function public.reverse

Even though alice is the owner of the theworld database, she is not the owner of the public schema: by default the owner of the public schema is the user who created the table, in this case jkatz.

However, as a superuser, you can run the following code to remove the function:

DROP FUNCTION public.reverse(varchar);

Note that because the superuser owns the schema, if alice attempts to revoke all object creation privileges on public it will fail, as:

REVOKE CREATE ON SCHEMA public FROM PUBLIC;

yields:

WARNING: no privileges could be revoked for `public`

As a superuser connected to the theworld database, you can run:

REVOKE CREATE ON SCHEMA public FROM PUBLIC;

and then nobody other than superusers can create objects in the public schema.

Note that the second PUBLIC is a special directive that refers to all users.

Life After "REVOKE CREATE ON SCHEMA public FROM PUBLIC"

After running the above command, users who have access to theworld will no longer be able to create objects in the public schema. If as alice you try to run:

CREATE OR REPLACE FUNCTION public.square(int)
RETURNS int
AS $$
    SELECT $1 * $1;
$$ LANGUAGE SQL IMMUTABLE;

you will receive the following error:

ERROR: permission denied for schema public

If alice tries to updated the public.cube function:

CREATE OR REPLACE FUNCTION public.cube(int)
RETURNS int
AS $$
    SELECT $1 * ($1 * $1);
$$ LANGUAGE SQL IMMUTABLE;

even though she is the function owner, she will receive the following error:

ERROR: permission denied for schema public

However, alice can still drop the public.cube function:

DROP FUNCTION public.cube(int);

Conclusion

Schemas are very powerful and can help you with a variety of purposes in your application. It is a good idea to play around with PostgreSQL schema functionality in your test environment as if you were deploying that environment to a production system.

If you have a multi-user application that requires strong security practices, we encourage you to take a look at the PostgreSQL Secure Technical Implementation Guide (STIG) as well as the automated STIG compliance tools.