Citus: The Misunderstood Postgres Extension
Citus is in a small class of the most advanced Postgres extensions that exist. While there are many Postgres extensions out there, few hook as deeply into Postgres or change its storage and query behavior so dramatically. Many people come to Citus with the wrong assumptions. Citus turns Postgres into a sharded, distributed, horizontally scalable database (that's a mouthful), but it does so for very specific purposes.
Citus, in general, is a fit for these types of applications, and only these types:
- Sharding a multitenant application: a SaaS/B2B style app, where data is never joined between customers
- High-data-volume analytics with a limited user-facing surface: specifically where the dashboards are hand-curated with minimal levers-and-knobs for the user to change (i.e. customers cannot generate unknown queries)
Situations where Citus is not a great fit:
- Applications without rigid control over the queries sent to the database
- Geographic residency goals or requirements; Citus is distributed for scale, not distributed for edge.
Let's look closer at each of the two use cases that Citus is a good fit for.
Multitenant/SaaS applications
Multitenant or SaaS applications typically follow a pattern: 1) tenant data is siloed and does not intermingle with any other tenant's data, and 2) a "tenant" is a larger entity like a "team" or "organization".
An example of this could be Salesforce. Within Salesforce you have the notion of an organization, and the organization has accounts, customers, and opportunities within them. When you create a Salesforce account, all of your customers and opportunities are solely yours — data is not shared with other Salesforce organizations.
For these types of applications, Citus distributes each tenant's data into a shard. Citus handles the splitting of data by hashing the shard key into placement groups, keeping co-located data grouped together, and placing the data within shards on specific nodes. A physical node may contain multiple shards. To restate that and understand Citus at a high level:
- physical node: the physical container that holds shards
- shard: a logical container for data; resides on a physical node, and can be moved between physical nodes
- placement group: uses a hash-based algorithm to assign a tenant id to a shard
Regarding shards: while it is possible to split a large shard later, it is easier to start with the proper configuration. Getting scaling right at the beginning pays off, because moving whole shards between nodes is easier than splitting shards once they already exist, though splitting is possible.
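As a concrete sketch of those pieces, distributing a table with Citus is a function call on an ordinary Postgres table. The table and the shard count below are illustrative, not recommendations:

```sql
-- Choose the shard count up front; 32 here is illustrative.
SET citus.shard_count = 32;

-- An ordinary Postgres table with the tenant id on it.
CREATE TABLE account (
    org_id BIGINT NOT NULL,
    id     BIGINT NOT NULL,
    name   TEXT,
    PRIMARY KEY (org_id, id)
);

-- Citus hashes org_id into a placement group, which maps each
-- tenant's rows to a shard on some physical node.
SELECT create_distributed_table('account', 'org_id');
```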
In a very basic Citus cluster, you might have something that looks like:
Within Citus, multitenant/SaaS applications work well because sharding is at the core of what Citus does. In a tenant application, the tenant id becomes the shard key. When you shard all of the tables on the same key, Citus places each tenant's rows for every table on the same physical node. Queries with joins then execute locally on that node, and run faster.
Alternatively, poor shard key planning would require joining data across the network. This shuffling of data is detrimental to database performance, especially in distributed databases. For multitenant/SaaS, leveraging Citus requires the tenant id as a column on every table.
In a simpler design, the accounts, customers, and opportunities tables might have only a primary key and a foreign key reference to their parent. In Citus, we need to turn those into composite primary keys that use both the tenant id and the original key. Extending the above diagram, if we were to create accounts, customers, and opportunities as sharded tables with Citus, we'd have something roughly like the following:
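A hedged sketch of that schema (column names beyond the ones already mentioned are illustrative): every table carries `org_id`, every key becomes a composite key, and all three tables are distributed and co-located on the same shard key:

```sql
CREATE TABLE account (
    org_id BIGINT NOT NULL,
    id     BIGINT NOT NULL,
    name   TEXT,
    PRIMARY KEY (org_id, id)
);

CREATE TABLE customer (
    org_id     BIGINT NOT NULL,
    id         BIGINT NOT NULL,
    account_id BIGINT NOT NULL,
    email      TEXT,
    first_name TEXT,
    last_name  TEXT,
    PRIMARY KEY (org_id, id),
    FOREIGN KEY (org_id, account_id) REFERENCES account (org_id, id)
);

CREATE TABLE opportunity (
    org_id     BIGINT NOT NULL,
    id         BIGINT NOT NULL,
    account_id BIGINT NOT NULL,
    amount     NUMERIC,
    notes      TEXT,
    PRIMARY KEY (org_id, id),
    FOREIGN KEY (org_id, account_id) REFERENCES account (org_id, id)
);

-- Shard all three on the same key so one tenant's rows co-locate.
SELECT create_distributed_table('account', 'org_id');
SELECT create_distributed_table('customer', 'org_id', colocate_with => 'account');
SELECT create_distributed_table('opportunity', 'org_id', colocate_with => 'account');
```

Note the foreign keys include `org_id`; in a distributed table, references have to travel through the shard key so they can be enforced locally on a single node.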
To speed query performance, include a where condition on the tenant id (org_id below) in all queries as well; this ensures that Citus knows it can push the join down to a single node. A query for open opportunities for a specific tenant might look something like:
SELECT customer.email, customer.first_name, customer.last_name, opportunity.amount, opportunity.notes
FROM opportunity,
customer,
account
WHERE customer.org_id = account.org_id
AND opportunity.org_id = account.org_id
AND opportunity.account_id = customer.account_id
AND account.org_id = 4;
Citus would then quietly re-write this query to target the appropriate sharded tables, and effectively execute the query against only the relevant tables:
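As a rough illustration of that rewrite (the shard ID suffixes below are made up for the example; Citus assigns real ones), the query that actually runs on the one node holding tenant 4's data might look like:

```sql
-- Citus replaces each table with the specific shard that holds org_id = 4;
-- all three shards are co-located on the same node, so the join is local.
SELECT customer.email, customer.first_name, customer.last_name,
       opportunity.amount, opportunity.notes
FROM opportunity_102104 opportunity,
     customer_102072 customer,
     account_102040 account
WHERE customer.org_id = account.org_id
  AND opportunity.org_id = account.org_id
  AND opportunity.account_id = customer.account_id
  AND account.org_id = 4;
```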
Now, there is a bit more to designing multitenant apps to work with Citus. For example, universal data can be placed in reference tables, which are replicated to every node, or in local tables that live solely on the coordinator. For the bulk of a Citus multitenant workload, tables will:
- Contain your shard key
- Be indexed using a composite key on shard key + foreign key
- Be distributed based on the shard key / tenant id
- Be queried using the shard key / tenant id
Let's shift to the other common use case for Citus: what Citus defines as real-time dashboards or analytics.
Real-time analytics with Citus
Where the multitenant use case leverages the shard separation of Citus, here you're looking to leverage its parallelism.
Real-time analytics is admittedly a bit vague. It usually involves some kind of high-volume event data presented as a dashboard, report, monitoring, or alerting. Query patterns are often aggregations in some form; while there may be joins, they happen at a lower level and then bubble up to a higher level for aggregation.
When operating on a small volume of data, you don't necessarily need Citus — plain old Postgres can work just fine. At high data volumes, Postgres is not as well suited for analytics (unless you're talking about Crunchy Data Warehouse, which is optimized for OLAP workloads).
With the multitenant/SaaS example, we wanted the query to be pushed down to a single node and operate within a single physical node. With real-time analytics, we want the opposite: queries execute across all the nodes using as many cores as available within the cluster.
Let's make this a little more concrete. Start with the idea of a Google Analytics type of event analytics — similar to what is talked about in the Citus docs. Here we may have something like:
CREATE TABLE http_request (
site_id INT,
ingest_time TIMESTAMPTZ DEFAULT now(),
url TEXT,
request_country TEXT,
ip_address TEXT,
status_code INT,
response_time_msec INT
);
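For completeness, distributing this table is typically a one-liner, as in the Citus docs; the shard count here is illustrative:

```sql
-- Match shard count roughly to the cores available across the cluster.
SET citus.shard_count = 16;

-- site_id is a natural distribution column for event data like this.
SELECT create_distributed_table('http_request', 'site_id');
```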
Let's jump past how we shard the data and go straight to the query itself, which gives a better idea of how Citus works in these situations. Let's build a query that returns how many 404s and 200s came from the country "Australia", along with the average response time for each:
SELECT
  status_code,
  COUNT(*) AS request_count,
  AVG(response_time_msec) AS average_response_time_msec
FROM http_request
WHERE request_country = 'Australia'
  AND status_code IN (200, 404)
GROUP BY status_code;
This query will run on every single shard. To process the query as fast as possible, the number of shards should match the number of cores available. If you end up with something like 16 shards on a single node, ideally you'd want 16 cores or even more (to handle additional concurrency). The query will be executed as smaller, composable building blocks.
Citus processing the count of 404s and 200s is easy. Each node runs the count against its shards, then the coordinator sums those counts. We simply get the sum of count(*) where the country is "Australia", for each status code.
But! To calculate the average response time, we need the row count from each shard as well as the sum of the response_time_msec values. From there, Citus recombines those on the coordinator: each shard sends back 4 values (versus all the raw data), and the final math happens on the coordinator.
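Conceptually, the per-shard work and the coordinator's recombination look something like the following. This is a sketch of the plan, not literal Citus output; the shard table placeholder and the `partial_results` name are illustrative:

```sql
-- On each shard: cheap partial aggregates, grouped by status code.
SELECT status_code,
       count(*)                AS partial_count,
       sum(response_time_msec) AS partial_sum
FROM http_request_<shardid>
WHERE request_country = 'Australia'
  AND status_code IN (200, 404)
GROUP BY status_code;

-- On the coordinator: combine the partials from every shard.
-- The average is the sum of partial sums over the sum of partial counts.
SELECT status_code,
       sum(partial_count)                    AS request_count,
       sum(partial_sum) / sum(partial_count) AS average_response_time_msec
FROM partial_results
GROUP BY status_code;
```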
This results in fast aggregations across large datasets. But note: this only works for queries that decompose this way. Counts and averages are great. Something like a median is harder, because you need the full data set to compute an exact median. (For now we're setting aside the probabilistic approaches that give approximate results and work quite well; algorithms like t-digest or KLL are fine if you're okay with approximate or inexact answers.)
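For example, assuming the tdigest extension is installed (it ships separately from Citus, which knows how to distribute its aggregates), an approximate 99th percentile might look like:

```sql
CREATE EXTENSION IF NOT EXISTS tdigest;

-- Approximate p99 response time; 100 is the t-digest compression
-- parameter, a trade-off between accuracy and sketch size.
SELECT tdigest_percentile(response_time_msec, 100, 0.99)
FROM http_request
WHERE request_country = 'Australia';
```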
The other big piece of this is your queries need to be able to be constructed for Citus to push down any joins as locally as possible. While our example in this case is a very basic one, most applications still have data they need to join. This can work on Citus, but you still need to apply some of the thought in making joins to be as low level as possible — similar to the multitenant app.
Within the "real-time analytics" model you need the following to work in order to be successful:
- Ability to push joins down vs. joins that move data between nodes
- Heavy aggregation or roll-up workload
- Control over crafting the queries that are created
Concurrency and connections
The one "gotcha" of the real-time analytics use case is concurrency. In our simple example of querying http_request, it's great if you only have 4 shards. But in a world of 64 shards spread across 4 nodes, you have 16 shards per node. This means a single query could open 16 connections to each node. Connection management and scaling connections is a weak area of Postgres, so we recommend and support pgBouncer out of the box across all our products.
Designing up front for Citus
Your use case is the biggest success factor with Citus. If it is a fit, the more greenfield the application, the better your chances. Existing applications can absolutely be retrofitted to work with Citus, but it often takes some data maneuvering, schema modifications, and query modifications. As with many technologies, if Citus is the right tool for you, then by all means use it. If you're wondering whether Citus may or may not be a fit, reach out to us @crunchydata. We've helped a number of customers successfully adopt Citus; in other cases, we've helped customers find success on different paths. While Citus is very powerful, it is a special-purpose tool.