
Citus for Postgres on Any Cloud: Announcing Citus Support for Crunchy Bridge


Craig Kerstiens


I'm excited to announce support for the Citus extension for Postgres on Crunchy Bridge. This means you can have a fully managed Citus experience on any cloud (AWS, Azure, or GCP), managed by the Postgres experts at Crunchy Data. If you're unfamiliar with Citus, it's an extension to PostgreSQL that turns Postgres into a distributed, sharded, horizontally scalable database. Citus excels at multi-tenant workloads, real-time analytics use cases, and time series data. While the core of Citus is focused on sharding and rebalancing data, it also includes support for columnar compression, which can compress data by more than 10x, extremely useful for high-volume data sets and analytical workloads.

Let's jump right into setting up a Citus cluster

You can get started with your Crunchy Bridge account by first creating a Cluster Group. This ensures your Postgres clusters are in the same network and able to communicate with each other. You can create your Cluster Group by navigating to the ⚙️ settings for your Crunchy Bridge team.

[Image: cluster groups]

Once you’ve established a cluster group, create three clusters in this network group. You will use our standard provisioning workflow, selecting Hobby, Standard, or Memory tier machines with memory and disk sized to suit your needs. If you’re just testing, hobby clusters can easily be suspended when not in use to avoid billing.

[Image: coordinator and workers]

Installing Citus

To install Citus, connect to each of your nodes, enable the Citus extension, and grant local networking permissions. Run this on all three nodes:

-- Enable the Citus extension on this node.
CREATE EXTENSION citus;

-- Allow connections from the other clusters in the same Cluster Group.
GRANT local_network_trust TO postgres;
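
To confirm the extension is active, you can run citus_version(), a built-in Citus function, on any node:

-- Returns the installed Citus version string.
SELECT citus_version();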

Now that you’ve installed Citus, you can inform your coordinator about its worker nodes. Connect to the coordinator, then for each worker you want to connect, run citus_add_node with the hostname of the worker you’re attaching. Here’s a sample:

SELECT * FROM citus_add_node('p.higxrmttmbbzjkmfg3rkbgqe7q.db.postgresbridge.com', 5432);
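
Once you’ve added each worker, a quick sanity check from the coordinator confirms they registered (citus_get_active_worker_nodes is a built-in Citus function):

-- Lists each active worker's hostname and port.
SELECT * FROM citus_get_active_worker_nodes();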

Basic Sharding Architecture

The Citus architecture has a coordinator and two or more workers. By default each distributed table will have 32 shards, so with two workers, 16 shards live on each. If you need a different shard count, you can change it with SET citus.shard_count TO n;. Typically you’ll want at least four times as many shards as nodes when starting out, which leaves headroom to rebalance as you add workers.
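
As a concrete sketch, here’s what adjusting the shard count and distributing a table looks like on the coordinator. The orders table and customer_id column are hypothetical; create_distributed_table is the standard Citus function for sharding a table.

-- Distributed tables created after this will use 64 shards,
-- comfortably above the four-times-as-many-shards-as-nodes guideline.
SET citus.shard_count TO 64;

CREATE TABLE orders (
    customer_id bigint NOT NULL,
    order_id bigint NOT NULL,
    total numeric,
    -- The primary key must include the distribution column.
    PRIMARY KEY (customer_id, order_id)
);

-- Shard the table across the workers by customer_id.
SELECT create_distributed_table('orders', 'customer_id');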

[Image: coordinator and workers]

Picking your sharding key

Your next challenge is splitting your data into smaller chunks, spread across distinct, separate buckets. Sharding can happen at the level of a table (keyed on a column), a schema, or a different physical database. If you’re multi-tenant, sharding by customer or organization is super common. If you’re building an IoT application, device_id can be a great shard key; you can then layer on partitioning with pg_partman to help prune older data and better manage time series data (and don’t forget the columnar support, which can help compress old data). A minimal sketch of that IoT pattern follows below.
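
In this sketch, the events table, its columns, and the partition are hypothetical; create_distributed_table and alter_table_set_access_method are standard Citus functions, and pg_partman can automate the partition management shown manually here.

-- Hypothetical IoT events table, time-partitioned and sharded by device_id.
CREATE TABLE events (
    device_id bigint NOT NULL,
    event_time timestamptz NOT NULL,
    payload jsonb
) PARTITION BY RANGE (event_time);

-- Shard the table across the workers by device_id.
SELECT create_distributed_table('events', 'device_id');

-- One monthly partition, created by hand (pg_partman automates this).
CREATE TABLE events_2023_01 PARTITION OF events
    FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');

-- Once a partition holds only cold data, convert it to columnar
-- storage to compress it.
SELECT alter_table_set_access_method('events_2023_01', 'columnar');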

When building your application, we do recommend leveraging a library to help with multi-tenant enforcement. This could be the apartment gem for Rails if using schemas, acts_as_tenant for Rails if using a column identifier, or django-multitenant if using Django; in Java, Hibernate has support for a tenant discriminator.

If you need any help in designing and choosing a shard key please don’t hesitate to reach out.

Go forth and distribute

Citus is one of the most advanced Postgres extensions in existence. While it is not ideal for every use case, when you do need Citus it takes what was a challenging scaling problem and makes it magically easier. We’re excited to make Citus more available with a great experience on the cloud of your choice, with no need to self-manage or be locked in.