Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more

Posts about Analytics

  • Logical replication from Postgres to Iceberg

    Marco Slot

    Operational and analytical workloads have historically been handled by separate database systems, though they are starting to converge. We built Crunchy Data Warehouse to put PostgreSQL at the frontier of analytics systems, using modern technologies like Iceberg and a hybrid query engine . Combining operational and analytical capabilities is extremely useful, but it is not meant to drive all your workloads into a single system. In most organizations, application developers and analysts work...

    Read More
  • 10 min read

    Creating Histograms with Postgres

    Elizabeth ChristensenChristopher Winslett

    Histograms were first used in a lecture in 1892 by Karl Pearson — the godfather of mathematical statistics. With how many data presentation tools we have today, it’s hard to think that representing data as a graphic was classified as “innovation”, but it was. They are a graphic presentation of the distribution and frequency of data. If you haven’t seen one recently, or don’t know the word histogram off the top of your head - it is a bar chart, each bar represents the count of data with a defined...

    Read More
  • Crunchy Data Warehouse: Postgres with Iceberg Available for Kubernetes and On-premises

    Craig Kerstiens

    Today I'm excited to announce the release of Crunchy Data Warehouse on premises, which provides one of the easiest and yet richest ways to work with your data lake in the environment of your choosing. Built on top of Crunchy Postgres for Kubernetes, Crunchy Data Warehouse extends Postgres with a modern data warehouse solution, giving you: • The ability to easily query data where it resides in S3 or S3 compatible storage (like MinIO). With a variety of data formats supported including CSV, JSO...

    Read More
  • 5 min read

    Reducing Cloud Spend: Migrating Logs from CloudWatch to Iceberg with Postgres

    Craig Kerstiens

    As a database service provider, we store a number of logs internally to audit and oversee what is happening within our systems. When we started out, the volume of these logs is predictably low, but with scale they grew rapidly. Given the number of databases we run for users on Crunchy Bridge, the volume of these logs has grown to a sizable amount. Until last week, we retained those logs in AWS CloudWatch. Spoiler alert: this is expensive. While we have a number of strategies to drive efficiency...

    Read More
  • 5 min read

    Automatic Iceberg Maintenance Within Postgres

    Önder Kalacı

    Today we're excited to announce built-in maintenance for Iceberg in Crunchy Data Warehouse . This enhancement to Crunchy Data Warehouse brings PostgreSQL-style maintenance directly to Iceberg. The warehouse autovacuum workers continuously optimize Iceberg tables by compacting data and cleaning up expired files. In this post, we'll explore how we handle cleanup, and in the follow-up posts, we'll take a deeper dive into compaction. If you use Postgres, you are probably familiar with tables and ro...

    Read More
  • 9 min read

    Citus: The Misunderstood Postgres Extension

    Craig Kerstiens

    Citus is in a small class of the most advanced Postgres extensions that exist. While there are many Postgres extensions out there, few have as many hooks into Postgres or change the storage and query behavior in such a dramatic way. Most that come to Citus have very wrong assumptions. Citus turns Postgres into a sharded, distributed, horizontally scalable database (that's a mouthful), but it does so for very specific purposes. Citus, in general, is fit for these type of applications and only the...

    Read More
  • 7 min read

    Postgres, dbt, and Iceberg: Scalable Data Transformation

    Aykut Bozkurt

    Seamless integration of dbt with Crunchy Data Warehouse automates data movement between Postgres and Apache Iceberg. dbt’s modular SQL approach, combined with Iceberg’s scalable storage, and Postgres’ query engine means you can build fast, efficient, and reliable analytics—with minimal complexity. Today let’s dig into an example of using dbt with Postgres and Iceberg. The steps will be: 1. Set up Iceberg tables in Crunchy Data Warehouse using real-world real-time data from GitHub events 2. Confi...

    Read More
  • 7 min read

    Incremental Archival from Postgres to Parquet for Analytics

    Marco Slot

    PostgreSQL is commonly used to store event data coming from various kinds of devices. The data often arrives as individual events or small batches, which requires an operational database to capture. Features like time partitioning help optimize the storage layout for time range filtering and efficient deletion of old data. The PostgreSQL feature set gives you a lot of flexibility for handling a variety of IoT scenarios, but there are certain scenarios for it is less suitable, namely: • Long-te...

    Read More
  • Indexing Materialized Views in Postgres

    Elizabeth Christensen

    Materialized views are widely used in Postgres today. Many of us are working with using connected systems through foreign data wrappers, separate analytics systems like data warehouses , and merging data from different locations with Postgres queries. Materialized views let you precompile a query or partial table, for both local and remote data. Materialized views are static and have to be refreshed. One of the things that can be really important for using materialized views efficiently is inde...

    Read More
  • 19 min read

    Postgres Tuning & Performance for Analytics Data

    Karen Jex

    Your database is configured for the needs of your day-to-day OLTP (online transaction processing) application workload, but what if you need to run analytics queries against your application data? How can you do that without compromising the performance of your application? Application data gradually builds up in your database over time, and at some point the business wants to glean insights from it by running analytics queries. Analytics activity, sometimes called OLAP (online analytical proces...

    Read More