Latest posts from Marco Slot

  • 11 min read

    pg_incremental: Incremental Data Processing in Postgres

    Marco Slot

    Today I’m excited to introduce pg_incremental, a new open source PostgreSQL extension for automated, incremental, reliable batch processing. This extension helps you create processing pipelines for append-only streams of data, such as IoT / time series / event data workloads.

    Notable pg_incremental use cases include:

    • Creation and incremental maintenance of rollups, aggregations, and interval aggregations
    • Incremental data transformations
    • Periodic import or export of new data using standard SQL
    Read More
  • 8 min read

    Iceberg ahead! Analyzing Shipping Data in Postgres

    Marco Slot

    PostgreSQL is one of the most versatile data storage and processing tools available. In Crunchy Data Warehouse, we enhanced it even further by adding Iceberg tables and a fast analytical query engine to PostgreSQL.

    What is Iceberg? Iceberg tables are stored in object storage (S3) in a compressed columnar format for fast analytics. This means storage is cheap and there are no storage limits. Yet the tables are still transactional and work with nearly all PostgreSQL features. Crunchy Data Warehouse can also query or load raw data from object storage into Iceberg tables via PostgreSQL commands.

    A pattern we repeatedly see in data analytics scenarios is:

    • Use temporary or external tables to collect raw data
    • Use Iceberg as a central repository to organize data
    • Use PostgreSQL tables or materialized views for querying insights
    Read More
  • 8 min read

    Crunchy Data Warehouse: Postgres with Iceberg for High Performance Analytics

    Marco Slot

    PostgreSQL is the bedrock on which many of today’s organizations are built. The versatility, reliability, performance, and extensibility of PostgreSQL make it the perfect tool for a large variety of operational workloads.

    The one area in which PostgreSQL has historically been lacking is analytics, which involves queries that summarize, filter, or transform large amounts of data. Modern analytical databases are designed to query data in data lakes in formats like Parquet

    Read More
  • PostGIS meets DuckDB: Crunchy Bridge for Analytics goes Spatial

    Marco Slot

    Crunchy Data is excited to announce the next major feature release for Crunchy Bridge for Analytics: Geospatial Analytics.

    We have developed a variety of features to connect Postgres and PostGIS to S3 and public web servers to make spatial data access easier than ever.

    This release includes:

    • Creating an analytics table directly from a geospatial data set by providing only the URL, for ad-hoc queries and data transformations.
    • Creating a regular PostGIS table directly from a URL.
    • Automatic mapping of geospatial columns to the PostGIS geometry type.
    • Support for GeoParquet, GeoJSON, Shapefile (zip), GeoPackage, WKT in CSV, and more.
    • Delegation of PostGIS functions and operators to DuckDB spatial for fast queries on GeoParquet.

    Together, these make Crunchy Bridge for Analytics an easy-to-use and powerful platform for working with geospatial data.

    Read More
  • 4 min read

    Postgres Materialized Views from Parquet in S3 with Zero ETL

    Marco Slot

    Data pipelines for IoT applications often involve multiple different systems. First, raw data is gathered in object storage, then several transformations happen in analytics systems, and finally results are written into transactional databases to be accessed by low-latency dashboards. While a lot of interesting engineering goes into these systems, things are much simpler if you can do everything in Postgres.

    Crunchy Bridge for Analytics

    Read More
  • 12 min read

    Postgres Powered by DuckDB: The Modern Data Stack in a Box

    Marco Slot

    Postgres for analytics has always been a huge question mark. By using PostgreSQL's extension APIs to integrate DuckDB as a query engine, delivering state-of-the-art analytics performance without forking either project, could Postgres be the analytics database too?

    Bringing an analytical query engine into a transactional database system raises many interesting possibilities and questions. In this blog post I want to reflect on what makes these workloads and system architectures so different and what bringing them together means.

    OLAP & OLTP: Never the twain shall meet

    Read More
  • 11 min read

    Crunchy Bridge Adds Iceberg to Postgres & Powerful Analytics Features

    Marco Slot

    In April we launched Crunchy Bridge for Analytics, which is a managed PostgreSQL option that enables fast and seamless querying of your data lake. Our initial release was focused on building a rock solid foundation for high performance analytics in PostgreSQL. We have since been hard at work turning it into a comprehensive analytics solution.

    Our goals in building Crunchy Bridge for Analytics are to:

    • Make it very easy to query data files (including Parquet, CSV, JSON, and Iceberg) in object stores like S3 from PostgreSQL, and to import and export data.
    • Offer best-in-class analytics performance, for example by integrating DuckDB into PostgreSQL.
    Read More
  • 6 min read

    How We Fused DuckDB into Postgres with Crunchy Bridge for Analytics

    Marco Slot

    Last month we launched Crunchy Bridge for Analytics, a new managed PostgreSQL offering that lets you query your data lake directly from PostgreSQL. Since then, we have had quite a few exciting conversations with customers handling large amounts of data in PostgreSQL. A common question is of course: How does it work?

    In this post, I want to shed some light on the internals. Crunchy Bridge for Analytics abstracts the query engine to offer fast analytics in PostgreSQL on data in Amazon S3. In principle, it can support multiple query engines, and it likely will in the future, but the current query engine is DuckDB.

    A bit of history: Distributed SQL pushdown in Citus

    Read More
  • Syncing Postgres Partitions to Your Data Lake in Crunchy Bridge for Analytics

    Marco Slot

    One of the unique characteristics of the recently launched Crunchy Bridge for Analytics is that it is effectively a hybrid between a transactional and an analytical database system. That makes it a powerful tool for data-intensive applications that may require, for example, a combination of low-latency, high-throughput insertion, efficient lookup of recent data, and fast interactive analytics over historical data.

    A common source of large data volumes is append-mostly time series data or event data generated by an application. PostgreSQL has various tools to optimize your database for time series, such as partitioning

    Read More
  • Crunchy Bridge for Analytics: Your Data Lake in PostgreSQL

    Marco Slot

    A lot of the world’s data lives in data lakes, huge collections of data files in object stores like Amazon S3. There are many tools for querying data lakes, but none are as versatile and have as wide an ecosystem as PostgreSQL. So, what if you could use PostgreSQL to easily query your data lake with state-of-the-art analytics performance?

    Today we’re announcing Crunchy Bridge for Analytics

    Read More