Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more
Marco Slot
Marco Slot
PostgreSQL is the bedrock on which many of today’s organizations are built. The versatility, reliability, performance, and extensibility of PostgreSQL make it the perfect tool for a large variety of operational workloads.
The one area in which PostgreSQL has historically been lacking is analytics, which involves queries that summarize, filter, or transform large amounts of data. Modern analytical databases are designed to query data in data lakes in formats like Parquet
Elizabeth Christensen
Elizabeth Christensen
Postgres is being used more and more for analytical workloads. There’s a few hidden gems I recently ran across that are really handy for doing SQL for data analysis, ROLLUP
and CUBE
. Rollup and cube don’t get a lot of attention, but follow along with me in this post to see how they can save you a few steps and enhance your date binning
Christopher Winslett
Christopher Winslett
It is never immediately obvious how to go from a simple SQL query to a complex one -- especially if it involves intricate calculations. One of the “dangers” of SQL is that you can create an executable query but return the wrong data. For example, it is easy to inflate the value of a calculated field by joining to multiple rows.
Use Crunchy Playground to follow allow with this blog post using a Postgres terminal:
Christopher Winslett
Christopher Winslett
You followed all the best practices, your sales dates are stored in perfect timestamp format …. but now you need to get reports by day, week, quarters, and months. You need to bin, bucket, and roll up sales data in easy to view reports. Do you need a BI tool? Not yet actually. Your Postgres database has hundreds of functions that let you query data analytics by date. By using some good old fashioned SQL - you have powerful analysis and business intelligence with date details on any data set.
In this post, I’ll walk through some of the key functions querying data by date.
Craig Kerstiens
Craig Kerstiens
Today, we’re excited to release pg_parquet - an open source Postgres extension for working with Parquet files. The extension reads and writes parquet files to local disk or to S3 natively from Postgres. With pg_parquet you're able to:
Code is available at: https://github.com/CrunchyData/pg_parquet/.
Read on for more background on why we built pg_parquet or jump below to get a walkthrough of working with it.
Paul Ramsey
Paul Ramsey
The Overture Maps collection of data is enormous, encompassing over 300 million transportation segments, 2.3 billion building footprints, 53 million points of interest, and a rich collection of cartographic features as well. It is a consistent global data set, but it is intimidatingly large -- what can a person do with such a thing?
Building cartographic products is the obvious thing, but what about the less obvious. With an analytical engine like PostgreSQL and Crunchy Bridge for Analytics, what is possible? Well turns out, a lot of things.
Elizabeth Christensen
Elizabeth Christensen
SQL makes sense when it's working on a single row, or even when it's aggregating across multiple rows. But what happens when you want to compare between rows of something you've already calculated? Or make groups of data and query those? Enter window functions.
Window functions tend to confuse people - but they’re a pretty awesome tool in SQL for data analytics. The best part is that you don’t need charts, fancy BI tools or AI to get some actionable and useful data for your stakeholders. Window functions let you quickly:
Marco Slot
Marco Slot
Crunchy Data is excited to announce the next major feature release for Crunchy Bridge for Analytics: Geospatial Analytics.
We have developed a variety of features to connect Postgres and PostGIS to S3 and public web servers to make spatial data access easier than ever.
This release includes:
Together, these make Crunchy Bridge for Analytics an easy-to-use and powerful platform for working with geospatial data.
Marco Slot
Marco Slot
Data pipelines for IoT applications often involve multiple different systems. First, raw data is gathered in object storage, then several transformations happen in analytics systems, and finally results are written into transactional databases to be accessed by low latency dashboards. While a lot of interesting engineering goes into these systems, things are much simpler if you can do everything in Postgres.
Marco Slot
Marco Slot
Postgres for analytics has always been a huge question mark. By using PostgreSQL's extension APIs, integrating DuckDB as a query engine for state-of-the-art analytics performance without forking either project could Postgres be the analytics database too?
Bringing an analytical query engine into a transactional database system raises many interesting possibilities and questions. In this blog post I want to reflect on what makes these workloads and system architectures so different and what bringing them together means.