Elizabeth Christensen
We recently gave a talk at SCaLE (Southern California Linux Expo) about common problems and solutions for managing large Postgres databases. One of the topics we covered was data skew and partial indexing. This sparked some discussion at the conference afterwards, so we wanted to do a deeper dive.
Skewed data is data that is bunched up rather than evenly distributed. You might have one really large customer whose customer id accounts for more than half the rows in your events table. Or a column might have a default value, so that many of its values are simply that default. If you graphed the table data, skewed data would not form a symmetrical distribution.
Under the hood, Postgres knows what kind of data you have in your database and uses that information to create query plans and to decide when to use indexes. In some cases, skewed data leads Postgres to skip an index, making some queries less efficient.
As a general rule, Postgres doesn't use an index if a single value makes up more than 30% of the total data. So skewed data can nullify an index when you’re using a single or multi-column index and one of the indexed columns has skewed data.
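To make the arithmetic behind that rule of thumb concrete, here is a minimal Python sketch that measures how much of a column its most common value occupies. The 30% threshold is the figure quoted above, not a Postgres setting, and the `customer_ids` data and function names are hypothetical. Postgres makes this decision from its own column statistics and cost estimates, so treat this as an illustration rather than a model of the planner.

```python
from collections import Counter

# Rule-of-thumb threshold quoted in the text; this is not a Postgres setting.
SKEW_THRESHOLD = 0.30


def most_common_fraction(values):
    """Return (value, fraction of rows) for the most frequent value."""
    if not values:
        raise ValueError("values must not be empty")
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


def is_skewed(values, threshold=SKEW_THRESHOLD):
    """True when a single value makes up more than `threshold` of the rows."""
    _, fraction = most_common_fraction(values)
    return fraction > threshold


# Hypothetical events table: customer 1 owns 60 of the 100 rows,
# and the remaining 40 rows each belong to a different customer.
customer_ids = [1] * 60 + list(range(2, 42))
```

With this sample data, `is_skewed(customer_ids)` is true because customer 1 holds 60% of the rows, while a column of 100 distinct values would not be flagged.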
Paul Ramsey
An extremely common problem in fast-moving data architectures is providing a way to feed ad hoc user data into an existing analytical data system.
Do you have time to whip up a web app? No! You have a database to feed, and events are spiraling out of control... what to do?
How about a Google Sheet? The data layout is obvious, you can even enforce things like data types and required columns using locking and protecting, and unlike an Excel or LibreOffice document, it's always online, so you can hook the data into your system directly.
David Steele
Crunchy Data is proud to support the pgBackRest project, an essential production-grade backup tool used in our fully managed and self-managed Postgres products. pgBackRest is also available as an open source project.
Bob Pacheco
Continuous Integration / Continuous Delivery (CI/CD) is an automated approach in which incremental code changes are made, built, tested and delivered. Organizations want to get their software solutions to market as quickly as possible without sacrificing quality or stability. While CI/CD is often associated with application code, it can also be beneficial for managing changes to PostgreSQL database clusters.
GitOps plays an important part in enabling CI/CD. If you are unfamiliar with GitOps, I recommend starting with my previous post on Postgres GitOps with Argo and Kubernetes.
Christopher Winslett
We have been talking a lot here about using Postgres for metrics, dashboards, and analytics. One of my favorite Postgres tools that makes a lot of this work easy and efficient is HyperLogLog.
Roberto Mello
Postgres 16 is hot off the press with the beta release last week. I am really excited about the new feature that allows logical replication from standbys, allowing users to:
Martin Davis
PostGIS excels at storing, manipulating and analyzing geospatial data. At some point you will usually want to convert raw spatial data into a two-dimensional representation to take advantage of the integrative capabilities of the human visual cortex. In other words, to see things on a map.
PostGIS is a popular backend for mapping technology, so there are many options to choose from to create maps. Data can be rendered to a raster image using a web map server like GeoServer
Paul Ramsey
In a previous life, I worked on a CRM system that really loved the idea of tags. Everything could be tagged, users could create new tags, tags were a key organizing principle of searching and filtering.
The trouble was, modeled traditionally, tags can really make for some ugly tables and equally ugly queries. Fortunately, and as usual, Postgres has an answer.
Today I’m going to walk through working with tags in Postgres with a sample database of 🐈 cats and their attributes.
Greg Sabino Mullane
A question I hear from time to time with Crunchy Data clients and the Postgres community is:
When was my Postgres database table created?
Postgres does not store the creation date of tables, or any other database object. But fear not, there are a plethora of direct and indirect ways to find out when your table creation happened. Let's go through some ways to do this, ranging from easy to somewhat hard. All these solutions apply to indexes and other database objects, but tables are by far the most common request.
Craig Kerstiens
There's a lot of excitement around AI, and even more discussion than excitement. The question of Postgres and AI isn't a single question; there are a ton of paths you can take under that heading...