Latest posts from Craig Kerstiens

  • 10 min read

    When Did Postgres Become Cool?

    Craig Kerstiens

    Postgres wasn't always the cool kid. It didn't emerge from millions of dollars of VC funding, it didn't have a team of DevRel evangelists championing it; it simply started and evolved. Postgres just passed the 25-year milestone as an open source project. With the upcoming release of Postgres 16 in the coming months, we thought we'd pause for a minute and take a look back at Postgres and how it got to where it is today.

    If for some reason you've been living under a rock and are unfamiliar with Postgres, give our write-up of Why Postgres a read.

    Read More
  • 2 min read

    Introducing Database Insights: Effortless Postgres Management with Crunchy Bridge

    Craig Kerstiens

    Today we're excited to announce several big improvements to Crunchy Bridge that make it easier to work with your database without having to become a DBA. We're releasing:

    • Database insights: Real-time indicators to ensure a healthy database
    • Metrics: Monitor your system over time
    • Production check: Ensure you're ready for launch day
    Read More
  • 5 min read

    Announcing Crunchy Postgres for Kubernetes 5.4

    Craig Kerstiens

    We are happy to unveil the newest release of Crunchy Postgres for Kubernetes, version 5.4. This update brings an array of features set to improve your experience, including:

    • Support for ARM
    • Native vector search via pgvector
    • Comprehensive support for huge pages
    • Native support for Postgres tablespaces
    • Documentation enhancements
    Read More
  • Practical AI with Postgres

    Craig Kerstiens

    There's a lot of excitement around AI, and even more discussion than excitement. The question of Postgres and AI isn't a single question; there are a ton of paths you can take under that heading...

    • Can I use Postgres for building AI-related apps? Absolutely
    Read More
  • Building Customer-Facing Real-Time Dashboards with Postgres

    Craig Kerstiens

    Over the past few weeks we've had several customers ask how they should architect their analytics pipeline. Common questions are:

    • Should we use some kind of data warehouse or time series database?
    • Is Postgres suitable for that type of workload?
    • What are the pitfalls that I should worry about before I get started?
    Read More
  • Thinking Fast vs. Slow with Your Data in Postgres

    Craig Kerstiens

    Today, we wanted to address some basic principles for better managing data architecture. Postgres is well regarded as a database for a traditional system of record. More recently we've been fielding questions about what else it can do, such as: can it be good for analytics and metrics? The short answer is "yes". When applications expand outside their standard system of record, they add new types of data and data stores, which introduces the complexity of managing multiple types of systems.

    Some common workloads for Postgres are:

    • Primary system of record
    • Metrics / analytics data
    • Log / event data
    Read More
  • Is your Postgres ready for production?

    Craig Kerstiens

    Is your database ready for production?

    You've been building your application for months, you've tested with beta users, you've gotten feedback and iterated. You've gone through your launch checklist: email beta users, publish the blog post, post to Hacker News, and hope the comments are friendly. But is your database ready for whatever may come on launch day, or even two months in? Here's a handy checklist to make sure you're not caught flat-footed.

    • Backups❓
    • High availability❓
    • Logs properly configured❓ (see the config sketch after this list)
      • Archived and persisted❓
      • Logging slow queries❓
      • Auto explain❓
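
    As a rough sketch of the logging items above (the settings are real, but the thresholds here are illustrative choices, not recommendations), the relevant postgresql.conf entries look roughly like:

    # postgresql.conf excerpt -- illustrative values only
    log_min_duration_statement = 1000          # log any query slower than 1 second
    shared_preload_libraries = 'auto_explain'  # loading the module requires a restart
    auto_explain.log_min_duration = 1000       # log the plan for queries slower than 1 second
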
    Read More
  • Exposing Postgres Performance Secrets

    Craig Kerstiens

    We spend a lot of time at Crunchy Data helping people dig into the performance of their Postgres. If you're setting up a new Postgres database or already running one in production, there are a number of very basic steps you can take that will save your tail in the future when it comes to investigating performance. Here is your guide that'll take less than 5 minutes to get in place. Future you will thank you for doing this today.

    1. Use pg_stat_statements to record queries
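
    As a minimal sketch of that first step (the ordering and LIMIT are arbitrary choices for illustration), enabling and querying pg_stat_statements looks roughly like:

    -- shared_preload_libraries must include 'pg_stat_statements' (requires a restart)
    CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

    -- Top 10 queries by total execution time (column names as of Postgres 13+)
    SELECT query, calls, total_exec_time, mean_exec_time
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 10;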

    Read More
  • Using Postgres FILTER

    Craig Kerstiens

    Developers who have been around SQL long enough know that there is often more than one way to get the same result. Today I wanted to look at a few different ways to aggregate and show a total with paid and unpaid status. First, we will use a common table expression (CTE), which is a nice method for organizing subqueries. Second, we will use CASE statements within aggregate functions to filter out the values we want. Lastly, we will use FILTER to clean up the syntax while effectively doing the same thing as the CASE statement.

    Hands On Tutorial

    We've loaded a sample data set and a hands-on tutorial in our browser-based Postgres playground.

    Target Output

    The report we're going to generate is a monthly revenue report from an invoices table. We'll want our end report to look something like:

        mnth    | billed  | uncollected | collected
    ------------+---------+-------------+----------
     2023-02-01 | 1498.06 | 1498.06     |       0
     2023-01-01 | 2993.95 | 1483.04     | 1510.91
     2022-12-01 | 1413.17 |  382.84     | 1030.33
     2022-11-01 | 1378.18 |  197.52     | 1180.66
     2022-10-01 | 1342.91 |  185.03     | 1157.88
     2022-09-01 | 1299.90 |   88.01     | 1211.89
     2022-08-01 | 1261.97 |   85.29     | 1176.68
    

    Invoices Table

    First, let’s look at the underlying data. Show the details of the invoices table:

                                                  Table "public.invoices"
           Column       |              Type              | Collation | Nullable |               Default
    --------------------+--------------------------------+-----------+----------+--------------------------------------
     id                 | bigint                         |           | not null | nextval('invoices_id_seq'::regclass)
     account_id         | integer                        |           |          |
     net_total_in_cents | integer                        |           |          |
     invoice_period     | daterange                      |           |          |
     status             | text                           |           |          |
     created_at         | timestamp(6) without time zone |           | not null |
     updated_at         | timestamp(6) without time zone |           | not null |
    Indexes:
        "invoices_pkey" PRIMARY KEY, btree (id)
        "index_invoices_on_account_id" btree (account_id)
    

    We’ve made a few stylistic choices here for this invoices table.

    1. The invoice_period is a date range. The lower value on the range is the start of the period, and the upper value is the end of the period. To extract the first value of this range, use lower(invoice_period); a small sketch after this list shows how that works.

    2. We named the field net_total_in_cents because it stores cents rather than dollars, and we store it as an integer rather than a float. This prevents fractional cents.
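
    As a small sketch of how the range functions behave (the literal dates here are just for illustration), lower() and upper() pull the bounds out of a daterange:

    SELECT lower(daterange('2022-02-01', '2022-02-28')) AS period_start,
           upper(daterange('2022-02-01', '2022-02-28')) AS period_end;
    -- period_start = 2022-02-01, period_end = 2022-02-28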

    Now, let’s look at a few records:

    SELECT * FROM invoices LIMIT 3;
    

    Which returns:

     id | account_id | net_total_in_cents |     invoice_period      | status |         created_at         |         updated_at
    ----+------------+--------------------+-------------------------+--------+----------------------------+----------------------------
      1 |          4 |               1072 | [2022-02-01,2022-02-28) | paid   | 2023-02-01 17:28:37.420197 | 2023-02-01 17:28:37.420197
      2 |          6 |                955 | [2022-02-01,2022-02-28) | paid   | 2023-02-01 17:28:37.422361 | 2023-02-01 17:28:37.422361
      3 |          7 |                322 | [2022-02-01,2022-02-28) | paid   | 2023-02-01 17:28:37.423726 | 2023-02-01 17:28:37.423726
     (3 rows)
    

    CTEs

    Let’s start with an example that uses CTEs (Common Table Expressions), sometimes referred to as WITH clauses.

    At first, the following can look complex, but just think of it as three different queries combined into a single query. The totals, collected, and invoiced queries find their respective values for each month, then the final query joins those values together on the mnth value.

    WITH totals AS (
      SELECT
        lower(i.invoice_period) AS mnth,
        SUM(i.net_total_in_cents) / 100.0 AS net_total
      FROM invoices i
      GROUP BY 1
    ), collected AS (
      SELECT
        lower(i.invoice_period) AS mnth,
        SUM(i.net_total_in_cents) / 100.0 AS amount
      FROM invoices i
      WHERE status = 'paid'
      GROUP BY 1
    ), invoiced AS (
      SELECT
        lower(i.invoice_period) AS mnth,
        SUM(i.net_total_in_cents) / 100.0 AS amount
      FROM invoices i
      WHERE status = 'invoiced'
      GROUP BY 1
    )

    SELECT totals.mnth,
           totals.net_total AS billed,
           COALESCE(invoiced.amount, 0) AS uncollected,
           COALESCE(collected.amount, 0) AS collected
    FROM totals
      LEFT JOIN invoiced ON totals.mnth = invoiced.mnth
      LEFT JOIN collected ON totals.mnth = collected.mnth
    ORDER BY 1 DESC;
    

    The output will show one record per month with the total amount, uncollected amount, and collected amount for the invoices. To experiment with dissecting this SQL, you can pull out each of the WITH queries and run it individually to see what it returns.
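
    For example, the collected subquery can be run on its own (with an ORDER BY added just to make the output easier to scan):

    SELECT lower(i.invoice_period) AS mnth,
           SUM(i.net_total_in_cents) / 100.0 AS amount
    FROM invoices i
    WHERE status = 'paid'
    GROUP BY 1
    ORDER BY 1 DESC;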

    Case statements for conditional filtering

    Another option here is leveraging the CASE statement to filter what we aggregate. For this scenario, the query is significantly shorter than the CTE version.

    SELECT lower(i.invoice_period) AS mnth,
           SUM(i.net_total_in_cents) / 100.0 AS billed,
           SUM(CASE WHEN status = 'invoiced' THEN i.net_total_in_cents END) / 100.00 AS uncollected,
           SUM(CASE WHEN status = 'paid' THEN i.net_total_in_cents END) / 100.00 AS collected
    FROM invoices i
    GROUP BY 1
    ORDER BY 1 DESC;
    

    Think of CASE as an IF/THEN statement that returns the net_total_in_cents for the SUM if a condition is met. These types of CASE statements are fairly common in data analysis. But there is another option: FILTER.

    Using FILTER

    FILTER is functionally similar to the CASE statement, but makes the SQL a bit more readable:

    SELECT lower(i.invoice_period) AS mnth,
           SUM(i.net_total_in_cents) / 100.0 AS billed,
           SUM(i.net_total_in_cents) FILTER (WHERE status = 'invoiced') / 100.00 AS uncollected,
           SUM(i.net_total_in_cents) FILTER (WHERE status = 'paid') / 100.00 AS collected
    FROM invoices i
    GROUP BY 1
    ORDER BY 1 DESC;
    

    From the standpoint of keeping your Postgres syntax readable and streamlined, FILTER is a good tool to keep in your pocket!

    Each of these approaches can work just fine, but in this scenario, using FILTER keeps the SQL cleaner. Experts may use any of these approaches, depending on the situation. Generally, our recommendation is: choose the approach that makes your SQL readable.

    Read More
  • Will Postgres Use My Index? Hypothetical Indexing for Postgres

    Craig Kerstiens

    Postgres is a great database with a ton of features including really rich indexing. Postgres itself maintains all sorts of data under the covers about things like cache hits

    Read More