Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more

Posts about Postgres Tutorials

  • Using Postgres FILTER

    Craig Kerstiens

    For developers who have been around SQL long enough, they know that there is often more than one way to get the same result. Today I wanted to look at a few different ways to aggregate and show a total with paid and unpaid status. First, we will use a common table expression (CTE), which is a nice method for organizing subqueries. Second, we use CASE statements within aggregation context to filter out the values we want. Lastly, we use FILTER to clean up the syntax, but effectively do the same as the CASE statement.

    Hands On Tutorial

    We've loaded a sample data set and a hands on tutorial in our browser based Postgres playground.

    Target Output

    The report we're going to work to generate is a monthly report of revenue from an invoices table. We'll want our end report to look something like:

        mnth    | billed  | uncollected | collected
    ------------+---------+-------------+----------
     2023-02-01 | 1498.06 | 1498.06     |       0
     2023-01-01 | 2993.95 | 1483.04     | 1510.91
     2022-12-01 | 1413.17 |  382.84     | 1030.33
     2022-11-01 | 1378.18 |  197.52     | 1180.66
     2022-10-01 | 1342.91 |  185.03     | 1157.88
     2022-09-01 | 1299.90 |   88.01     | 1211.89
     2022-08-01 | 1261.97 |   85.29     | 1176.68
    

    Invoices Table

    First, let’s look at the underlying data. Show the details of the invoices table:

                                                  Table "public.invoices"
           Column       |              Type              | Collation | Nullable |               Default
    --------------------+--------------------------------+-----------+----------+--------------------------------------
     id                 | bigint                         |           | not null | nextval('invoices_id_seq'::regclass)
     account_id         | integer                        |           |          |
     net_total_in_cents | integer                        |           |          |
     invoice_period     | daterange                      |           |          |
     status             | text                           |           |          |
     created_at         | timestamp(6) without time zone |           | not null |
     updated_at         | timestamp(6) without time zone |           | not null |
    Indexes:
        "invoices_pkey" PRIMARY KEY, btree (id)
        "index_invoices_on_account_id" btree (account_id)
    

    We’ve made a few stylistic choices here for this invoices table.

    1. The invoice_period is a date range. The lower value on the range is the start of the period, and the upper value is the end of the period. To To extract the first value of this range, use lower(invoice_period).

    2. We name the net_total_in_cents field because it contains cents instead of dollars, and we store it as an integer instead of a float. This prevents fractional cents.

    Now, let’s look at a few records:

    SELECT * FROM invoices LIMIT 3;
    

    Which returns:

     id | account_id | net_total_in_cents |     invoice_period      | status |         created_at         |         updated_at
    ----+------------+--------------------+-------------------------+--------+----------------------------+----------------------------
      1 |          4 |               1072 | [2022-02-01,2022-02-28) | paid   | 2023-02-01 17:28:37.420197 | 2023-02-01 17:28:37.420197
      2 |          6 |                955 | [2022-02-01,2022-02-28) | paid   | 2023-02-01 17:28:37.422361 | 2023-02-01 17:28:37.422361
      3 |          7 |                322 | [2022-02-01,2022-02-28) | paid   | 2023-02-01 17:28:37.423726 | 2023-02-01 17:28:37.423726
     (3 rows)
    

    CTEs

    Let’s start with an example that uses CTEs (Common Table Expressions) or sometimes referred to as WITH clauses.

    At first, the following can look complex, but just think of it as 3 different queries aggregated into a single query. The billed , collected, and invoiced queries find their respective values for each month, then the final query joins those values together based on the mnth value.

    WITH totals AS (
    	SELECT
    	  lower(i.invoice_period) as mnth,
    	  SUM(i.net_total_in_cents) / 100.0 as net_total
    	FROM invoices i
    	GROUP BY 1
    ), collected AS (
    	SELECT
    	  lower(i.invoice_period) as mnth,
    	  SUM(i.net_total_in_cents) / 100.0 as amount
    	FROM invoices i
    	WHERE status = 'paid'
    	GROUP BY 1
    ), invoiced AS (
    	SELECT
    	  lower(i.invoice_period) AS mnth,
    	  sum(i.net_total_in_cents) / 100.0 AS amount
    	FROM invoices i
    	WHERE status = 'invoiced'
    	GROUP BY 1
    )
    
    SELECT totals.mnth,
           totals.net_total as billed,
           COALESCE(invoiced.amount, 0) as uncollected,
           COALESCE(collected.amount, 0) as collected
    FROM totals
    	LEFT JOIN invoiced ON totals.mnth = invoiced.mnth
    	LEFT JOIN collected on totals.mnth = collected.mnth
    ORDER BY 1 desc;
    

    The output will show one record per month with the total amount, uncollected amount, and collected amount for the invoices. To experiment dissecting this SQL, you can pluck each of the WITH statements and run them individually to see how they respond.

    Case statements for conditional filtering

    Another option here is leveraging the CASE statement for filtering in how we want to aggregate. For this scenario, the query will be significantly shorter than the use of CTEs.

    SELECT LOWER(i.invoice_period) as mnth,
           SUM(i.net_total_in_cents) / 100.0 as billed,
           SUM (CASE WHEN status = 'invoiced' THEN  i.net_total_in_cents END) / 100.00 as uncollected,
           SUM (CASE WHEN status = 'paid' THEN  i.net_total_in_cents END) / 100.00 as collected
    FROM invoices i
    GROUP BY 1
    ORDER BY 1 desc;
    

    Think of CASE as an IF/THEN statement that returns the net_total_in_cents for the SUM if a condition is met. These types of case statements are fairly common in data analysis. But there is another option: FILTER

    Using FILTER

    FILTER is functionally similar to the CASE statement, but makes the SQL a bit more readable:

    SELECT LOWER(i.invoice_period) as mnth,
           SUM (i.net_total_in_cents) / 100.0 as billed,
           SUM (i.net_total_in_cents) FILTER (WHERE status = 'invoiced') / 100.00 as uncollected,
           SUM (i.net_total_in_cents) FILTER (WHERE status = 'paid') / 100.00 as collected
    FROM invoices i
    GROUP BY 1
    ORDER BY 1 desc;
    

    From the standpoint of making your Postgres syntax as easy to read and streamlined, FILTER is a good tool to keep in your pocket!

    Each of these approaches can work just fine, but in this scenario, using FILTER keeps the SQL cleaner. Experts may use any of these approaches, depending on the situation. Generally, our recommendation is: choose the approach that makes your SQL readable.

    Read More
  • Will Postgres Use My Index? Hypothetical Indexing for Postgres

    Craig Kerstiens

    Postgres is a great database with a ton of features including really rich indexing. Postgres itself maintains all sorts of data under the covers about things like cache hits

    Read More
  • Intro to Postgres Custom Data Types

    Elizabeth Christensen

    Custom data types is one of those many features that makes PostgreSQL flexible for development of a huge variety of business and application use cases. Data types will help you primarily for data integrity, ensuring your data is stored in the database as you want it to be. A common surprise gift of using strict data types is that you can end up reducing your long term data maintenance.

    There’s two main ways to customize data types in Postgres:

    • Create DOMAIN
    Read More
  • Postgres support for JSON is 10 years old!

    Christopher Winslett

    JSON is everywhere, even in your SQL database. Since Postgres added JSON support 10 years ago, we've seen widespread adoption. When JSON support was first released in 2012 the implementation was a fairly flat representation of the JSON data type. It had limited querying and indexing capabilities. Craig

    Read More
  • Postgres Strings to Arrays and Back Again

    Paul Ramsey

    One of my favourite (in an ironic sense) data formats is the "CSV in the CSV", a CSV file in which one or more of the column is itself structured as CSV.

    Putting CSV-formatted columns in your CSV file is a low tech approach to shipping a multi-table relational data structure in a single file. The file can be read by anything that can read CSV (which is everything?) and ships around the related data in a very readable form.

    Station North,"-1,-4,-14,-15,-16,-15,-12,-9,-3,0,1,2"
    Station West,"2,4,5,6,9,10,15,16,13,12,10,9,5,3,1"
    Station East,"5,3,2,4,5,6,9,10,15,16,13,12,10,9,5,4,2,1"
    Station South,"12,18,22,25,29,30,33,31,30,29,28,25,24,23,14"
    
    Read More
  • Percentage Calculations Using Postgres Window Functions

    Paul Ramsey

    Back when I first learned SQL, calculating percentages over a set of individual contributions was an ungainly business:

    • First calculate the denominator of the percentage,
    • Then join that denominator back to the original table to calculate the percentage.

    This requires two passes of the table: once for the denominator and once for the percentage. For BI queries over large tables (that is, for most BI queries) more passes over the table slow performance significantly.

    Also, the SQL was really ugly!

    With modern PostgreSQL, you can calculate complex percentages over different groups in a single pass

    Read More
  • Postgres Insider Terminology

    Paul Ramsey

    Last week Craig Kerstiens published a great introduction to Postgres terminology, covering some of the basics you might run into when just getting started.

    The funny thing about jargon is how quickly we get used to it, and forget we are even using it. It becomes part of the secret handshake, the way we signal to other members of our tribe that we're part of the group.

    When I first started going to Postgres conferences and listening to talks by Postgres core developers I suddenly found myself at sea. What were these strange words and phrases they were using?

    It turns out that a lot of them are taken from the Postgres code base, which in turn naturally uses them because they are part of Codd's relational model

    Read More
  • Data Loading in Postgres for Newbies

    Elizabeth Christensen

    So you’re new to Postgres and you want to test it out and see how it works. Or maybe you’re building a new app and you need to seed your database. How do you get data in your Postgres database? There’s a lot out there on the topic of data migration for Postgres and some of it can be complicated and overwhelming. I’m going to give you some quick and simple ways to get test data IN

    Read More
  • Postgres Databases and Schemas

    Craig Kerstiens

    In my career I've worked on teams that ran and managed over a million databases, and I've helped shard and scale database for customers at 100s of terabytes.

    Postgres is an incredibly flexible database, and because of it's flexibility you have a lot of options for architecting your data design for your application needs. In talking with current customers

    Read More
  • A Look at Postgres 15: MERGE Command with Examples

    Jean-Paul Argudo

    With PostgreSQL 15 comes a new SQL Command called MERGE. MERGE has been in the SQL standard for quite a while, but it just made its way into the PostgreSQL codebase. Prior to MERGE, if you wanted to refresh a target table

    Read More