Paul Ramsey
PostgreSQL can provide high performance summaries over multi-million record tables, and supports some great SQL sugar to make it concise and readable, in particular aggregate filtering, a feature unique to PostgreSQL and SQLite. A huge amount of reporting is about generating percentages: for a particular condition, what is a value relative to a baseline? Here's a quick "sales table" with three categories ("a" and "b" and "c") and one million random values between 0 and 10: In the bad-old-days,...
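The excerpt cuts off before the queries, but the FILTER syntax it alludes to looks roughly like this (a minimal sketch; the table setup and column names are my assumptions, not necessarily the post's):

    -- Illustrative setup: one million rows across three categories
    CREATE TABLE sales AS
    SELECT (ARRAY['a','b','c'])[floor(random() * 3)::int + 1] AS category,
           random() * 10 AS amount
    FROM generate_series(1, 1000000);

    -- Percentage of rows in each category, via aggregate FILTER
    SELECT 100.0 * count(*) FILTER (WHERE category = 'a') / count(*) AS pct_a,
           100.0 * count(*) FILTER (WHERE category = 'b') / count(*) AS pct_b,
           100.0 * count(*) FILTER (WHERE category = 'c') / count(*) AS pct_c
    FROM sales;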
Greg Sabino Mullane
Data checksums are a great feature in PostgreSQL. They are used to detect any corruption of the data that Postgres stores on disk. Every system we develop at Crunchy Data has this feature enabled by default. It's not only Postgres itself that can make use of these checksums. Some third party tools such as the awesome pgBackRest program can also use them for enhanced data integrity. Sadly, enabling data checksums is still not the default behavior when creating a new Postgres cluster. When you...
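For context on the mechanism: new clusters get checksums via initdb --data-checksums, and on PostgreSQL 12 and later an existing cluster can be converted offline with pg_checksums --enable. A running cluster reports its state like so (standard PostgreSQL, not specific to the post):

    -- Returns 'on' when the cluster was initialized with checksums
    SHOW data_checksums;

    -- Equivalent form, usable inside larger queries
    SELECT current_setting('data_checksums');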
David Christensen
We spend time day in, day out, answering the questions that matter and coming up with solutions that make the most sense. However, sometimes a question comes up that is just so darn…interesting that even if there are sensible solutions or workarounds, it still seems like a challenge just to take the request literally. Thus was born this blog series, Devious SQL. Devious: "longer and less direct than the most straightforward way." The basic question/request that inspired this article (and in fact...
David Christensen
There have certainly been times when using PostgreSQL that I've yearned for an UPDATE or DELETE statement with a LIMIT feature. While the SQL standard itself has no say in the matter as of SQL:2016, there are definite cases of existing SQL database dialects that support this. Sadly, if you try to do something like this in PostgreSQL, the result is a syntax error. Before we dig into the nitty-gritty, let's look at some use cases for such a feature. The primary desire for this behavior is to break large transactions up...
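One classic workaround looks something like this (a sketch under my own assumptions, not necessarily the post's exact approach; big_table and its processed column are hypothetical):

    -- Simulate DELETE ... LIMIT by selecting row ctids in a subquery
    DELETE FROM big_table
    WHERE ctid IN (SELECT ctid
                   FROM big_table
                   WHERE processed
                   LIMIT 1000);

Repeating this in a loop commits work in bounded batches instead of one huge transaction.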
Joe Conway
This is the first in a series of blogs on the topic of using PostgreSQL for "data science". I put that in quotes because I would not consider myself to be a practicing data scientist, per se. Of course I'm not sure there is a universally accepted definition of data scientist. This article provides a nice illustration of my point. I do believe my credentials are such that no one can accuse me of term appropriation. Toward establishment of that end, this first installment is a walk down memory l...
Paul Ramsey
A surprisingly common problem in both application development and analysis is: given an input name, find the database record it most likely refers to. It's common because databases of names and people are common, and it's a problem because names are a very irregular identifying token. The page "Falsehoods Programmers Believe About Names" covers some of the ways names are hard to deal with in programming. This post will ignore most of those complexities, and deal with the problem of matching up...
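One common Postgres approach to this kind of matching is trigram similarity via the pg_trgm extension (a sketch under my own assumptions; the post may well lean on other tools, such as the fuzzystrmatch phonetic functions):

    CREATE EXTENSION IF NOT EXISTS pg_trgm;

    -- Five closest stored names to the input, by trigram similarity
    SELECT name, similarity(name, 'Jon Smyth') AS score
    FROM people
    WHERE name % 'Jon Smyth'   -- true above the similarity threshold
    ORDER BY score DESC
    LIMIT 5;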
Steve Pousty
Today we are going to walk through some of the preliminary data shaping steps in data science using SQL in Postgres. I have a long history of working in data science, including my Master's degree (in Forestry) and Ph.D. (in Ecology), and during this work I would often get raw data files that I had to get into shape to run analysis. Whenever you start to do something new there is always some uncomfortableness. That “why is this so hard” feeling often stops me from trying something new, but...
Steve Pousty
Greetings friends! We have come to our final blog post in my series about the data science workflow using PostgreSQL. In the last blog post, we used PL/R to create a function which returns the output from a logistic regression model trained on our fire data. We then took that model object and stored it in a separate table. Today we are going to finish up by showing how to use that stored model to make predictions on new data. By the way, I did all of the Postgres work for the entire blog s...
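As a hedged sketch of the calling pattern only (PL/R assumed installed; the function, tables, and columns here are illustrative, and the bytea-to-R-object handling is an assumption, not the post's verified code):

    -- Rebuild the stored model and score new rows with it
    CREATE OR REPLACE FUNCTION predict_fire(model bytea, temp float8,
                                            humidity float8)
    RETURNS float8 AS $$
      fit <- unserialize(model)  -- model assumed stored via R serialize()
      predict(fit, newdata = data.frame(temp = temp, humidity = humidity),
              type = "response")
    $$ LANGUAGE plr;

    SELECT w.*, predict_fire(m.model, w.temp, w.humidity) AS fire_prob
    FROM new_weather w
         CROSS JOIN (SELECT model FROM fire_models
                     ORDER BY id DESC LIMIT 1) m;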
Steve Pousty
Greetings friends! We have finally come to the point in the Postgres for Data Science series where we are not doing data preparation. Today we are going to do modeling and prediction of fire occurrence given weather parameters… IN OUR DATABASE! Quick recap:
1. We found some data on historical fires and historical weather data for California.
2. We fixed up the data and merged it to make it more appropriate for logistic regression.
3. We randomly sampled non-fire data and split the data int...
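The training side of that workflow can be sketched like so (again hedged: PL/R assumed installed, and the table, columns, and formula are illustrative stand-ins, with fire assumed to be a 0/1 column):

    -- Fit a logistic regression inside the database and serialize it
    CREATE OR REPLACE FUNCTION train_fire_model()
    RETURNS bytea AS $$
      dat <- pg.spi.exec("SELECT fire, temp, humidity FROM fire_training")
      fit <- glm(fire ~ temp + humidity, data = dat, family = binomial)
      serialize(fit, NULL)  -- raw vector, assumed returned to SQL as bytea
    $$ LANGUAGE plr;

    INSERT INTO fire_models (model) VALUES (train_fire_model());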
Steve Pousty
In our last blog post on using Postgres for statistics, I covered some of the decisions on how to handle calculated columns in PostgreSQL. I chose to go with adding extra columns to the same table and inserting the calculated values into these new columns. Today’s post is going to cover how to implement this solution using PL/pgSQL. I bet your first question is going to be: Why did you choose to do this in PL/pgSQL rather than just some quick editor work? I actually started with handcrafti...
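The general shape of that PL/pgSQL is worth a sketch (the table, columns, and calculation below are my own illustrative choices, not the post's):

    -- Add and populate a derived column for each listed source column,
    -- using dynamic SQL so the column names can vary at runtime
    DO $$
    DECLARE
      col text;
    BEGIN
      FOREACH col IN ARRAY ARRAY['temp', 'humidity'] LOOP
        EXECUTE format(
          'ALTER TABLE weather ADD COLUMN %I double precision',
          col || '_scaled');
        EXECUTE format(
          'UPDATE weather w SET %I = (w.%I - s.mu) / s.sd
             FROM (SELECT avg(%I) AS mu, stddev(%I) AS sd
                     FROM weather) s',
          col || '_scaled', col, col, col);
      END LOOP;
    END
    $$;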