Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more

Posts about Fun with SQL

  • 3 min read

    Fast, Flexible Summaries with Aggregate Filters and Windows

    Paul Ramsey

    PostgreSQL can provide high performance summaries over multi-million record tables, and supports some great SQL sugar to make it concise and readable, in particular aggregate filtering, a feature unique to PostgreSQL and SQLite . A huge amount of reporting is about generating percentages: for a particular condition, what is a value relative to a baseline. Here's a quick "sales table" with three categories ("a" and "b" and "c") and one million random values between 0 and 10: In the bad-old-days,...

    Read More
  • 10 min read

    Fun with pg_checksums

    Greg Sabino Mullane

    Data checksums are a great feature in PostgreSQL. They are used to detect any corruption of the data that Postgres stores on disk. Every system we develop at Crunchy Data has this feature enabled by default. It's not only Postgres itself that can make use of these checksums. Some third party tools such as the awesome pgBackRest program can also use them for enhanced data integrity. Sadly, enabling data checksums is still not the default behavior when creating a new Postgres cluster. When you...

    Read More
  • 6 min read

    Devious SQL: Run the Same Query Against Tables With Differing Columns

    David Christensen

    We spend time day in, day out, answering the questions that matter and coming up with solutions that make the most sense. However, sometimes a question comes up that is just so darn…interesting that even if there are sensible solutions or workarounds, it still seems like a challenge just to take the request literally. Thus was born this blog series, Devious SQL. Devious: "longer and less direct than the most straightforward way." The basic question/request that inspired this article (and in fact...

    Read More
  • 10 min read

    Simulating UPDATE or DELETE with LIMIT in Postgres: CTEs to The Rescue!

    David Christensen

    There have certainly been times when using PostgreSQL, that I’ve yearned for an or statement with a feature. While the SQL standard itself has no say in the matter as of SQL:2016, there are definite cases of existing SQL database dialects that support this. Sadly, if you try to do something like this in PostgreSQL, this is the result: Before we dig into the nitty-gritty, let's look at some use cases for such a feature. The primary desire for this behavior is to break large transactions up...

    Read More
  • 6 min read

    Musings of a PostgreSQL Data Pontiff Episode 1

    Joe Conway

    This is the first in a series of blogs on the topic of using PostgreSQL for "data science". I put that in quotes because I would not consider myself to be a practicing data scientist, per se. Of course I'm not sure there is a universally accepted definition of data scientist. This article provides a nice illustration of my point. I do believe my credentials are such that no one can accuse me of term appropriation. Toward establishment of that end, this first installment is a walk down memory l...

    Read More
  • 6 min read

    Fuzzy Name Matching in Postgres

    Paul Ramsey

    A surprisingly common problem in both application development and analysis is: given an input name, find the database record it most likely refers to. It's common because databases of names and people are common, and it's a problem because names are a very irregular identifying token. The page " Falsehoods Programmers Believe About Names " covers some of the ways names are hard to deal with in programming. This post will ignore most of those complexities, and deal with the problem of matching up...

    Read More
  • 9 min read

    Using PostgreSQL to Shape and Prepare Scientific Data

    Steve Pousty

    Today we are going to walk through some of the preliminary data shaping steps in data science using SQL in Postgres. I have a long history of working in data science , including my Masters Degree (in Forestry) and Ph.D. (in Ecology) and during this work I would often get raw data files that I had to get into shape to run analysis. Whenever you start to do something new there is always some uncomfortableness . That “why is this so hard” feeling often stops me from trying something new, but...

    Read More
  • 8 min read

    R Predictive Analytics in Data Science Work using PostgreSQL

    Steve Pousty

    Greetings friends! We have come to our final blog post in my series about the data science workflow using PostgreSQL. In the last blog post , we used PL/R to create a function which returns the output from a logistic regression model trained on our fire data. We then took that model object and stored it into a separate table. Today we are going to finish up by showing how to use that stored model to make predictions on new data. By the way, I did all of the Postgres work for the entire blog s...

    Read More
  • 8 min read

    Using R in Postgres for Logistic Regression Modeling

    Steve Pousty

    Greetings friends! We have finally come to the point in the Postgres for Data Science series where we are not doing data preparation. Today we are going to do modeling and prediction of fire occurrence given weather parameters… IN OUR DATABASE! Quick recap: 1. We found some data on historical fires and historical weather data for California. 2. We fixed up the data and merged it to make it more appropriate for logistic regression. 3. We randomly sampled non-fire data and split the data int...

    Read More
  • 8 min read

    Using PL/pgSQL to Calculate New Postgres Columns

    Steve Pousty

    In our last blog post on using Postgres for statistics , I covered some of the decisions on how to handle calculated columns in PostgreSQL . I chose to go with adding extra columns to the same table and inserting the calculated values into these new columns. Today’s post is going to cover how to implement this solution using PL/pgSQL . I bet your first question is going to be: Why did you choose to do this in PL/pgSQL rather than just some quick editor work? I actually started with handcrafti...

    Read More