Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more

Posts about Fun with SQL

  • 7 min read

    Replacing Lines of Code with 2 Little Regexs in Postgres

    Steve Pousty

    Greetings readers, today we're going to take a semi-break from my “doing data science in SQL” series to cover a really cool use case I just solved with regular expressions

    Read More
  • 8 min read

    Using Postgres for Statistics: Centering and Standardizing Data

    Steve Pousty

    In the last two blog posts on data science in Postgres, we got our data ready for regression analysis and had predictive variables that are on wildly different scales. Another example of data on different scales would be annual income versus age. The former is usually at least tens of thousands while age rarely gets to a hundred.

    If you do the regression with non-transformed variables, it becomes hard to compare the effect of the different variables. Statisticians account for this by converting raw data values into a Z-score

    Read More
  • 10 min read

    Election Night Prediction Modeling using PL/R in Postgres

    Joe Conway

    I was sent a link to a tweet regarding election night forecasting using R, and of course the default question was ... could it be run under PL/R inside Postgres? Like almost everything at Crunchy Data

    Read More
  • 7 min read

    Using PostgreSQL and SQL to Randomly Sample Data

    Steve Pousty

    In the last post of this series we introduced trying to model fire probability in Northern California based on weather data. We showed how to use SQL to do data shaping and preparation. We ended with a data set that was ready with all the fire occurrences and weather data in a single table almost prepped for logistic regression.

    There is now one more step: sample the data. If you have worked with logistic regression before you know you should try to balance the number of occurrences (1) with absences (0). To do this we are going to sample out from the non_fire_weather

    Read More
  • 7 min read

    Joins or Subquery in PostgreSQL: Lessons Learned

    Steve Pousty

    My introduction to databases and PostgreSQL was for web application development and statistical analysis. I learned just enough SQL to get the queries to return the right answers. Because of my work with PostGIS

    Read More
  • 5 min read

    Building a recommendation engine inside Postgres with Python and Pandas

    Craig Kerstiens

    I'm a big fan of data in general. Data can tell you a lot about what users are doing and can help you gain all sorts of insights. One such aspect is in making recommendations based on past history or others that have made similar choices. In fact, years ago I wrote a small app to see if I could recommend wines based on how other ones were rated. It was a small app that I shared among just a handful of friends, some with similar taste, some with different taste. At first it was largely an academic exercise of writing a recommendation engine, but if I could find some new wines I liked along the way, then great. Turns out it was a lot more effective at recommending things than I expected, even with only a small handful of wines rated.

    The other thing I'm a fan of is Postgres

    Read More
  • 5 min read

    Announcing the Crunchy Data Developer Portal

    Steve Pousty

    Greetings friends of Crunchy Data, it is my pleasure to announce the initial release of our application developer portal. An awesome team has been working behind the scenes to bring together this nice little website to help application developers find all their Postgres needs in one place.

    Our goal is to become a single-stop resource for application developers looking to work with PostgreSQL. We have released three main parts to the site that form the foundation for future growth. Let’s go over the main pieces and then talk a bit about future directions.

    Documentation

    Read More
  • 8 min read

    Quick and Dirty Address Matching with LibPostal

    Paul Ramsey

    Most businesses have databases of previous customers, and data analysts will frequently be asked to join arbitrary data to the customer tables in order to provide analysis.

    Unfortunately joining address data together is notoriously difficult:

    • The same address can be expressed in many ways
    • The parts of addresses are not always clear
    • There are valid lexically very similar addresses very nearby any given address
    Read More