Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more

Latest posts from Steve Pousty

  • 9 min read

    Using PostgreSQL to Shape and Prepare Scientific Data

    Steve Pousty

    Today we are going to walk through some of the preliminary data shaping steps in data science using SQL in Postgres. I have a long history of working in data science, including my Masters Degree (in Forestry) and Ph.D. (in Ecology) and during this work I would often get raw data files that I had to get into shape

    Read More
  • 8 min read

    R Predictive Analytics in Data Science Work using PostgreSQL

    Steve Pousty

    Greetings friends! We have come to our final blog post in my series about the data science workflow using PostgreSQL. In the last blog post, we used PL/R to create a function which returns the output from a logistic regression model trained on our fire data. We then took that model object and stored it into a separate table.

    Today we are going to finish up by showing how to use that stored model to make predictions on new data. By the way, I did all of the Postgres work for the entire blog series in Crunchy Bridge.

    Read More
  • 8 min read

    Using R in Postgres for Logistic Regression Modeling

    Steve Pousty

    Greetings friends! We have finally come to the point in the Postgres for Data Science series where we are not doing data preparation. Today we are going to do modeling and prediction of fire occurrence given weather parameters… IN OUR DATABASE!

    Quick recap:

    1. We found some data
    Read More
  • 8 min read

    Using PL/pgSQL to Calculate New Postgres Columns

    Steve Pousty

    In our last blog post on using Postgres for statistics, I covered some of the decisions on how to handle calculated columns in PostgreSQL. I chose to go with adding extra columns to the same table and inserting the calculated values into these new columns. Today’s post is going to cover how to implement this solution using PL/pgSQL

    Read More
  • 7 min read

    Replacing Lines of Code with 2 Little Regexs in Postgres

    Steve Pousty

    Greetings readers, today we're going to take a semi-break from my “doing data science in SQL” series to cover a really cool use case I just solved with regular expressions

    Read More
  • 8 min read

    Using Postgres for Statistics: Centering and Standardizing Data

    Steve Pousty

    In the last two blog posts on data science in Postgres, we got our data ready for regression analysis and had predictive variables that are on wildly different scales. Another example of data on different scales would be annual income versus age. The former is usually at least tens of thousands while age rarely gets to a hundred.

    If you do the regression with non-transformed variables, it becomes hard to compare the effect of the different variables. Statisticians account for this by converting raw data values into a Z-score

    Read More
  • 7 min read

    Using PostgreSQL and SQL to Randomly Sample Data

    Steve Pousty

    In the last post of this series we introduced trying to model fire probability in Northern California based on weather data. We showed how to use SQL to do data shaping and preparation. We ended with a data set that was ready with all the fire occurrences and weather data in a single table almost prepped for logistic regression.

    There is now one more step: sample the data. If you have worked with logistic regression before you know you should try to balance the number of occurrences (1) with absences (0). To do this we are going to sample out from the non_fire_weather

    Read More
  • 7 min read

    Joins or Subquery in PostgreSQL: Lessons Learned

    Steve Pousty

    My introduction to databases and PostgreSQL was for web application development and statistical analysis. I learned just enough SQL to get the queries to return the right answers. Because of my work with PostGIS

    Read More
  • Iterators in PostgreSQL with Lateral Joins

    Steve Pousty

    There you are writing some SQL, having a great time. Uh oh, you need to iterate over each item in a result set and apply a function. You think, "Now I am going to have to write a stored procedure." Well today's post will give you an alternative by using lateral joins in Postgres

    Read More
  • Composite Primary Keys, PostgreSQL and Django

    Steve Pousty

    Today’s blog post is going to be a nice little adventure of learning how to use composite primary keys in a PostgreSQL many-to-many relationship table while building a Django application. Along the way we will talk about some basics of Django and some workarounds you need to use. Let’s dig in and get started.

    pasted image 0 (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1)

    Here on the developer relations team at Crunchy Data

    Read More