Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more
Steve Pousty
Steve Pousty
Today we are going to walk through some of the preliminary data shaping steps in data science using SQL in Postgres. I have a long history of working in data science , including my Masters Degree (in Forestry) and Ph.D. (in Ecology) and during this work I would often get raw data files that I had to get into shape to run analysis. Whenever you start to do something new there is always some uncomfortableness . That “why is this so hard” feeling often stops me from trying something new, but...
Read MoreSteve Pousty
Steve Pousty
Greetings friends! We have come to our final blog post in my series about the data science workflow using PostgreSQL. In the last blog post , we used PL/R to create a function which returns the output from a logistic regression model trained on our fire data. We then took that model object and stored it into a separate table. Today we are going to finish up by showing how to use that stored model to make predictions on new data. By the way, I did all of the Postgres work for the entire blog s...
Read MoreSteve Pousty
Steve Pousty
Greetings friends! We have finally come to the point in the Postgres for Data Science series where we are not doing data preparation. Today we are going to do modeling and prediction of fire occurrence given weather parameters… IN OUR DATABASE! Quick recap: 1. We found some data on historical fires and historical weather data for California. 2. We fixed up the data and merged it to make it more appropriate for logistic regression. 3. We randomly sampled non-fire data and split the data int...
Read MoreSteve Pousty
Steve Pousty
In our last blog post on using Postgres for statistics , I covered some of the decisions on how to handle calculated columns in PostgreSQL . I chose to go with adding extra columns to the same table and inserting the calculated values into these new columns. Today’s post is going to cover how to implement this solution using PL/pgSQL . I bet your first question is going to be: Why did you choose to do this in PL/pgSQL rather than just some quick editor work? I actually started with handcrafti...
Read MoreSteve Pousty
Steve Pousty
Greetings readers, today we're going to take a semi-break from my “doing data science in SQL ” series to cover a really cool use case I just solved with regular expressions ( regex ) in Postgres . For those of you who have a bad taste in your mouth from earlier run-ins with regexs, this will be more use case focused and I will do my best to explain the search patterns I used. If you've never heard of regex, there are good resources to learn more about them but I will not be giving a t...
Read MoreSteve Pousty
Steve Pousty
In the last two blog posts on data science in Postgres, we got our data ready for regression analysis and had predictive variables that are on wildly different scales. Another example of data on different scales would be annual income versus age. The former is usually at least tens of thousands while age rarely gets to a hundred. If you do the regression with non-transformed variables, it becomes hard to compare the effect of the different variables. Statisticians account for this by convertin...
Read MoreSteve Pousty
Steve Pousty
In the last post of this series we introduced trying to model fire probability in Northern California based on weather data. We showed how to use SQL to do data shaping and preparation. We ended with a data set that was ready with all the fire occurrences and weather data in a single table almost prepped for logistic regression. There is now one more step: sample the data. If you have worked with logistic regression before you know you should try to balance the number of occurrences (1) with a...
Read MoreSteve Pousty
Steve Pousty
My introduction to databases and PostgreSQL was for web application development and statistical analysis. I learned just enough SQL to get the queries to return the right answers. Because of my work with PostGIS (and FOSS4G) I became friends with Paul Ramsey . We are now co-workers at Crunchy Data and he is helping me up my SQL-fu. One of the first lessons he taught me was "Try to use joins rather than subqueries." Today's post is going to work through this advice, as Paul and I work throug...
Read MoreSteve Pousty
Steve Pousty
There you are writing some SQL, having a great time. Uh oh, you need to iterate over each item in a result set and apply a function. You think, "Now I am going to have to write a stored procedure." Well today's post will give you an alternative by using lateral joins in Postgres . Lateral... See what I did there? You are probably familiar with normal database joins , which are usually used to match up a column in one table with a column in another table to bring the data from both tables toget...
Read MoreSteve Pousty
Steve Pousty
Today’s blog post is going to be a nice little adventure of learning how to use composite primary keys in a PostgreSQL many-to-many relationship table while building a Django application. Along the way we will talk about some basics of Django and some workarounds you need to use. Let’s dig in and get started. Here on the developer relations team at Crunchy Data , we have started building a demo application that manages Dungeon and Dragons (D&D) players, characters, dungeon masters, and c...
Read More