Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more
Steve Pousty
Steve Pousty
Today we are going to walk through some of the preliminary data shaping steps in data science using SQL in Postgres. I have a long history of working in data science, including my Masters Degree (in Forestry) and Ph.D. (in Ecology) and during this work I would often get raw data files that I had to get into shape
Steve Pousty
Steve Pousty
Greetings friends! We have come to our final blog post in my series about the data science workflow using PostgreSQL. In the last blog post, we used PL/R to create a function which returns the output from a logistic regression model trained on our fire data. We then took that model object and stored it into a separate table.
Today we are going to finish up by showing how to use that stored model to make predictions on new data. By the way, I did all of the Postgres work for the entire blog series in Crunchy Bridge.
Steve Pousty
Steve Pousty
Greetings friends! We have finally come to the point in the Postgres for Data Science series where we are not doing data preparation. Today we are going to do modeling and prediction of fire occurrence given weather parameters… IN OUR DATABASE!
Quick recap:
Steve Pousty
Steve Pousty
In our last blog post on using Postgres for statistics, I covered some of the decisions on how to handle calculated columns in PostgreSQL. I chose to go with adding extra columns to the same table and inserting the calculated values into these new columns. Today’s post is going to cover how to implement this solution using PL/pgSQL
Steve Pousty
Steve Pousty
Greetings readers, today we're going to take a semi-break from my “doing data science in SQL” series to cover a really cool use case I just solved with regular expressions
Steve Pousty
Steve Pousty
In the last two blog posts on data science in Postgres, we got our data ready for regression analysis and had predictive variables that are on wildly different scales. Another example of data on different scales would be annual income versus age. The former is usually at least tens of thousands while age rarely gets to a hundred.
If you do the regression with non-transformed variables, it becomes hard to compare the effect of the different variables. Statisticians account for this by converting raw data values into a Z-score
Steve Pousty
Steve Pousty
In the last post of this series we introduced trying to model fire probability in Northern California based on weather data. We showed how to use SQL to do data shaping and preparation. We ended with a data set that was ready with all the fire occurrences and weather data in a single table almost prepped for logistic regression.
There is now one more step: sample the data. If you have worked with logistic regression before you know you should try to balance the number of occurrences (1) with absences (0). To do this we are going to sample out from the non_fire_weather
Steve Pousty
Steve Pousty
My introduction to databases and PostgreSQL was for web application development and statistical analysis. I learned just enough SQL to get the queries to return the right answers. Because of my work with PostGIS
Steve Pousty
Steve Pousty
There you are writing some SQL, having a great time. Uh oh, you need to iterate over each item in a result set and apply a function. You think, "Now I am going to have to write a stored procedure." Well today's post will give you an alternative by using lateral joins in Postgres
Steve Pousty
Steve Pousty
Today’s blog post is going to be a nice little adventure of learning how to use composite primary keys in a PostgreSQL many-to-many relationship table while building a Django application. Along the way we will talk about some basics of Django and some workarounds you need to use. Let’s dig in and get started.
Here on the developer relations team at Crunchy Data