Steve Pousty
Greetings readers, today we're going to take a semi-break from my “doing data science in SQL” series to cover a really cool use case I just solved with regular expressions (regex) in Postgres. For those of you who have a bad taste in your mouth from earlier run-ins with regexes, this will be more use-case focused, and I will do my best to explain the search patterns I used. If you've never heard of regex, there are good resources to learn more about them but I will not be giving a t...
Steve Pousty
In the last two blog posts on data science in Postgres, we got our data ready for regression analysis and had predictive variables that are on wildly different scales. Another example of data on different scales would be annual income versus age. The former is usually at least tens of thousands while age rarely gets to a hundred. If you do the regression with non-transformed variables, it becomes hard to compare the effect of the different variables. Statisticians account for this by convertin...
Joe Conway
I was sent a link to a tweet regarding election night forecasting using R, and of course the default question was ... could it be run under PL/R inside Postgres? Like almost everything at Crunchy Data, we believe all things are better with Postgres. So I decided to give it a shot, and a bit of a database spin as it were. Since I had to get this blog done quickly, it is going to be mostly code -- sorry about that! The code in this blog (please see a small but important correction at the end)...
Steve Pousty
In the last post of this series we introduced trying to model fire probability in Northern California based on weather data. We showed how to use SQL to do data shaping and preparation. We ended with a data set that was ready with all the fire occurrences and weather data in a single table almost prepped for logistic regression. There is now one more step: sample the data. If you have worked with logistic regression before you know you should try to balance the number of occurrences (1) with a...
Steve Pousty
My introduction to databases and PostgreSQL was for web application development and statistical analysis. I learned just enough SQL to get the queries to return the right answers. Because of my work with PostGIS (and FOSS4G) I became friends with Paul Ramsey. We are now co-workers at Crunchy Data and he is helping me up my SQL-fu. One of the first lessons he taught me was "Try to use joins rather than subqueries." Today's post is going to work through this advice, as Paul and I work throug...
Craig Kerstiens
I'm a big fan of data in general. Data can tell you a lot about what users are doing and can help you gain all sorts of insights. One such aspect is in making recommendations based on past history or others that have made similar choices. In fact, years ago I wrote a small app to see if I could recommend wines based on how other ones were rated. It was a small app that I shared among just a handful of friends, some with similar taste, some with different taste. At first it was largely an academi...
Steve Pousty
Greetings friends of Crunchy Data, it is my pleasure to announce the initial release of our application developer portal. An awesome team has been working behind the scenes to bring together this nice little website to help application developers find all their Postgres needs in one place. Our goal is to become a single-stop resource for application developers looking to work with PostgreSQL. We have released three main parts to the site that form the foundation for future growth. Let’s go ove...
Paul Ramsey
Most businesses have databases of previous customers, and data analysts will frequently be asked to join arbitrary data to the customer tables in order to provide analysis. Unfortunately, joining address data together is notoriously difficult:
• The same address can be expressed in many ways
• The parts of addresses are not always clear
• There are valid, lexically very similar addresses very near any given address