Paul Ramsey
PostgreSQL can provide high-performance summaries over multi-million-record tables, and it supports some great SQL sugar to make them concise and readable, in particular aggregate filtering, a feature unique to PostgreSQL and SQLite.
A huge amount of reporting is about generating percentages: for a particular condition, what is the value relative to a baseline?
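As a hedged taste of the pattern (the trips table and fare column are invented for illustration), the FILTER clause lets a single pass over the data compute both the numerator and the baseline:

-- share of trips with a fare over $10, as a percentage
SELECT
    100.0 * count(*) FILTER (WHERE fare > 10) / count(*) AS pct_expensive
FROM trips;

The 100.0 multiplier keeps the division in numeric rather than integer arithmetic, so the percentage isn't silently truncated.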
Greg Sabino Mullane
Data checksums are a great feature in PostgreSQL. They are used to detect any corruption of the data that Postgres stores on disk. Every system we develop at Crunchy Data has this feature enabled by default. It's not only Postgres itself that can make use of these checksums. Some third-party tools such as the awesome pgBackRest program
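For the curious, checking a cluster's checksum status from SQL is straightforward; the checksum_failures column in the second query assumes Postgres 12 or later:

-- was the cluster initialized with data checksums?
SHOW data_checksums;

-- have any checksum failures been recorded, per database?
SELECT datname, checksum_failures
  FROM pg_stat_database;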
David Christensen
We spend time day in, day out, answering the questions that matter and coming up with solutions that make the most sense. However, sometimes a question comes up that is just so darn…interesting that even if there are sensible solutions or workarounds, it still seems like a challenge just to take the request literally. Thus was born this blog series, Devious SQL.
Devious: "longer and less direct than the most straightforward way."
David Christensen
There have certainly been times when using PostgreSQL that I've yearned for an UPDATE or DELETE statement with a LIMIT feature. While the SQL standard itself has no say in the matter as of SQL:2016, there are definite cases of existing SQL database dialects that support this.
Sadly, if you try to do something like this in PostgreSQL, this is the result:
ERROR:  syntax error at or near "LIMIT"
LINE 1: DELETE FROM big_table LIMIT 10000;
                              ^
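One common workaround, previewed here as a sketch rather than the post's full treatment, leans on the hidden ctid system column to delete an arbitrary batch of rows:

-- delete an arbitrary batch of 10,000 rows by physical row id
DELETE FROM big_table
WHERE ctid IN (
    SELECT ctid FROM big_table LIMIT 10000
);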
Joe Conway
This is the first in a series of blogs on the topic of using PostgreSQL for "data science". I put that in quotes because I would not consider myself to be a practicing data scientist, per se. Of course I'm not sure there is a universally accepted definition of data scientist. This article
Paul Ramsey
A surprisingly common problem in both application development and analysis is: given an input name, find the database record it most likely refers to. It's common because databases of names and people are common, and it's a problem because names are a very irregular identifying token.
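As one hedged example of the kind of technique involved, the pg_trgm extension can rank candidate rows by trigram similarity (the people table and full_name column here are hypothetical):

CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- closest five matches to a misspelled input name
SELECT full_name, similarity(full_name, 'Jon Smyth') AS score
  FROM people
 ORDER BY full_name <-> 'Jon Smyth'
 LIMIT 5;

The <-> operator is pg_trgm's distance operator, so ordering by it surfaces the most similar names first.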
Steve Pousty
Today we are going to walk through some of the preliminary data shaping steps in data science using SQL in Postgres. I have a long history of working in data science, including my Master's degree (in Forestry) and Ph.D. (in Ecology), and during this work I would often get raw data files that I had to get into shape
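A typical shaping step, sketched here with invented table and column names rather than the post's actual data, is casting a raw all-text staging table into typed, cleaned columns:

-- turn an all-text staging table into typed, cleaned columns
CREATE TABLE fires AS
SELECT trim(fire_name)                        AS fire_name,
       NULLIF(acres_burned, '-9999')::numeric AS acres_burned,
       to_date(discovery_date, 'YYYY-MM-DD')  AS discovery_date
  FROM fires_raw;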
Steve Pousty
Greetings friends! We have come to our final blog post in my series about the data science workflow using PostgreSQL. In the last blog post, we used PL/R to create a function which returns the output from a logistic regression model trained on our fire data. We then took that model object and stored it in a separate table.
Today we are going to finish up by showing how to use that stored model to make predictions on new data. By the way, I did all of the Postgres work for the entire blog series in Crunchy Bridge.
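As a rough sketch of the shape of that final step (the predict_fire_probability function and new_weather table are illustrative stand-ins, not the series' actual names), scoring new rows is just a query once the model is wrapped in a function:

-- score incoming weather rows with the stored model
SELECT w.*,
       predict_fire_probability(w.temperature, w.humidity) AS fire_prob
  FROM new_weather w;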
Steve Pousty
Greetings friends! We have finally come to the point in the Postgres for Data Science series where we are done with data preparation. Today we are going to do modeling and prediction of fire occurrence given weather parameters… IN OUR DATABASE!
Quick recap:
Steve Pousty
In our last blog post on using Postgres for statistics, I covered some of the decisions on how to handle calculated columns in PostgreSQL. I chose to go with adding extra columns to the same table and inserting the calculated values into these new columns. Today’s post is going to cover how to implement this solution using PL/pgSQL.
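As a hedged sketch of the pattern (the fire_weather table and its columns are invented for illustration), the approach boils down to adding the column and then filling it from a PL/pgSQL block:

-- add the derived column if it isn't there yet
ALTER TABLE fire_weather ADD COLUMN IF NOT EXISTS temp_z numeric;

DO $$
DECLARE
    avg_t numeric;
    sd_t  numeric;
BEGIN
    -- compute the mean and sample standard deviation once
    SELECT avg(temperature), stddev_samp(temperature)
      INTO avg_t, sd_t
      FROM fire_weather;

    -- populate the derived column with a z-score
    UPDATE fire_weather
       SET temp_z = (temperature - avg_t) / sd_t;
END;
$$;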