Back to Basics with PostgreSQL Data Types
When I first started to learn how to code, I was introduced to the concept of data types: a 6 is not the same as "6", because the former is numeric (typically an integer type, with some variations in terminology based on the language) and the latter a string; "true" is not necessarily the same as true, because true can be a Boolean value in some languages.
Underneath the code, these pieces of data are really just a combination of ones and zeros, but declaring their types allows them to play a particular role. A value’s data type tells the computer when an expression like 6+6 should return a 12, or 66! That said, it wasn’t until I started learning a little bit more about enterprise databases and SQL that I realized what else could be impacted by data types.
In SQL, we create tables to hold data, and when we create each table we also define its properties - including each column’s data types. By doing so, we’re telling the database what kind of data to expect and allow in each column, so it helps keep our data clean. It’s similar to formatting a column in an Excel spreadsheet so that we can go on and use functions that are appropriate for processing the kind of data we have. But it doesn’t stop there: it can also determine how much space each piece of data takes up (and ultimately impact how much storage space your data tables will occupy on disk - check out this post for an analysis by my colleague Joe Conway). Choosing the right data types can also help your database perform more efficiently, so your applications run faster.
Diving deeper into data types
I’m new to PostgreSQL, so here are some of the interesting things I learned as I dove a little deeper into a few of the base data types:
Character types
I knew of the difference between char and varchar, but I didn’t realize that
there are actually no substantial differences in performance between them or with the PostgreSQL text type as well. Text allows variable unlimited length, whereas char(n) and varchar(n) need the fixed and max length specified respectively. (varchar without a specified max length n is equivalent to text.)
Unless you’re sure that your values are only ever going to be a fixed length (say, US state abbreviations?), it seems that there aren’t a whole lot of compelling reasons for choosing char. I found out that it was perfectly fine in most cases to just use the PostgreSQL text type, and add a constraint to the column if I must limit the length. This works quite well if you ever foresee the length requirement changing with time.
Numeric and monetary types
Real and double are not recommended for storing monetary values, since these data types are of variable precision (and you’ll typically want to guarantee exactness when it comes to money).
And with that said, I was a little surprised to find out that the money type itself can behave in ways you may not predict. For example, if I had initially saved a value of 100.00 at a point when the database’s underlying currency was set to U.S. dollars, and then I changed the currency setting to Canadian dollars, I might forget that my 100.00 is actually no longer the correct CAD amount. So, using numeric when dealing with money might be a safer bet. If you need to support multiple currencies, the corresponding currency codes could be stored in a separate column.
Date/time types
The biggest thing I learned here is that dealing with dates and times can be especially complicated when it comes to databases. Here’s a few reasons why:
- A value such as “2020-01-01 01:00” only means something if you know what time zone or location it’s associated with;
- Even if you’ve indicated the time zone in the input value, in Postgres the value is stored internally in Coordinated Universal Time (UTC);
- The internal values are converted on output from UTC to local time as specified in the TimeZone configuration, which could be set in postgresql.conf, or per session with SET TIMEZONE
There are many different ways that date/time values can throw you for a loop. I’m certainly no expert, but what I can pass along from the experts is: always use timestamp with time zone at the very least (aka timestamptz)!
Play Around with Data Types
Crunchy Data just released two new courses about PostgreSQL data types in our interactive learning portal. Our learning portal is up 24/7 and gives you a sandbox where you can learn more about data types in a hands-on way.
Head over to the Basic Data Types course for a refresher on character, numeric, date/time, and boolean types. Then, to learn about a few other data types that might be less commonly used, keep going with the Advanced Data Types course.
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read