Announcing Crunchy Bridge for Analytics—interact with your data lake using Postgres commands via extension, with a vectorized, parallel query engine. Learn more in our announcement.

Tutorial Instructions

Casting Data Types

What is a datatype ?

  • A datatype is defined by mostly by
    • the particular kind of data it represents
    • the values it can takes
    • the operations it can be done on it
  • PostgreSQL comes with more than 49 data types out of the box (numbers, text, characters, boolean, money, geometry shapes, IP addresses and networks, etc.)
  • You can even create your own data types, but that is a different tutorial

What is casting ?

Casting is the method one want to use to convert a data in PostgreSQL from one datatype to another.

The main function to do that in PostgreSQL is

CAST ( expression AS target_type );

One can convert from one data type to any datatype with some limits:

  • when converting Numeric Types, the expression to be converted has to match the possible range of the target_type, see Postgres docs for more on this.
  • per PostgreSQL documentation : “The conversion must be so that the two types (source, target) can binary coercible, which means that the conversion can be performed “for free” without invoking any function. This requires that corresponding values use the same internal representation. For instance, the types text and varcharare binary coercible both ways. Binary coercibility is not necessarily a symmetric relationship. For example, the cast from xml to text" (source docs here)

Examples: some may work, some may not

select cast('1' as integer); select cast('10000000000' as integer); select cast('10000000000' as bigint); select cast('abc' as integer); select cast('1' as text); select cast('true' as boolean);

pg_typeof() function

This function will be helpful in the current tutorial to understand how the PostgreSQL interprets the data sent to it:

select pg_typeof(1);

With texts, the function will return “unknown” if we don’t cast it to the correct datatype, because there are many data types in PostgreSQL to handle it like text, char, varchar…

select pg_typeof('abc'),pg_typeof(cast ('abc' as text));

Why casting is important?

A common error messages that frustrates non-DBAs is:

 `ERROR:  operator does not exist: integer ~~ unknown
 `ERROR:  function repeat(integer, unknown) does not exist
  • You can look at the functions, see that it should work, but why does it not work? Maybe because of the following:
    • SQL is a strongly typed language
    • Comparisons fail when comparing dissimilar data types ( text v. string )
    • Functions require specific specific data types as part of declaration

Implicit casts

Sometimes, casting is not needed, because PostgreSQL will do an automatic conversion for you when it can be done. This is called “implicit cast”. On this example we try adding string ‘12’ and number 10: it will work, because PostgreSQL can convert that string ‘12” as a number to satisfy the addition of the two items:

select '12'+10 as implicit_cast;

Beware that here ‘12’ can be translated into an integer automatically with an implicit cast, because “+” and 10 (integer) could be then satisfied. With ‘12.1’ this won’t work anymore because ‘12.1” cannot be casted into integer!

select '12.1'+10 as implicit_cast;

So for that to work, one will cast 10 as a numeric instead, because the numeric representation of string ‘12.1” is not an integer, but a numeric:

select '12.1'+cast(10 as numeric) as implicit_cast;

A good practice when coding with PostgreSQL is to avoid implicit casts! The application developer will then have better control on his code, avoiding as much as possible implicit casts.

Boolean and NULL

About boolean, if a integer is given, 0 will be false, all other values will be true. If a text representing a boolean is given, anything like t, tr, tru,true,y, ye, yes, on will be true, and anything like f, fa, fal, fals, false, n, no, of, off will be false. Anything else will throw an error.

When converting a NULL to any other data type, the result is invariably null. Tough, beware that the chain ‘null’ is not a NULL value :-)

with tests (test) as (values ('tRUe') ,('F') ,('on') ,('of') ,('ye') ,(NULL) ) select test , pg_typeof(test) as typeof_test , test::boolean as test_bolean_value from tests;

Converting from/to date and timestamp

Most often, casting is used to convert some text representing a date in a real date, or date/time, for comparison purposes typically in the WHERE clause.

We're going to dig into SQL with the public dataset of US births from 2000-2014. But first we're going to work on creating the dataset in a more Postgres centric way with a new table and the date format:

CREATE TABLE births_cast ( id serial, day date, births int ); INSERT INTO births_cast (day, births) SELECT make_date(year, month, date_of_month), births FROM US_births_20002014_SSA;

A simple first query on our data set, we want to know how many people were born on June, 10th 2001 in the US:

select day, births from births_cast where day=cast('2001-06-10' as date);

The special operator “::” is a great replacement of cast() function: you’ll find that maybe more readable. At least, it’s shorter:

select day, births from births_cast where day='2001-06-10'::date;

About that “::” special operator, depending the application’s language, it could be hard to use, because the developers may have to escape those characters. If you face this problem, use the cast() function instead in the SQL queries.select day, births from births where day=’2001-06-10’::date;

PostgreSQL can also use implicit casts, meaning it tries to understand what the user wants, hiding this way mechanisms of datatype conversions. Still, we don’t recommend that, since it can lead to some undesired situations. In this case, implicit conversion works as expected;

select day, births from births_cast where day='2001-06-10';

But if you use different way than the ISO date (’YYYY-MM-DD’), as assumed above by PostgreSQL, it can lead to some difficulties. The following query stills works in the current setting of our playground database here:

select day, births from births_cast where day='06/10/2001';

But that’s because the datestyle used is define as “ISO, MDY”:

show datestyle;

On a different parts of the world, the datestyle may most often be “ISO,DMY”:

set datestyle to 'ISO, DMY'; select day, births from births_cast where day='06/10/2001';

Here, it’s not more the “June, 10th” that PostgreSQL will search for but “October, 6th” !

That’s why we recommend developers to use the ISO date format whenever possible.

Beware also when you convert a date to timestamp and vice/versa:

When converting from a timestamp to a date, basically, hour (and timezone, depending the source datatype) is removed:

select current_timestamp, current_timestamp::date, current_date, localtimestamp, localtimestamp::date;

But when the contrary is done, from date to timestamp, hour “00:00:00” is added, with timezone or not, depending the destination datatype:

select current_date::timestamp, current_date::timestamptz;

Intervals

Conversion to interval can be useful for date operations, and can be easily expressed in a text chain like “90 days 10 minutes 10 seconds” as an example:

select current_timestamp + '90 days 10 minutes 10 seconds'::interval as test;

As seen before, if an interval is added (works also with subtraction) to a date, given the hours’ part of the date is then ‘00:00:00’, intervals’ values are added to 0:

select current_date + '90 days 10 minutes 10 seconds'::interval as test;

Another example, of date and interval calculations. One wants to get the last day of the current month. With PostgreSQL, functions on dates plus some date and interval calculations allows to : truncate the current date up to the month will add a defaut “1” as day on the timestamp returned. From there, we then add a full month to that timestamp(tz or not) then we substract one day of that timestamp, to finally add a cast to express it as a date:

select current_date as today ,date_trunc('month', current_date::timestamptz) as today_truncated_to_month ,(date_trunc('month', current_date + '1 month'::interval) - '1 day'::interval)::date as last_day_of_current_month;

Datatype conversions and indexes

Let’s create an index on column day, which is an integer:

create index on births_cast (births);

The index will be usable only if integer values are set in the search. As an example, if you cast it to numeric, the index won’t be used:

explain select count(*) from births_cast where births=7674; explain select count(*) from births_cast where births=7674::numeric;

That’s why in the later query, a sequential on table births is issued, and not in the first one, where a (bitmap) index scan is issued.

Bear that in mind when you’re casting between data types. It can affect a lot your performances.

If you really need to cast a given datatype in an another often:

  • maybe you’re just have a schema problem, and you have to change the datatype of the column, that would be the simplest and most effective thing to do. Then your index will match your queries
  • or if you cannot do that, you may create a functional index on the transformation you often use like:

Loading terminal...

Loading terminal...