Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more
Tutorial Instructions
Casting is the method one want to use to convert a data in PostgreSQL from one datatype to another.
The main function to do that in PostgreSQL is
CAST ( expression AS target_type );
One can convert from one data type to any datatype with some limits:
text
and varchar
are binary coercible both ways. Binary coercibility is not necessarily a symmetric relationship. For example, the cast from xml
to text
" (source docs here)
Examples: some may work, some may not
select cast('1' as integer);
select cast('10000000000' as integer);
select cast('10000000000' as bigint);
select cast('abc' as integer);
select cast('1' as text);
select cast('true' as boolean);
This function will be helpful in the current tutorial to understand how the PostgreSQL interprets the data sent to it:
select pg_typeof(1);
With texts, the function will return “unknown” if we don’t cast it to the correct datatype, because there are many data types in PostgreSQL to handle it like text, char, varchar…
select pg_typeof('abc'),pg_typeof(cast ('abc' as text));
A common error messages that frustrates non-DBAs is:
`ERROR: operator does not exist: integer ~~ unknown
`ERROR: function repeat(integer, unknown) does not exist
Sometimes, casting is not needed, because PostgreSQL will do an automatic conversion for you when it can be done. This is called “implicit cast”. On this example we try adding string ‘12’ and number 10: it will work, because PostgreSQL can convert that string ‘12” as a number to satisfy the addition of the two items:
select '12'+10 as implicit_cast;
Beware that here ‘12’ can be translated into an integer automatically with an implicit cast, because “+” and 10 (integer) could be then satisfied. With ‘12.1’ this won’t work anymore because ‘12.1” cannot be casted into integer!
select '12.1'+10 as implicit_cast;
So for that to work, one will cast 10 as a numeric instead, because the numeric representation of string ‘12.1” is not an integer, but a numeric:
select '12.1'+cast(10 as numeric) as implicit_cast;
A good practice when coding with PostgreSQL is to avoid implicit casts! The application developer will then have better control on his code, avoiding as much as possible implicit casts.
About boolean, if a integer is given, 0 will be false, all other values will be true. If a text representing a boolean is given, anything like t, tr, tru,true,y, ye, yes, on will be true, and anything like f, fa, fal, fals, false, n, no, of, off will be false. Anything else will throw an error.
When converting a NULL to any other data type, the result is invariably null. Tough, beware that the chain ‘null’ is not a NULL value :-)
with tests
(test)
as (values
('tRUe')
,('F')
,('on')
,('of')
,('ye')
,(NULL)
)
select
test
, pg_typeof(test) as typeof_test
, test::boolean as test_bolean_value
from
tests;
Most often, casting is used to convert some text representing a date in a real date, or date/time, for comparison purposes typically in the WHERE clause.
We're going to dig into SQL with the public dataset of US births from 2000-2014. But first we're going to work on creating the dataset in a more Postgres centric way with a new table and the date format:
CREATE TABLE births_cast (
id serial,
day date,
births int
);
INSERT INTO births_cast (day, births)
SELECT make_date(year, month, date_of_month),
births
FROM US_births_20002014_SSA;
A simple first query on our data set, we want to know how many people were born on June, 10th 2001 in the US:
select day, births from births_cast where day=cast('2001-06-10' as date);
The special operator “::” is a great replacement of cast() function: you’ll find that maybe more readable. At least, it’s shorter:
select day, births from births_cast where day='2001-06-10'::date;
About that “::” special operator, depending the application’s language, it could be hard to use, because the developers may have to escape those characters. If you face this problem, use the cast() function instead in the SQL queries.select day, births from births where day=’2001-06-10’::date;
PostgreSQL can also use implicit casts, meaning it tries to understand what the user wants, hiding this way mechanisms of datatype conversions. Still, we don’t recommend that, since it can lead to some undesired situations. In this case, implicit conversion works as expected;
select day, births from births_cast where day='2001-06-10';
But if you use different way than the ISO date (’YYYY-MM-DD’), as assumed above by PostgreSQL, it can lead to some difficulties. The following query stills works in the current setting of our playground database here:
select day, births from births_cast where day='06/10/2001';
But that’s because the datestyle used is define as “ISO, MDY”:
show datestyle;
On a different parts of the world, the datestyle may most often be “ISO,DMY”:
set datestyle to 'ISO, DMY';
select day, births from births_cast where day='06/10/2001';
Here, it’s not more the “June, 10th” that PostgreSQL will search for but “October, 6th” !
That’s why we recommend developers to use the ISO date format whenever possible.
Beware also when you convert a date to timestamp and vice/versa:
When converting from a timestamp to a date, basically, hour (and timezone, depending the source datatype) is removed:
select current_timestamp, current_timestamp::date, current_date, localtimestamp, localtimestamp::date;
But when the contrary is done, from date to timestamp, hour “00:00:00” is added, with timezone or not, depending the destination datatype:
select current_date::timestamp, current_date::timestamptz;
Conversion to interval can be useful for date operations, and can be easily expressed in a text chain like “90 days 10 minutes 10 seconds” as an example:
select current_timestamp + '90 days 10 minutes 10 seconds'::interval as test;
As seen before, if an interval is added (works also with subtraction) to a date, given the hours’ part of the date is then ‘00:00:00’, intervals’ values are added to 0:
select current_date + '90 days 10 minutes 10 seconds'::interval as test;
Another example, of date and interval calculations. One wants to get the last day of the current month. With PostgreSQL, functions on dates plus some date and interval calculations allows to : truncate the current date up to the month will add a defaut “1” as day on the timestamp returned. From there, we then add a full month to that timestamp(tz or not) then we substract one day of that timestamp, to finally add a cast to express it as a date:
select
current_date as today
,date_trunc('month', current_date::timestamptz) as today_truncated_to_month
,(date_trunc('month', current_date + '1 month'::interval) - '1 day'::interval)::date as last_day_of_current_month;
Let’s create an index on column day, which is an integer:
create index on births_cast (births);
The index will be usable only if integer values are set in the search. As an example, if you cast it to numeric, the index won’t be used:
explain select count(*) from births_cast where births=7674;
explain select count(*) from births_cast where births=7674::numeric;
That’s why in the later query, a sequential on table births is issued, and not in the first one, where a (bitmap) index scan is issued.
Bear that in mind when you’re casting between data types. It can affect a lot your performances.
If you really need to cast a given datatype in an another often:
Loading terminal...
Loading terminal...