Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more

Choosing a PostgreSQL Number Format

Avatar for Paul Ramsey

Paul Ramsey

5 min read

It should be the easiest thing in the world: you are modeling your data and you need a column for some numbers, what type do you use?

PostgreSQL offers a lot of different number types, and they all have advantages and limitations. You want the number type that is going to:

  • Store your data using the smallest amount of space
  • Represent your data with the smallest amount of error
  • Manipulate your data using the correct logic

If your processing needs can be met with a fixed size type (integer or floating point) then choose the type that has enough range to fit your data.

If you need to process data at extremely high precision, or store with exact precision, then use the numeric type, which has no range bounds and exact storage, at the price of size and processing speed.

For even more compact size on disk and in memory, also look closely at the order of your columns. Packing small columns close together can yield surprising gains.

Fixed Size Numbers

The smaller the type, the less space the data takes on disk and in memory, which is a big win! At the same time, the smaller the type, the narrower the range of values it can store!

For integer types, smaller types mean smaller ranges:

typesizerange
smallint / int22 bytes-32768 to +32767
integer / int44 bytes-2147483648 to +2147483647
bigint / int88 bytes-9223372036854775808 to 9223372036854775807

Note that there are both SQL standard names for the types, and also PostgreSQL-specific names that are more precise about the internal storage size: an int2 takes two bytes.

If you are storing numbers that are guaranteed to be within a bounded range, then using the smallest type that fits is a no-brainer.

For floating point types, smaller types mean less precision in representation.

IEEE Floating Point

The bits that make up a floating point value in computer internals are used to represent the "sign", the fraction" and the "exponent" -- basically the parts of a number in scientific notation (eg -1.234E10) only in binary.

typesizefactionexponent
real / float44 bytes23 bits8 bits
double precision / float88 bytes52 bits11 bits

The real world precision of a floating point number depends on the magnitude of the exponent. If the exponent is one, then float4 data can be represented with perfect fidelity between the numbers of -2^23 and 2^23 (±8388608). That's a lot of fidelity!

Variable Sized Numbers

Numbers are supposed to go on forever, but the two categories of types we have talked about have both finite ranges and finite precisions. In return, they offer fixed storage size and fast calculation.

What about those who need to potentially exactly represent any number and calculate with them without loss of precision? For those people there is numeric.

The numeric type gets its awesome power by being a "variable length type" (short-handed some times as "varlena"). Other varlena types include text / varchar (can be any length), bytea (can be any length) and the PostGIS geometry and geography types.

The storage requirement for a numeric is two bytes for each group of four decimal digits, plus three to eight bytes overhead. So a minimum of five bytes, even for something as simple as "1". A number like 4 billion, which fits within 4 bytes as an integer takes 9 bytes as a numeric.

Computation also takes longer with numeric values, though it is still exceedingly fast. Let's run a division on and then sum up 10 million numbers:

-- Takes 5 seconds
SELECT sum(a::float8 / (a+1)::float8)
  FROM generate_series(1, 10000000) a;

-- Takes 15 seconds
SELECT sum(a::numeric / (a+1)::numeric)
  FROM generate_series(1, 10000000) a;

Rounding and Representation

People have a very Dr. Jekyll and Mr. Hyde attitude towards precision and calculations. On the one hand, they can be pretty blasé about precision:

SELECT 3.0::float8 * (1.0/5.0);

 0.6000000000000001

"Oh, that's fine, I'll just round everything for display!"

But inevitably the result finds its way into some other process and suddenly people get very angry:

SELECT 3.0::float8 * (1.0/5.0) <= 0.6;

 f

"Why is this stupid database returning the wrong answer for a trivial math expression!"

Harsh reactions about small deviations in calculations and properly rounded representations are particularly acute when the system is dealing with money. Exact math yields exact results.

SELECT 3.0::numeric * (1.0/5.0);

 0.600000000000000000000

For this reason the PostgreSQL documentation explicitly recommends:

If you require exact storage and calculations (such as for monetary amounts), use the numeric type.

The rounding behavior of the numeric type is "away from zero", while the rounding behavior of double precision and float are "towards the nearest even value".

SELECT x,
  round(x::numeric) AS num_round,
  round(x::double precision) AS dbl_round
FROM generate_series(-3.5, 3.5, 1) as x;
  x   | num_round | dbl_round
------+-----------+-----------
 -3.5 |        -4 |        -4
 -2.5 |        -3 |        -2
 -1.5 |        -2 |        -2
 -0.5 |        -1 |        -0
  0.5 |         1 |         0
  1.5 |         2 |         2
  2.5 |         3 |         2
  3.5 |         4 |         4

At the Terminal Prompt

When working at the terminal prompt, it's hard to tell what you're going to get when you type "4.5", but we can see from the rounding behavior that it's a numeric, because it rounds away from zero.

SELECT round(-4.5);

 -5

We have to be explicit about type in order to get a floating point number that rounds towards the even value.

SELECT round(-4.5::float8);

 -4

Conclusions

  • Choosing the right data type can have a big effect on storage overhead! The smallest types can use as little as 25% of the storage used by the largest, for the same values.
  • Choosing the right data type can have a critical effect on correctness! Make sure you know how you are going to be calculating with these values, and what your organizational tolerance for imprecision is.
  • Choosing the right data type can have an effect on performance! Exact math can be many times slower than ordinary calculation, so be prepared to pay a price when using exact types.