Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more

Tutorial Instructions

Indexing (B-Tree Indexes)

B-Tree Sample

We are going to review how to set up simple B-Tree indexes, the most common type of index. We’ve loaded some open weather data with data events by type, damage, time and location. For just a very basic index, I’m going to find everything for winter storms. Once I add this index, this means that to get this data, the database doesn't have to scan all the weather events to get me additional data about severe weather events, it already knows where to look for those.

Starting query

EXPLAIN ANALYZE SELECT * FROM weather WHERE event_type = 'Winter Storm';

Before indexing, what does explain analyze tell us about execution time?

QUERY PLAN                                               
---------------------------------------------------------------------
 Seq Scan on weather  (cost=0.00..103.75 rows=2 width=2210) (actual time=0.400..80.900 rows=11 loops=1)
   Filter: ((event_type)::text = 'Winter Storm'::text)
   Rows Removed by Filter: 1189
 Planning Time: 93.000 ms
 Execution Time: 83.400 ms
(5 rows)

Now create an index

CREATE INDEX idx_weather_type ON weather(event_type);

Now run your query with EXPLAIN again

EXPLAIN ANALYZE SELECT * FROM weather WHERE event_type = 'Winter Storm';
QUERY PLAN                                                        
----------------------------------------------------------------------
 Bitmap Heap Scan on weather  (cost=4.32..23.99 rows=6 width=2210) (actual time=3.700..4.600 rows=11 loops=1)
   Recheck Cond: ((event_type)::text = 'Winter Storm'::text)
   Heap Blocks: exact=6
   ->  Bitmap Index Scan on idx_weather_type  (cost=0.00..4.32 rows=6 width=0) (actual time=3.500..3.500 rows=11 loops=1)
         Index Cond: ((event_type)::text = 'Winter Storm'::text)
 Planning Time: 31.600 ms
 Execution Time: 6.900 ms
(7 rows)

Look at that drop in query execution time, cool!

Multicolumn B-Tree Indexes

Indexes aren't always created for single columns only - Postgres also supports multicolumn indexes. These can be useful if you know that you'll be querying a lot on multiple columns at once.

Drop the first index we made

DROP INDEX idx_weather_type;

Starting query

EXPLAIN ANALYZE SELECT * FROM weather WHERE event_type = 'Winter Storm' AND damage_crops > '0';

Before index explain analyze

QUERY PLAN                                               
---------------------------------------------
 Seq Scan on weather  (cost=0.00..118.00 rows=2 width=2210) (actual time=0.300..61.300 rows=11 loops=1)
   Filter: (((damage_crops)::text > '0'::text) AND ((event_type)::text = 'Winter Storm'::text))
   Rows Removed by Filter: 1189
 Planning Time: 16.700 ms
 Execution Time: 63.400 ms
(5 rows)

Multi-column Index for severity and type

CREATE INDEX idx_storm_crop ON weather(event_type,damage_crops);

After Index Explain Analyze

EXPLAIN ANALYZE SELECT * FROM weather WHERE event_type = 'Winter Storm' AND damage_crops > '0';
QUERY PLAN                                                       
-------------------------------------------------------------------
 Bitmap Heap Scan on weather  (cost=4.30..11.48 rows=2 width=2210) (actual time=4.500..5.000 rows=11 loops=1)
   Recheck Cond: (((event_type)::text = 'Winter Storm'::text) AND ((damage_crops)::text > '0'::text))
   Heap Blocks: exact=6
   ->  Bitmap Index Scan on idx_storm_crop  (cost=0.00..4.30 rows=2 width=0) (actual time=1.000..1.000 rows=11 loops=1)
         Index Cond: (((event_type)::text = 'Winter Storm'::text) AND ((damage_crops)::text > '0'::text))
 Planning Time: 24.900 ms
 Execution Time: 7.300 ms

Reduction in query time again, yay!

If you get lost with all your creating indexes and need to see what you’ve got, this will show you all indexes on a particular table:

SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'weather';

Loading terminal...

Loading terminal...