Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more
Tutorial Instructions
We are going to review how to set up simple B-Tree indexes, the most common type of index. We’ve loaded some open weather data with data events by type, damage, time and location. For just a very basic index, I’m going to find everything for winter storms. Once I add this index, this means that to get this data, the database doesn't have to scan all the weather events to get me additional data about severe weather events, it already knows where to look for those.
Starting query
EXPLAIN ANALYZE
SELECT *
FROM weather
WHERE event_type = 'Winter Storm';
Before indexing, what does explain analyze tell us about execution time?
QUERY PLAN
---------------------------------------------------------------------
Seq Scan on weather (cost=0.00..103.75 rows=2 width=2210) (actual time=0.400..80.900 rows=11 loops=1)
Filter: ((event_type)::text = 'Winter Storm'::text)
Rows Removed by Filter: 1189
Planning Time: 93.000 ms
Execution Time: 83.400 ms
(5 rows)
Now create an index
CREATE INDEX idx_weather_type ON weather(event_type);
Now run your query with EXPLAIN again
EXPLAIN ANALYZE
SELECT *
FROM weather
WHERE event_type = 'Winter Storm';
QUERY PLAN
----------------------------------------------------------------------
Bitmap Heap Scan on weather (cost=4.32..23.99 rows=6 width=2210) (actual time=3.700..4.600 rows=11 loops=1)
Recheck Cond: ((event_type)::text = 'Winter Storm'::text)
Heap Blocks: exact=6
-> Bitmap Index Scan on idx_weather_type (cost=0.00..4.32 rows=6 width=0) (actual time=3.500..3.500 rows=11 loops=1)
Index Cond: ((event_type)::text = 'Winter Storm'::text)
Planning Time: 31.600 ms
Execution Time: 6.900 ms
(7 rows)
Look at that drop in query execution time, cool!
Indexes aren't always created for single columns only - Postgres also supports multicolumn indexes. These can be useful if you know that you'll be querying a lot on multiple columns at once.
Drop the first index we made
DROP INDEX idx_weather_type;
Starting query
EXPLAIN ANALYZE
SELECT *
FROM weather
WHERE event_type = 'Winter Storm'
AND damage_crops > '0';
Before index explain analyze
QUERY PLAN
---------------------------------------------
Seq Scan on weather (cost=0.00..118.00 rows=2 width=2210) (actual time=0.300..61.300 rows=11 loops=1)
Filter: (((damage_crops)::text > '0'::text) AND ((event_type)::text = 'Winter Storm'::text))
Rows Removed by Filter: 1189
Planning Time: 16.700 ms
Execution Time: 63.400 ms
(5 rows)
Multi-column Index for severity and type
CREATE INDEX idx_storm_crop ON weather(event_type,damage_crops);
After Index Explain Analyze
EXPLAIN ANALYZE
SELECT *
FROM weather
WHERE event_type = 'Winter Storm'
AND damage_crops > '0';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on weather (cost=4.30..11.48 rows=2 width=2210) (actual time=4.500..5.000 rows=11 loops=1)
Recheck Cond: (((event_type)::text = 'Winter Storm'::text) AND ((damage_crops)::text > '0'::text))
Heap Blocks: exact=6
-> Bitmap Index Scan on idx_storm_crop (cost=0.00..4.30 rows=2 width=0) (actual time=1.000..1.000 rows=11 loops=1)
Index Cond: (((event_type)::text = 'Winter Storm'::text) AND ((damage_crops)::text > '0'::text))
Planning Time: 24.900 ms
Execution Time: 7.300 ms
Reduction in query time again, yay!
If you get lost with all your creating indexes and need to see what you’ve got, this will show you all indexes on a particular table:
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'weather';
Loading terminal...
Loading terminal...