Solving N+1 Postgres queries for Ruby on Rails apps
Crunchy Data is getting ready to be at RailsConf 2023 in Atlanta next week and we’ve been thinking about our Rails and ActiveRecord users and customers. One of the easiest ways to improve query performance using an ORM is to lean on as much SQL as you can. I’m going to walk through some of the ActiveRecord basics and how to use some smart SQL to work around N+1 query problems.
The easy CRUD Basics with ActiveRecord
What do I mean by "CRUD"? It's short-hand for create-read-update-delete. For instance, ORMs make it so nice to do any of the following.
Insert a record:
batman = user.create(name: "Batman", email: "batman@wayne-enterprises.com")
Find a record:
user = User.find(batman.id)
Update a record:
user.update(email: "batman@retired.com")
Destroy a record:
user.destroy
ORMs can even manage relationships and joins:
batman.sidekicks = robin
User.find(batman.id).joins(sidekick: :user)
The above would obviously return Robin.
Sometime in the 1970s, superheroes switched from one-to-one hero-to-sidekick ratio to having multiple side-kicks, or functioning as a group. Then, Marvel Universe started introducing groupings of superheroes. The Marvel Universe of superheroes is like teenager group chats -- not every superhero likes every other superhero.
Hulk and Iron Man -- you don't want them in the same room together, unless you have to.
But, I digress.
SQL Superpowers: ON vs. WHERE
This type of grouping relationship necessary for managing superheroes is what ties ORMs into knots. Anytime you want to append a conditional join, they get quite messy.
Below is what I mean when I say conditional join, it is a join, but it conditions with the ON
statement:
SELECT
*
FROM table
LEFT JOIN other_table ON conditional_1 AND conditional_2
This query will return all rows of table, but exclude any other_table
rows where conditional_1
or conditional_2
are false. So, results look something like this:
table.id | other_table.conditional_1 | other_table.conditional_2 |
----------|---------------------------|---------------------------|--
1 | true | true |
2 | | |
If we put the conditional in WHERE
instead of the ON
, and ran this query:
SELECT
*
FROM table
LEFT JOIN other_table ON conditional_1
WHERE conditional_2
Then it only returns results where all conditions are met:
table.id | other_table.conditional_1 | other_table.conditional_2 |
----------|---------------------------|---------------------------|--
1 | true | true |
If you notice, there is only a single row returned. The usage of the WHERE
conditional filters out the entire second row.
So, sometimes, filters need to be in the join’s ON
clause, instead of being in the WHERE
clause.
An ORM in knots
Using Rails’ ActiveRecord ORM, let's return a list of superheroes, then if they are in a chat group owned by Hulk, return those groups as well. We would probably start with something like this:
Users
.left_joins(:group_users => groups)
.where(group_users: {groups: {owner_id: hulk.id}})
This would generate a query that looks something like this:
SELECT
*
FROM users
LEFT OUTER JOIN group_users ON group_users.user_id = users.id
LEFT OUTER JOIN groups ON group_users.group_id = groups.group_id
WHERE
groups.owner_id = [[hulk_user_id]]
This has the problem we defined before: it filters out all rows that do not return true for the conditional. So, it's not returning all users, it's only returning users who belong to a group that is owned by Hulk.
Iron Man would be mad. He'd probably even threaten to take his toys and go home, until someone told him it was just a bug in the software.
A false positive, unless …
With ActiveRecord, there appears to be a way to do this, but it's a false positive. Using SQL fragment runs the query that we want:
users = Users
.joins(:group_users)
.joins(ActiveRecord::Base.sanitize_sql_array("LEFT JOIN groups ON group_users.group_id = groups.id AND groups.owner_id = ?", hulk.id]))
But, when accessing the object's relationships, we get all related rows, not the ones you want (i.e. the conditional join did not stick):
users.first.group_users.groups => all groups, unfiltered
In Rails 6.1, the strict_loading
functionality was added that makes this join behave properly. Run the same ruby code above, and append strict_loading
, and this will prevent additional lazy loading.
users.strict_loading.first.group_users => filtered groups
Should we settle with N + 1?
The typical alternative is to just settle with N + 1 from the controller or the template. It's an attempt to solve data retrieval shortcomings using application level code:
<% users.each do |user| %>
<%= user.name %>
<% user.group_users.includes(:groups).where(group_users: {group_id: params[:group_id]}).each do |group_user|
<%= group_user.group.name %>
<% end %>
<% end %>
Of course, this works … but, it does not scale. It will be fast in development, and it will run fast with small data sets. But, it runs a query for each user record. If the application grows, the loop above will run an additional query for each user displayed.
There is a better way.
Let's just use SQL instead
First, we'll use the quick-and-dirty method. It will call some of the code internals for ActiveRecord.
Let's use ActiveRecord::Base.connection.execute
to run the SQL. We'll also use ActiveRecord::Base.sanitize_sql_array
to securely inject values to safely build the SQL query.
results = ActiveRecord::Base.connection.execute(ActiveRecord::Base.sanitize_sql_array([<<SQL, hulk_user_id]))
SELECT
users.id AS id,
users.name AS name,
groups.name AS group_name
FROM users
LEFT OUTER JOIN group_users ON group_users.user_id = users.id
LEFT OUTER JOIN groups ON group_users.group_id = groups.group_id
AND groups.owner_id = ?
ORDER BY users.name
SQL
Then, in the view, the following code can be used to iterate over the returned values:
<% results.each do |row| %>
<%= row["id"] %>
<%= row["name"] %>
<%= row["group_name"] || "--" %>
<% end %>
Clean it up to make it a little nicer
To clean up the code a bit when running multiple SQL queries, I typically do something like this. I searched for a modern Ruby Gem to handle this type of issue, but none were immediately obvious as being stable and maintained.
Store one-file per query in the
app/models/sql
directory. So the query above would be stored in a file calledapp/models/sql/all_users_and_groups_with_specific_owner.sql
so the above query would look like this:SELECT users.id AS id, users.name AS name, groups.name AS group_name FROM users LEFT OUTER JOIN group_users ON group_users.user_id = users.id LEFT OUTER JOIN groups ON group_users.group_id = groups.group_id AND groups.owner_id = ? ORDER BY users.name
Then, we can have a model that handles these queries for us. Save the following to
app/models/sql.rb
class Sql def self.run(sql_name, *arguments) sql = File.read(File.join(Rails.root, 'app', 'models', 'sql', sql_name + '.sql')) sanitized_sql = ActiveRecord::Base.sanitize_sql_array(sql, *arguments) ActiveRecord::Base.connection.execute(sanitized_sql) end end
Then, when running a SQL command, just do the following:
result = Sql.run("all_users_and_groups_with_specific_owner", hulk_user_id)
Using this method, it puts the SQL query into a space away from the rest of our code. Then, in that SQL file, we can include comments to help our future-selves read the SQL and know why we are using it.
What about database lock-in?
By querying with raw SQL, you will be locked into a database. However, once a raw SQL becomes necessary for performance, it is best to favor database lock-in -- the alternative being slower, generic database interactions.
Once you decide on the database for the long-haul, there is no better database than open-source 100% native Postgres.
Summary
- ActiveRecord is awesome for getting started with databases in your Rails application but performance wise, there can be some limits.
- N+1 queries are a common issue with ActiveRecord or an ORM, and can become more of hindrance as the application scales.
- Writing SQL and embedding that as a model is an easy way to add sql to your application. You’ll be locked into PostgreSQL for the long haul, but that’s ok, there’s no better database for a Rails production application.
See you next week at RailsConf!
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read