Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more

Solving N+1 Postgres queries for Ruby on Rails apps

Avatar for Christopher Winslett

Christopher Winslett

6 min read

Crunchy Data is getting ready to be at RailsConf 2023 in Atlanta next week and we’ve been thinking about our Rails and ActiveRecord users and customers. One of the easiest ways to improve query performance using an ORM is to lean on as much SQL as you can. I’m going to walk through some of the ActiveRecord basics and how to use some smart SQL to work around N+1 query problems.

The easy CRUD Basics with ActiveRecord

What do I mean by "CRUD"? It's short-hand for create-read-update-delete. For instance, ORMs make it so nice to do any of the following.

Insert a record:

batman = user.create(name: "Batman", email: "batman@wayne-enterprises.com")

Find a record:

user = User.find(batman.id)

Update a record:

user.update(email: "batman@retired.com")

Destroy a record:

user.destroy

ORMs can even manage relationships and joins:

batman.sidekicks = robin
User.find(batman.id).joins(sidekick: :user)

The above would obviously return Robin.

Sometime in the 1970s, superheroes switched from one-to-one hero-to-sidekick ratio to having multiple side-kicks, or functioning as a group. Then, Marvel Universe started introducing groupings of superheroes. The Marvel Universe of superheroes is like teenager group chats -- not every superhero likes every other superhero.

Hulk and Iron Man -- you don't want them in the same room together, unless you have to.

But, I digress.

SQL Superpowers: ON vs. WHERE

This type of grouping relationship necessary for managing superheroes is what ties ORMs into knots. Anytime you want to append a conditional join, they get quite messy.

Below is what I mean when I say conditional join, it is a join, but it conditions with the ON statement:

SELECT
	*
FROM table
LEFT JOIN other_table ON conditional_1 AND conditional_2

This query will return all rows of table, but exclude any other_table rows where conditional_1 or conditional_2 are false. So, results look something like this:

 table.id | other_table.conditional_1 | other_table.conditional_2 |
----------|---------------------------|---------------------------|--
        1 |                      true |                      true |
        2 |                           |                           |

If we put the conditional in WHERE instead of the ON, and ran this query:

SELECT
	*
FROM table
LEFT JOIN other_table ON conditional_1
WHERE conditional_2

Then it only returns results where all conditions are met:

 table.id | other_table.conditional_1 | other_table.conditional_2 |
----------|---------------------------|---------------------------|--
        1 |                      true |                      true |

If you notice, there is only a single row returned. The usage of the WHERE conditional filters out the entire second row.

So, sometimes, filters need to be in the join’s ON clause, instead of being in the WHERE clause.

An ORM in knots

Using Rails’ ActiveRecord ORM, let's return a list of superheroes, then if they are in a chat group owned by Hulk, return those groups as well. We would probably start with something like this:

Users
	.left_joins(:group_users => groups)
	.where(group_users: {groups: {owner_id: hulk.id}})

This would generate a query that looks something like this:

SELECT
	*
FROM users
LEFT OUTER JOIN group_users ON group_users.user_id = users.id
LEFT OUTER JOIN groups ON group_users.group_id = groups.group_id
WHERE
	groups.owner_id = [[hulk_user_id]]

This has the problem we defined before: it filters out all rows that do not return true for the conditional. So, it's not returning all users, it's only returning users who belong to a group that is owned by Hulk.

Iron Man would be mad. He'd probably even threaten to take his toys and go home, until someone told him it was just a bug in the software.

A false positive, unless …

With ActiveRecord, there appears to be a way to do this, but it's a false positive. Using SQL fragment runs the query that we want:

users = Users
	.joins(:group_users)
	.joins(ActiveRecord::Base.sanitize_sql_array("LEFT JOIN groups ON group_users.group_id = groups.id AND groups.owner_id = ?", hulk.id]))

But, when accessing the object's relationships, we get all related rows, not the ones you want (i.e. the conditional join did not stick):

users.first.group_users.groups => all groups, unfiltered

In Rails 6.1, the strict_loading functionality was added that makes this join behave properly. Run the same ruby code above, and append strict_loading, and this will prevent additional lazy loading.

users.strict_loading.first.group_users => filtered groups

Should we settle with N + 1?

The typical alternative is to just settle with N + 1 from the controller or the template. It's an attempt to solve data retrieval shortcomings using application level code:

<% users.each do |user| %>
	<%= user.name %>
  <% user.group_users.includes(:groups).where(group_users: {group_id: params[:group_id]}).each do |group_user|
    <%= group_user.group.name %>
  <% end %>
<% end %>

Of course, this works … but, it does not scale. It will be fast in development, and it will run fast with small data sets. But, it runs a query for each user record. If the application grows, the loop above will run an additional query for each user displayed.

There is a better way.

Let's just use SQL instead

First, we'll use the quick-and-dirty method. It will call some of the code internals for ActiveRecord.

Let's use ActiveRecord::Base.connection.execute to run the SQL. We'll also use ActiveRecord::Base.sanitize_sql_array to securely inject values to safely build the SQL query.

results = ActiveRecord::Base.connection.execute(ActiveRecord::Base.sanitize_sql_array([<<SQL, hulk_user_id]))
SELECT
	users.id AS id,
	users.name AS name,
	groups.name AS group_name
FROM users
LEFT OUTER JOIN group_users ON group_users.user_id = users.id
LEFT OUTER JOIN groups ON group_users.group_id = groups.group_id
	AND	groups.owner_id = ?
ORDER BY users.name
SQL

Then, in the view, the following code can be used to iterate over the returned values:

<% results.each do |row| %>
  <%= row["id"] %>
  <%= row["name"] %>
  <%= row["group_name"] || "--" %>
<% end %>

Clean it up to make it a little nicer

To clean up the code a bit when running multiple SQL queries, I typically do something like this. I searched for a modern Ruby Gem to handle this type of issue, but none were immediately obvious as being stable and maintained.

  1. Store one-file per query in the app/models/sql directory. So the query above would be stored in a file called app/models/sql/all_users_and_groups_with_specific_owner.sql so the above query would look like this:

    SELECT
    	users.id AS id,
    	users.name AS name,
    	groups.name AS group_name
    FROM users
    LEFT OUTER JOIN group_users ON group_users.user_id = users.id
    LEFT OUTER JOIN groups ON group_users.group_id = groups.group_id
    	AND	groups.owner_id = ?
    ORDER BY users.name
    
  2. Then, we can have a model that handles these queries for us. Save the following to app/models/sql.rb

    class Sql
    	def self.run(sql_name, *arguments)
    		sql = File.read(File.join(Rails.root, 'app', 'models', 'sql', sql_name + '.sql'))
    		sanitized_sql = ActiveRecord::Base.sanitize_sql_array(sql, *arguments)
    		ActiveRecord::Base.connection.execute(sanitized_sql)
    	end
    end
    
  3. Then, when running a SQL command, just do the following:

    result = Sql.run("all_users_and_groups_with_specific_owner", hulk_user_id)
    

Using this method, it puts the SQL query into a space away from the rest of our code. Then, in that SQL file, we can include comments to help our future-selves read the SQL and know why we are using it.

What about database lock-in?

By querying with raw SQL, you will be locked into a database. However, once a raw SQL becomes necessary for performance, it is best to favor database lock-in -- the alternative being slower, generic database interactions.

Once you decide on the database for the long-haul, there is no better database than open-source 100% native Postgres.

Summary

  • ActiveRecord is awesome for getting started with databases in your Rails application but performance wise, there can be some limits.
  • N+1 queries are a common issue with ActiveRecord or an ORM, and can become more of hindrance as the application scales.
  • Writing SQL and embedding that as a model is an easy way to add sql to your application. You’ll be locked into PostgreSQL for the long haul, but that’s ok, there’s no better database for a Rails production application.

See you next week at RailsConf!