Postgres Goodies in Ruby on Rails 7.1
I just spent last week at Rails World in Amsterdam and had a blast digging back into the Rails and Active Record world. Conversations with developers over the week left me with some notable takeaways from the newest version of Ruby on Rails that I just had to write up.
A quick summary before we dig in:
async queries: send long-running queries to the background while the rest of the code runs along, great for pages with multiple long-running queries that can run in parallel
composite primary keys: native support for using two or more columns as a primary key
common table expressions (CTEs): native support for defining a named subquery for use later in the statement
unlogged tables: native support for skipping Postgres’ write-ahead log (WAL) on a table (mostly for test environments), so database-backed tests run faster
value normalization: a native, universal syntax for normalizing values (like downcasing an email column) instead of using before_validation
Expansion of Async queries
A feature (not a bug IMO) of Ruby is that it has traditionally been used as a blocking (i.e. not-asynchronous) language. While it does have asynchronous capabilities, in the typical use-case people do not have to grok asynchronous workflows to use it effectively.
In the 7.0 release, Active Record added load_async for loading whole objects. In 7.1, asynchronous queries have been extended to aggregations and to full SQL queries via async_find_by_sql.
First, you'll need to define the async_query_executor in your environment files (config/environments/{development, production}.rb).
To use a global setting, use something like the following:
config.active_record.async_query_executor = :global_thread_pool
config.active_record.global_executor_concurrency = 5
To use a per-database setting, use something like the following, and set min_threads and max_threads in database.yml:
config/environments/{development, production}.rb
config.active_record.async_query_executor = :multi_thread_pool
config/database.yml
development:
adapter: postgresql
pool: 5
max_threads: 5
min_threads: 5
After setting one of those configurations, we can see how the queries work asynchronously:
irb> u = User.async_find_by_sql("SELECT *, pg_sleep(3) FROM users") # this query will sleep for 3 seconds for each record in your database
=> #<ActiveRecord::Promise status=pending>
Then, sometime later, you can use the value method to retrieve the result:
# … sometime later …
irb> u.value
=> {returned results}
Once you call the .value method, it returns instantly if the query has already finished; otherwise, processing blocks until the query completes. This is great for dashboards, charts, and reports that generate more complex queries: send the query to the background, and let it run while you build the other elements of the request.
Composite primary keys
Composite primary keys have been noticeably absent from Ruby on Rails for a while -- unless you chose the composite_primary_keys (CPK) gem. Rails 7.1 adds two native approaches for implementing them: database level and application level.
Database Level Composite Primary Keys: When defining the table in a migration, you can pass an array of column names to the primary_key option. For databases capable of composite primary keys (like Postgres), Active Record will infer the key from the schema:
create_table :user_accounts, primary_key: [:user_id, :account_id] do |t|
t.belongs_to(:user, foreign_key: true)
t.belongs_to(:account, foreign_key: true)
t.string :role
t.timestamps
end
After running this migration, if you run \d user_accounts on your Postgres database, you’ll see the following line for the composite key:
"user_accounts_pkey" PRIMARY KEY, btree (user_id, account_id)
Then, when querying with the composite keys, you do the following:
UserAccounts.find([1, 2]) # where user_id = 1 and account_id = 2
In the hypothetical use case above, we use a composite primary key for a join table between users and accounts. In the past, I would typically use an id column with a unique constraint on the user_id and account_id values. Now, with Postgres, we can use the composite primary key for the row.
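A quick sketch of what querying looks like once the key is inferred (the model and ids are hypothetical, continuing the example above):

```ruby
# Active Record reads the composite key from the schema:
UserAccounts.primary_key             # => ["user_id", "account_id"]

# find takes the key parts in order:
UserAccounts.find([1, 2])            # WHERE user_id = 1 AND account_id = 2

# and fetches several rows when given several key arrays:
UserAccounts.find([[1, 2], [3, 4]])
```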
Application Level Composite Primary Keys: Rails documentation calls this a "virtual primary key". It’s a bit more restrictive than the database-level composite primary keys above because it enforces an explicit foreign key definition on relationships:
class UserAccounts < ActiveRecord::Base
query_constraints :user_id, :account_id
belongs_to :user, foreign_key: :user_id
belongs_to :account, foreign_key: :account_id
end
You have to be explicit in the foreign_key definitions of the relationships; otherwise, Rails tries to find user_id and account_id on every related model.
My recommendation is to use Postgres and define native composite primary keys at the database level.
Native Support for CTEs
A "CTE" is a "Common Table Expression": a named subquery defined before the main SQL statement and referenced later within it. Below is an example in raw SQL that finds the latest event for each user on an account:
WITH latest_event_per_user AS (
SELECT
user_id,
MAX(event_logs.created_at) AS last_created_at
FROM event_logs
WHERE
event_logs.account_id = 1
GROUP BY 1
)
SELECT
event_logs.user_id,
event_logs.name,
event_logs.created_at
FROM event_logs
INNER JOIN latest_event_per_user ON
event_logs.user_id = latest_event_per_user.user_id AND
event_logs.created_at = latest_event_per_user.last_created_at;
When would you use something like this? Another common use of CTEs is generating the data for charts and reports. Most of the time, generating this data at the SQL level is much faster than at the application level, where the equivalent logic would require some type of N + 1 query; a more expressive SQL query avoids that entirely.
To support CTEs, Active Record added the .with() method for chaining queries. with() accepts keyword arguments mapping each CTE name to a relation, which is quite nice when building a large query. Writing the same query from above in Active Record looks like the following:
latest_event_per_user = EventLog.
where(account_id: 1).
group(:user_id).
select(:user_id, "max(event_logs.created_at) AS last_created_at")
el = EventLog.
with(my_cte: latest_event_per_user).
joins("JOIN my_cte ON event_logs.user_id = my_cte.user_id AND event_logs.created_at = my_cte.last_created_at")
My recommendation: if you are enough of a SQL nerd to use CTEs, consider using ActiveRecord::Base.connection.execute() to run the raw SQL. The one case where I could see using the Active Record CTE syntax is when you need chaining for conditional query construction.
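A sketch of that raw-SQL route, reusing the event_logs example from above (the account id is inlined for brevity; in real code, prefer bind parameters or sanitization over string interpolation):

```ruby
sql = <<~SQL
  WITH latest_event_per_user AS (
    SELECT user_id, MAX(created_at) AS last_created_at
    FROM event_logs
    WHERE account_id = 1
    GROUP BY 1
  )
  SELECT e.user_id, e.name, e.created_at
  FROM event_logs e
  INNER JOIN latest_event_per_user l
    ON e.user_id = l.user_id AND e.created_at = l.last_created_at
SQL

# On the Postgres adapter, execute returns a PG::Result;
# to_a yields one hash per row, keyed by column name.
rows = ActiveRecord::Base.connection.execute(sql).to_a
```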
Support for unlogged tables (test env only!)
Postgres in a test environment does not need the durability that Postgres needs in production. Enter UNLOGGED TABLE, which skips the write-ahead log for a table, trading durability for performance:
# config/environments/test.rb
ActiveSupport.on_load(:active_record_postgresqladapter) do
self.create_unlogged_tables = true
end
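One way to sanity-check that the setting took effect is to look at pg_class, where relpersistence is 'u' for unlogged tables. This console snippet assumes a users table exists in your test database:

```ruby
# In a test-environment console;
# relpersistence: 'p' = permanent (logged), 'u' = unlogged, 't' = temporary.
ActiveRecord::Base.connection.select_value(
  "SELECT relpersistence FROM pg_class WHERE relname = 'users'"
)
# => "u" once unlogged table creation is enabled
```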
ActiveRecord::Base.normalizes
Most modern databases (including Postgres) are case-sensitive by default. And, users have been known to randomly capitalize values. So, it's best to sanitize values. If you've written a Rails application, you've probably written something like the following:
class User < ApplicationRecord
before_validation do
self.email = self.email.strip.downcase
end
end
Now, we have a native way to do this with normalizes:
class User < ApplicationRecord
normalizes :email, with: -> given_value { given_value.strip.downcase }
end
For those unfamiliar with the -> syntax, this is a lambda (an anonymous function) with a single argument called given_value.
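Because the normalizer is a plain Ruby lambda, its behavior is easy to check in isolation; the model wiring is sketched in the comments:

```ruby
# The normalizer itself is plain Ruby:
normalize_email = ->(given_value) { given_value.strip.downcase }

normalize_email.call("  Jane.Doe@Example.COM ")  # => "jane.doe@example.com"

# In the model (sketch, not run here):
#   normalizes :email, with: normalize_email
#
# Rails applies the normalization when the attribute is assigned, and
# User.normalize_value_for(:email, value) normalizes a value by hand,
# e.g. before using it in a lookup.
```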
Summary
The PostgreSQL, Active Record, and Ruby on Rails communities continue to show investment in features to make this a strong stack for data heavy production applications.