Postgres Goodies in Ruby on Rails 7.1
I just spent last week at Rails World in Amsterdam and had a blast digging back into the Rails and Active Record world. In conversations with developers over the week, I had some notable takeaways from the newest version of Ruby on Rails that I just had to get written up.
A quick summary before we dig in:
async queries: send long-running queries the background while the code runs along, great for pages with multiple long-running queries that can be run in parallel
composite primary keys: native support for using two or more columns as a primary key
common table expression (CTEs): native integration for a subquery for use later in the statement
unlogged tables: native support for disabling Postgres’ WAL logs on a table (mostly for test environments), so that you get better performance on your tests that use databases
value normalization: a native, universal syntax for normalization of values (like downcase of a email column) instead of using
before_validation
Expansion of Async queries
A feature (not a bug IMO) of Ruby is that it has traditionally been used as a blocking (i.e. not-asynchronous) language. While it does have asynchronous capabilities, in the typical use-case people do not have to grok asynchronous workflows to use it effectively.
In the 7.0 release, Active Record added load_async
for loading whole objects. In 7.1, asynchronous queries have been enabled for aggregations and in full SQL queries using async_find_by_sql
.
First, you'll need to define the async_query_executor
in your environment files (config/environments/{development, production}.rb
).
To use a global setting, use something like the following:
config.active_record.async_query_executor = :global_thread_pool
config.active_record.global_executor_concurrency = 5
To use a per-database setting, use something like the following and min_threads
+ max_threads
the database.yml
:
config/environments/{development, production}.rb
config.active_record.async_query_executor = :multi_thread_pool
config/database.yml
development:
adapter: postgresql
pool: 5
max_threads: 5
min_threads: 5
After setting one of those configurations, we can see how the queries work asynchronously:
irb> u = User.async_find_by_sql("SELECT *, pg_sleep(3) FROM users") # this query will sleep for 3 seconds for each record in your database
=> #<ActiveRecord::Promise status=pending>
Then, sometime later you can use the value
syntax to retrieve the value:
# … sometime later …
irb> u.value
=> {returned results}
Once you call the .value
method, if the query has returned, it will return instantly. If the query has not returned, processing is blocked until the query is complete. This is great for dashboards, charts, and reports that generate more complex queries. Send the query to the background, and let it process while you complete other elements of the request.
Composite primary keys
Composite primary keys have been noticeably absent from Ruby on Rails for a while -- unless you chose the CPK gem. Rails 7.1 added two native methods for implementing Primary Keys: database level and application level.
Database Level Composite Primary Keys: When defining the table in the database migration, you can pass an array of column names to the primary_key
attribute. For databases capable of composite primary keys (like Postgres), Active Record will infer from the schema:
create_table :user_accounts, primary_key: [:user_id, :account_id] do |t|
t.belongs_to(:user, foreign_key: true)
t.belongs_to(:account, foreign_key: true)
t.string :role
t.timestamps
end
After running this migration, if you run \d user_accounts
on your Postgres database, you’ll see the following line for the composite key:
"user_accounts_pkey" PRIMARY KEY, btree (user_id, account_id)
Then, when querying with the composite keys, you do the following:
UserAccounts.find([1, 2]) # where user_id = 1 and account_id = 2
In the hypothetical use-case above, we use the composite primary keys for a join table between users and accounts. Typically, in the past I would use an id
column with a unique constraint on the user_id
and account_id
values. Now, with Postgres, we can use the composite primary key for the row
Application Level Composite Primary Keys: Rails documentation calls this a "virtual primary key". And, you’ll want to know that it’s a bit more restrictive than the composite primary keys above because it enforces an explicit foreign key definition on relationships:
class UserAccounts < ActiveRecord::Base
query_constraints :user_id, :account_id
belongs_to :user, foreign_key: :user_id
belongs_to :account, foreign_key: :account_id
end
You have to be explicit on the foreign_key definitions of the relationship, else it tries to find user_id
and account_id
on every related model.
My recommendation is to use Postgres and the native composite primary keys in a database.
Native Support for CTEs
A "CTE" is a "Common Table Expression". A CTE is a type of a nested SQL statement that is defined before the SQL. Below is an example using native-SQL that would find the latest event for each user on an account:
WITH latest_event_per_user AS (
SELECT
user_id,
MAX(event_logs.created_at) AS last_created_at
FROM event_logs
WHERE
event_logs.account_id = 1
GROUP BY 1
)
SELECT
event_logs.user_id,
event_logs.name,
event_logs.created_at
FROM event_logs
INNER JOIN latest_event_per_user ON
event_logs.user_id = latest_event_per_user.user_id AND
event_logs.created_at = latest_event_per_user.last_created_at;
When would you use something like this? Above is a query that returns the latest events for each user on an account. Another common usage of CTEs is when generating the data for charts and reports. Most of the time, generating this data at the SQL level is much faster than generating it using application level logic. Application logic would require some type of N + 1 query, which can be avoided with a more expressive SQL query.
To support CTEs, Active Record added .with()
method for chaining queries. The with()
accepts an Object, which is quite nice when using a large block. Writing the same query from above in Active Record would look like the following:
latest_event_per_user = EventLog.
where(account_id: 1).
group(:user_id).
select(:user_id, "max(event_logs.created_at) AS last_created_at")
el = EventLog.
with(my_cte: latest_event_per_user).
joins("JOIN my_cte ON event_logs.user_id = my_cte.user_id AND event_logs.created_at = my_cte.last_created_at")
My recommendation: if you are SQL-nerd enough to use CTEs, consider using ActiveRecord::Base.connection.execute()
to execute the raw sql. The one time I could see using this Active Record CTE syntax is if you need the chaining capabilities due to conditional query creation.
Support for unlogged tables (test env only!)
Postgres in test environments do not need the persistence that Postgres needs in production. Enter UNLOGGED TABLE
, which does not apply Postgres durability of a table, but improves performance:
# config/environments/test.rb
ActiveSupport.on_load(:active_record_postgresqladapter) do
self.create_unlogged_tables = true
end
ActiveRecord::Base.normalizes
Most modern databases (including Postgres) are case-sensitive by default. And, users have been known to randomly capitalize values. So, it's best to sanitize values. If you've written a Rails application, you've probably written something like the following:
class User < ApplicationRecord
before_validation do
self.email = self.email.strip.downcase
end
end
Now, we have an native way to do this with normalizes
:
class User < ApplicationRecord
normalizes :email, with: -> given_value { given_value.strip.downcase }
end
For those who don't know about the ->
syntax, this is a function with a single argument called given_value
.
Summary
The PostgreSQL, Active Record, and Ruby on Rails communities continue to show investment in features to make this a strong stack for data heavy production applications.
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read