The Power of Arel
13 Mar 2016Many modern web applications have at least a few overlapping responsibilities. One of these responsibilities deals with maintaining state. A common choice for storing this state is by way of a relational database. Ruby on Rails applications assume the presence of a database by default and communicate with it via ActiveRecord
.
ActiveRecord
is an Object Relational Mapping (ORM). Under the hood, ActiveRecord
uses a relational algebra library called Arel
to compose queries for execution. Arel
is a very powerful library readily available for when ActiveRecord
falls short.
The Basics
Keeping up with tradition, assume a standard Ruby on Rails application exists with at least two models, User
and Post
:
create_table "posts" do |t|
t.integer "user_id"
t.string "category"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
end
create_table "users" do |t|
t.string "first_name"
t.string "last_name"
t.string "email"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
end
class User < ActiveRecord::Base
has_many :posts
end
class Post < ActiveRecord::Base
belongs_to :user
end
Ask: Find all posts within a category belonging to a user.
For such a simple query, ActiveRecord
is capable of generating the appropriate SQL statement:
Post.where(user_id: 123,
category: 'news')
.to_sql
# => SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 123 AND "posts"."category" = 'news'
However, if the query needed to be more complex, this simple where
clause is not sufficient. For example, instead of all posts
within a category
for a user
, only the posts
which have a created_at
in the current month are desired.
A naive ActiveRecord
only solution might look like:
beginning_of_month = Date.today
.beginning_of_month
Post.where(user_id: 123,
category: 'news')
.where('created_at >= ?',
beginning_of_month)
.to_sql
# => SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 123 AND "posts"."category" = 'news' AND (created_at >= '2016-03-01' )
This query will give the desired results but uses raw SQL. Using raw SQL is not always a bad practice, but in this case it seems unnecessary and could cause issue later.
Model.arel_table
Every ActiveRecord
model can access its underlying Arel
representation with the arel_table
class method.
Post.arel_table
# => #<Arel::Table:0x00000103ae7f00 @name="posts", ...>
This Arel
table exposes each column of the underlying table. These columns provide many available methods used for comparison with either another column or scalar value.
The same Post
query from above using its arel_table
looks like:
beginning_of_month = Date.today.beginning_of_month
arel = Post.arel_table
Post.where(
arel[:user_id]
.eq(1234)
.and(arel[:category]
.eq('news'))
.and(arel[:created_at]
.gteq(beginning_of_month)))
.to_sql
# => SELECT "posts".* FROM "posts" WHERE ("posts"."user_id" = 1234 AND "posts"."category" = 'news' AND "posts"."created_at" >= '2016-03-01')
After all the necessary conditions are applied to the arel
variable, it is given to the Post
class via the where
method.
With this pure Ruby query generation, something looks a little different. Instead of:
AND (created_at >= '2016-03-01')
The query has changed to:
AND "posts"."created_at" >= '2016-03-01'
This change points to one of the issues of the raw SQL approach: its inflexibility when dealing with multiple tables.
Join Resistance
Iterating on the Post
query, it now needs to return the relevant User
. If the original raw SQL solution was used the user.created_at
column and the posts.created_at
column would clash, resulting in an error:
Post.where(user_id: 123,
category: 'news')
.where('created_at >= ? ',
beginning_of_month)
.joins(:user)
.includes(:user)
# => ActiveRecord::StatementInvalid: PG::AmbiguousColumn: ERROR: column reference "created_at" is ambiguous
The problem is clearly seen in the generated SQL:
SELECT "posts".* FROM "posts" INNER JOIN "users" ON "users"."id" = "posts"."user_id" WHERE "posts"."user_id" = 123 AND "posts"."category" = 'news' AND (created_at >= '2016-03-01' )
Since PosgreSQL does not know which created_at
column to compare against this date, it can not fulfill the query.
This problem does not exist in the Arel
solution:
arel = Post.arel_table
Post.where(
arel[:user_id]
.eq(1234)
.and(arel[:category]
.eq('news'))
.and(arel[:created_at]
.gteq(beginning_of_month)))
.joins(:user)
.includes(:user)
.to_sql
# => SELECT "posts".*, "users.*" FROM "posts" INNER JOIN "users" ON "users"."id" = "posts"."user_id" WHERE ("posts"."user_id" = 1234 AND "posts"."category" = 'news' AND "posts"."created_at" >= '2016-03-01')
Note: The actual generated query has many column aliases in the select clause for the includes
behaviour but is irrelevant for these examples.
With each element specifying both table and column names, this query is still pure Ruby and join
friendly.
Advanced Selecting
Simple joins and comparison operators are useful but barely begin to explore the vast API contained within Arel
. An interesting feature that Arel
provides is how to create and utilize Common Table Expressions (CTEs).
A CTE is a temporary named result in an SQL database, derived from a SELECT
query. This derived result remains present in the scope for additional SELECT
, INSERT
, UPDATE
, or DELETE
queries. CTEs can be thought of as alternatives to temporary tables, sub-queries, views, or other in-line user-defined functions.
Ask: Find all posts written within the last month, then return the email of each user who wrote the post.
Since no Post
data is required, this ask is a great candidate for a CTE. A new Arel::Table
can be created to house the CTE and then the join
method on the User.arel_table
makes it accessible:
post_arel = Post.arel_table
user_arel = User.arel_table
common_table_results = begin
post_arel
.project(post_arel[:user_id])
.where(post_arel[:created_at]
.gteq(beginning_of_month))
end
cte_table = Arel::Table.new(
:posts_last_month)
composed_cte = Arel::Nodes::As.new(
cte_table,
common_table_results)
user_arel
.project(users[:email])
.join(cte_table)
.on(users[:id]
.eq(cte_table[:user_id]))
.with(composed_cte)
.to_sql
# => WITH "posts_last_month" AS (SELECT "posts"."user_id" FROM "posts" WHERE "posts"."created_at" >= '2016-03-01') SELECT "users"."email" FROM "users" INNER JOIN "posts_last_month" ON "users"."id" = "posts_last_month"."user_id"
At first glance, this code can be a bit intimidating but it is actually quite simple. Both the Post
model and User
model have had their underlying Arel
representation referenced. Then, the CTE result is computed:
common_table_results = begin
post_arel
.project(post_arel[:user_id])
.where(post_arel[:created_at]
.gteq(beginning_of_month))
end
The most noticeable attribute here is the project
method. This method tells the query what to SELECT
. Since only user_ids
are relevant in this CTE, that is all that is requested.
Next, the actual CTE is created (in Arel
representation only, not in the database):
cte_table = Arel::Table.new(
:posts_last_month)
composed_cte = Arel::Nodes::As.new(
cte_table,
common_table_results)
The As
class under the Arel::Nodes
namespace is created and then fed into the end of the main query via the with
method:
user_arel
.project(users[:email])
.join(cte_table)
.on(users[:id]
.eq(cte_table[:user_id]))
.with(composed_cte)
.to_sql
Again, since only the email
of the User
is desired, the project
method is used. The join
method is similar to the ActiveRecord
method joins
but requires the on
clause to tell the query how to join the tables. Finally, the with
method receives the computed CTE representation and the query is successfully generated.
To actually use this query, another helpful method on ActiveRecord
, find_by_sql
can accept the full Arel
representation, returning the desired User
models with only the email
column retrieved:
User.find_by_sql(
user_arel
.project(users[:email])
.join(cte_table)
.on(users[:id]
.eq(cte_table[:user_id]))
.with(composed_cte)
)
# => [#<User id: nil, email: "foo@bar.com">]
Why?
Like most tools, Arel
may not be applicable in every situation. However, when used appropriately, it can make for some very well maintainable code that is resistant to unexpected failures. When dealing with multiple levels of an application stack, abstracting away complexities around their communication can help vastly simplify a codebase. ActiveRecord
and Arel
do a great job of insulating application developers from the nitty-gritty of database communication. Where ActiveRecord
falls short, Arel
is there to pick up the slack.
For extremely specific or complex database interactions, using Arel
may not be the correct choice, but should at least be evaluated. Only interacting with the lowest level necessary for a resource has many benefits. An Arel
solution to a problem has the potential to be more resilient to future changes.