The Power of Arel
13 Mar 2016Many modern web applications have at least a few overlapping responsibilities. One of these responsibilities deals with maintaining state. A common choice for storing this state is by way of a relational database. Ruby on Rails applications assume the presence of a database by default and communicate with it via ActiveRecord.
ActiveRecord is an Object Relational Mapping (ORM). Under the hood, ActiveRecord uses a relational algebra library called Arel to compose queries for execution. Arel is a very powerful library readily available for when ActiveRecord falls short.
The Basics
Keeping up with tradition, assume a standard Ruby on Rails application exists with at least two models, User and Post:
create_table "posts" do |t|
t.integer "user_id"
t.string "category"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
end
create_table "users" do |t|
t.string "first_name"
t.string "last_name"
t.string "email"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
end
class User < ActiveRecord::Base
has_many :posts
end
class Post < ActiveRecord::Base
belongs_to :user
end
Ask: Find all posts within a category belonging to a user.
For such a simple query, ActiveRecord is capable of generating the appropriate SQL statement:
Post.where(user_id: 123,
category: 'news')
.to_sql
# => SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 123 AND "posts"."category" = 'news'
However, if the query needed to be more complex, this simple where clause is not sufficient. For example, instead of all posts within a category for a user, only the posts which have a created_at in the current month are desired.
A naive ActiveRecord only solution might look like:
beginning_of_month = Date.today
.beginning_of_month
Post.where(user_id: 123,
category: 'news')
.where('created_at >= ?',
beginning_of_month)
.to_sql
# => SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 123 AND "posts"."category" = 'news' AND (created_at >= '2016-03-01' )
This query will give the desired results but uses raw SQL. Using raw SQL is not always a bad practice, but in this case it seems unnecessary and could cause issue later.
Model.arel_table
Every ActiveRecord model can access its underlying Arel representation with the arel_table class method.
Post.arel_table
# => #<Arel::Table:0x00000103ae7f00 @name="posts", ...>
This Arel table exposes each column of the underlying table. These columns provide many available methods used for comparison with either another column or scalar value.
The same Post query from above using its arel_table looks like:
beginning_of_month = Date.today.beginning_of_month
arel = Post.arel_table
Post.where(
arel[:user_id]
.eq(1234)
.and(arel[:category]
.eq('news'))
.and(arel[:created_at]
.gteq(beginning_of_month)))
.to_sql
# => SELECT "posts".* FROM "posts" WHERE ("posts"."user_id" = 1234 AND "posts"."category" = 'news' AND "posts"."created_at" >= '2016-03-01')
After all the necessary conditions are applied to the arel variable, it is given to the Post class via the where method.
With this pure Ruby query generation, something looks a little different. Instead of:
AND (created_at >= '2016-03-01')
The query has changed to:
AND "posts"."created_at" >= '2016-03-01'
This change points to one of the issues of the raw SQL approach: its inflexibility when dealing with multiple tables.
Join Resistance
Iterating on the Post query, it now needs to return the relevant User. If the original raw SQL solution was used the user.created_at column and the posts.created_at column would clash, resulting in an error:
Post.where(user_id: 123,
category: 'news')
.where('created_at >= ? ',
beginning_of_month)
.joins(:user)
.includes(:user)
# => ActiveRecord::StatementInvalid: PG::AmbiguousColumn: ERROR: column reference "created_at" is ambiguous
The problem is clearly seen in the generated SQL:
SELECT "posts".* FROM "posts" INNER JOIN "users" ON "users"."id" = "posts"."user_id" WHERE "posts"."user_id" = 123 AND "posts"."category" = 'news' AND (created_at >= '2016-03-01' )
Since PosgreSQL does not know which created_at column to compare against this date, it can not fulfill the query.
This problem does not exist in the Arel solution:
arel = Post.arel_table
Post.where(
arel[:user_id]
.eq(1234)
.and(arel[:category]
.eq('news'))
.and(arel[:created_at]
.gteq(beginning_of_month)))
.joins(:user)
.includes(:user)
.to_sql
# => SELECT "posts".*, "users.*" FROM "posts" INNER JOIN "users" ON "users"."id" = "posts"."user_id" WHERE ("posts"."user_id" = 1234 AND "posts"."category" = 'news' AND "posts"."created_at" >= '2016-03-01')
Note: The actual generated query has many column aliases in the select clause for the includes behaviour but is irrelevant for these examples.
With each element specifying both table and column names, this query is still pure Ruby and join friendly.
Advanced Selecting
Simple joins and comparison operators are useful but barely begin to explore the vast API contained within Arel. An interesting feature that Arel provides is how to create and utilize Common Table Expressions (CTEs).
A CTE is a temporary named result in an SQL database, derived from a SELECT query. This derived result remains present in the scope for additional SELECT, INSERT, UPDATE, or DELETE queries. CTEs can be thought of as alternatives to temporary tables, sub-queries, views, or other in-line user-defined functions.
Ask: Find all posts written within the last month, then return the email of each user who wrote the post.
Since no Post data is required, this ask is a great candidate for a CTE. A new Arel::Table can be created to house the CTE and then the join method on the User.arel_table makes it accessible:
post_arel = Post.arel_table
user_arel = User.arel_table
common_table_results = begin
post_arel
.project(post_arel[:user_id])
.where(post_arel[:created_at]
.gteq(beginning_of_month))
end
cte_table = Arel::Table.new(
:posts_last_month)
composed_cte = Arel::Nodes::As.new(
cte_table,
common_table_results)
user_arel
.project(users[:email])
.join(cte_table)
.on(users[:id]
.eq(cte_table[:user_id]))
.with(composed_cte)
.to_sql
# => WITH "posts_last_month" AS (SELECT "posts"."user_id" FROM "posts" WHERE "posts"."created_at" >= '2016-03-01') SELECT "users"."email" FROM "users" INNER JOIN "posts_last_month" ON "users"."id" = "posts_last_month"."user_id"
At first glance, this code can be a bit intimidating but it is actually quite simple. Both the Post model and User model have had their underlying Arel representation referenced. Then, the CTE result is computed:
common_table_results = begin
post_arel
.project(post_arel[:user_id])
.where(post_arel[:created_at]
.gteq(beginning_of_month))
end
The most noticeable attribute here is the project method. This method tells the query what to SELECT. Since only user_ids are relevant in this CTE, that is all that is requested.
Next, the actual CTE is created (in Arel representation only, not in the database):
cte_table = Arel::Table.new(
:posts_last_month)
composed_cte = Arel::Nodes::As.new(
cte_table,
common_table_results)
The As class under the Arel::Nodes namespace is created and then fed into the end of the main query via the with method:
user_arel
.project(users[:email])
.join(cte_table)
.on(users[:id]
.eq(cte_table[:user_id]))
.with(composed_cte)
.to_sql
Again, since only the email of the User is desired, the project method is used. The join method is similar to the ActiveRecord method joins but requires the on clause to tell the query how to join the tables. Finally, the with method receives the computed CTE representation and the query is successfully generated.
To actually use this query, another helpful method on ActiveRecord, find_by_sql can accept the full Arel representation, returning the desired User models with only the email column retrieved:
User.find_by_sql(
user_arel
.project(users[:email])
.join(cte_table)
.on(users[:id]
.eq(cte_table[:user_id]))
.with(composed_cte)
)
# => [#<User id: nil, email: "foo@bar.com">]
Why?
Like most tools, Arel may not be applicable in every situation. However, when used appropriately, it can make for some very well maintainable code that is resistant to unexpected failures. When dealing with multiple levels of an application stack, abstracting away complexities around their communication can help vastly simplify a codebase. ActiveRecord and Arel do a great job of insulating application developers from the nitty-gritty of database communication. Where ActiveRecord falls short, Arel is there to pick up the slack.
For extremely specific or complex database interactions, using Arel may not be the correct choice, but should at least be evaluated. Only interacting with the lowest level necessary for a resource has many benefits. An Arel solution to a problem has the potential to be more resilient to future changes.