The Power of Arel

Many modern web applications have at least a few overlapping responsibilities. One of these responsibilities deals with maintaining state. A common choice for storing this state is by way of a relational database. Ruby on Rails applications assume the presence of a database by default and communicate with it via ActiveRecord.

ActiveRecord is an Object Relational Mapping (ORM). Under the hood, ActiveRecord uses a relational algebra library called Arel to compose queries for execution. Arel is a very powerful library readily available for when ActiveRecord falls short.

The Basics

Keeping up with tradition, assume a standard Ruby on Rails application exists with at least two models, User and Post:

create_table "posts" do |t|
  t.integer  "user_id"
  t.string   "category"
  t.datetime "created_at", null: false
  t.datetime "updated_at", null: false
end

create_table "users" do |t|
  t.string   "first_name"
  t.string   "last_name"
  t.string   "email"
  t.datetime "created_at", null: false
  t.datetime "updated_at", null: false
end

class User < ActiveRecord::Base
  has_many :posts
end

class Post < ActiveRecord::Base
  belongs_to :user
end

Ask: Find all posts within a category belonging to a user.

For such a simple query, ActiveRecord is capable of generating the appropriate SQL statement:

Post.where(user_id: 123,
           category: 'news')
    .to_sql

# => SELECT "posts".* FROM "posts"  WHERE "posts"."user_id" = 123 AND "posts"."category" = 'news'

However, if the query needed to be more complex, this simple where clause is not sufficient. For example, instead of all posts within a category for a user, only the posts which have a created_at in the current month are desired.

A naive ActiveRecord only solution might look like:

beginning_of_month = Date.today
                         .beginning_of_month

Post.where(user_id: 123,
           category: 'news')
    .where('created_at >= ?',
           beginning_of_month)
    .to_sql

# => SELECT "posts".* FROM "posts"  WHERE "posts"."user_id" = 123 AND "posts"."category" = 'news' AND (created_at >= '2016-03-01' )

This query will give the desired results but uses raw SQL. Using raw SQL is not always a bad practice, but in this case it seems unnecessary and could cause issue later.

Model.arel_table

Every ActiveRecord model can access its underlying Arel representation with the arel_table class method.

Post.arel_table

# => #<Arel::Table:0x00000103ae7f00 @name="posts", ...>

This Arel table exposes each column of the underlying table. These columns provide many available methods used for comparison with either another column or scalar value.

The same Post query from above using its arel_table looks like:

beginning_of_month = Date.today.beginning_of_month
arel = Post.arel_table

Post.where(
      arel[:user_id]
        .eq(1234)
        .and(arel[:category]
             .eq('news'))
        .and(arel[:created_at]
             .gteq(beginning_of_month)))
    .to_sql

# => SELECT "posts".* FROM "posts"  WHERE ("posts"."user_id" = 1234 AND "posts"."category" = 'news' AND "posts"."created_at" >= '2016-03-01')

After all the necessary conditions are applied to the arel variable, it is given to the Post class via the where method.

With this pure Ruby query generation, something looks a little different. Instead of:

AND (created_at >= '2016-03-01')

The query has changed to:

AND "posts"."created_at" >= '2016-03-01'

This change points to one of the issues of the raw SQL approach: its inflexibility when dealing with multiple tables.

Join Resistance

Iterating on the Post query, it now needs to return the relevant User. If the original raw SQL solution was used the user.created_at column and the posts.created_at column would clash, resulting in an error:

Post.where(user_id: 123,
           category: 'news')
    .where('created_at >= ? ',
           beginning_of_month)
    .joins(:user)
    .includes(:user)

# => ActiveRecord::StatementInvalid: PG::AmbiguousColumn: ERROR:  column reference "created_at" is ambiguous

The problem is clearly seen in the generated SQL:

SELECT "posts".* FROM "posts" INNER JOIN "users" ON "users"."id" = "posts"."user_id" WHERE "posts"."user_id" = 123 AND "posts"."category" = 'news' AND (created_at >= '2016-03-01' )

Since PosgreSQL does not know which created_at column to compare against this date, it can not fulfill the query.

This problem does not exist in the Arel solution:

arel = Post.arel_table
Post.where(
      arel[:user_id]
        .eq(1234)
        .and(arel[:category]
              .eq('news'))
        .and(arel[:created_at]
              .gteq(beginning_of_month)))
    .joins(:user)
    .includes(:user)
    .to_sql

# => SELECT "posts".*, "users.*" FROM "posts" INNER JOIN "users" ON "users"."id" = "posts"."user_id" WHERE ("posts"."user_id" = 1234 AND "posts"."category" = 'news' AND "posts"."created_at" >= '2016-03-01')

Note: The actual generated query has many column aliases in the select clause for the includes behaviour but is irrelevant for these examples.

With each element specifying both table and column names, this query is still pure Ruby and join friendly.

Advanced Selecting

Simple joins and comparison operators are useful but barely begin to explore the vast API contained within Arel. An interesting feature that Arel provides is how to create and utilize Common Table Expressions (CTEs).

A CTE is a temporary named result in an SQL database, derived from a SELECT query. This derived result remains present in the scope for additional SELECT, INSERT, UPDATE, or DELETE queries. CTEs can be thought of as alternatives to temporary tables, sub-queries, views, or other in-line user-defined functions.

Ask: Find all posts written within the last month, then return the email of each user who wrote the post.

Since no Post data is required, this ask is a great candidate for a CTE. A new Arel::Table can be created to house the CTE and then the join method on the User.arel_table makes it accessible:

post_arel = Post.arel_table
user_arel = User.arel_table

common_table_results = begin
  post_arel
    .project(post_arel[:user_id])
    .where(post_arel[:created_at]
            .gteq(beginning_of_month))
end

cte_table = Arel::Table.new(
              :posts_last_month)
composed_cte = Arel::Nodes::As.new(
                cte_table,
                common_table_results)

user_arel
  .project(users[:email])
  .join(cte_table)
    .on(users[:id]
         .eq(cte_table[:user_id]))
  .with(composed_cte)
  .to_sql

# => WITH "posts_last_month" AS (SELECT "posts"."user_id" FROM "posts" WHERE "posts"."created_at" >= '2016-03-01') SELECT "users"."email" FROM "users" INNER JOIN "posts_last_month" ON "users"."id" = "posts_last_month"."user_id"

At first glance, this code can be a bit intimidating but it is actually quite simple. Both the Post model and User model have had their underlying Arel representation referenced. Then, the CTE result is computed:

common_table_results = begin
  post_arel
    .project(post_arel[:user_id])
    .where(post_arel[:created_at]
            .gteq(beginning_of_month))
end

The most noticeable attribute here is the project method. This method tells the query what to SELECT. Since only user_ids are relevant in this CTE, that is all that is requested.

Next, the actual CTE is created (in Arel representation only, not in the database):

cte_table = Arel::Table.new(
              :posts_last_month)
composed_cte = Arel::Nodes::As.new(
                cte_table,
                common_table_results)

The As class under the Arel::Nodes namespace is created and then fed into the end of the main query via the with method:

user_arel
  .project(users[:email])
  .join(cte_table)
    .on(users[:id]
         .eq(cte_table[:user_id]))
  .with(composed_cte)
  .to_sql

Again, since only the email of the User is desired, the project method is used. The join method is similar to the ActiveRecord method joins but requires the on clause to tell the query how to join the tables. Finally, the with method receives the computed CTE representation and the query is successfully generated.

To actually use this query, another helpful method on ActiveRecord, find_by_sql can accept the full Arel representation, returning the desired User models with only the email column retrieved:

User.find_by_sql(
  user_arel
  .project(users[:email])
  .join(cte_table)
    .on(users[:id]
         .eq(cte_table[:user_id]))
  .with(composed_cte)
)

# => [#<User id: nil, email: "foo@bar.com">]

Why?

Like most tools, Arel may not be applicable in every situation. However, when used appropriately, it can make for some very well maintainable code that is resistant to unexpected failures. When dealing with multiple levels of an application stack, abstracting away complexities around their communication can help vastly simplify a codebase. ActiveRecord and Arel do a great job of insulating application developers from the nitty-gritty of database communication. Where ActiveRecord falls short, Arel is there to pick up the slack.

For extremely specific or complex database interactions, using Arel may not be the correct choice, but should at least be evaluated. Only interacting with the lowest level necessary for a resource has many benefits. An Arel solution to a problem has the potential to be more resilient to future changes.