How To Parallelize Ruby HTTP Requests

10 Jan 2016

It turns out that managing web requests is quite important when doing web development. A web application backed by an external or internal API can issue a lot of requests when rendering a seemingly simple web page. How those requests are made and in what order is very important. With improper parallelization, an end user’s entire experience can go from delightful to horrific in a matter of seconds.

The Build Up

A basic Ruby on Rails application might have the following features:

A User can create a favorite of an item, creating a FavoriteItem.
A User can add an item to their wishlist, creating a WishlistItem.
A User can buy an item, creating a TransactionItem.

Each model this system uses is backed by an API. User, TransactionItem, WishlistItem, and FavoriteItem models all require a remote HTTP request for their information.

As a web application grows, this structure is not uncommon. The same API might back a mobile app, a website, and any internal tooling that helps the company with its day-to-day affairs.

The API that this application uses works in a two-phase manner:

A user can be requested by their id.
- /users/:id returns a User corresponding to the given id.
Each supporting model is requested with the same user_id.
- /users/:id/favorite_items will return an array of FavoriteItems for the specified User.

The contract of this API is, for all intents and purposes, non-negotiable. In-lined data and other request-saving patterns are not available; the client must use the API as provided.

Sequential Approach

Within this example application, the most request-intensive page is the User's history page. The history page consists of everything the user has done: items added to their favorites (FavoriteItems), items bought (TransactionItems), and items added to their wish list (WishlistItems).

To complement the (ex/in)ternal API, two helper methods exist on each model: remote_find which accepts an id or array of ids, returning the matching models, and remote_find_by_user_id which accepts a user_id and returns an array of ids for the corresponding model.
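The post does not show how these two helpers are implemented, so the following is only a minimal sketch of one way they could look. The RemoteResource module, the resource_name helper, the injectable fetcher, and the /favorite_items/:id style path for single records are all assumptions made for illustration; only the /users/:id and /users/:id/favorite_items paths come from the API description above.

```ruby
# Hypothetical sketch: not the application's real implementation.
module RemoteResource
  # Callable that takes a path and returns the parsed response.
  # A real application would perform an HTTP GET here.
  attr_writer :fetcher

  def fetcher
    @fetcher || raise(NotImplementedError, "assign a fetcher first")
  end

  # "FavoriteItem" -> "favorite_items" (naive pluralization, assumed).
  def resource_name
    name.gsub(/([a-z])([A-Z])/, '\1_\2').downcase + "s"
  end

  # Accepts a single id or an array of ids.
  def remote_find(id_or_ids)
    if id_or_ids.is_a?(Array)
      id_or_ids.map { |id| fetcher.call("/#{resource_name}/#{id}") }
    else
      fetcher.call("/#{resource_name}/#{id_or_ids}")
    end
  end

  # Everything of this type that belongs to one user.
  def remote_find_by_user_id(user_id)
    fetcher.call("/users/#{user_id}/#{resource_name}")
  end
end

class User; extend RemoteResource; end
class FavoriteItem; extend RemoteResource; end

# Stand-in fetcher that simply echoes the path it was asked for.
echo = ->(path) { path }
User.fetcher = echo
FavoriteItem.fetcher = echo

User.remote_find(42)                    # => "/users/42"
FavoriteItem.remote_find_by_user_id(42) # => "/users/42/favorite_items"
```

Keeping the fetcher injectable is a design choice of this sketch: it lets the path logic be exercised without a network, and it leaves the choice of HTTP client open.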

Leveraging the API and these two helper methods, a serial version of a User's pertinent data could look something like:

class UserHistoryController < ApplicationController
  def show
    @user = User.remote_find(params.require(:user_id))

    @favorite_items =
      FavoriteItem.remote_find_by_user_id(@user.id)

    @wish_list_items =
      WishlistItem.remote_find_by_user_id(@user.id)

    @transaction_items =
      TransactionItem.remote_find_by_user_id(@user.id)
  end
end

However slow it may be, this code will indeed fulfill its duty. First the User is found, then all the supporting information about a User's history is retrieved one by one from the API.

As a worst-case scenario, assume the API answers an individual request in about 150ms. With four sequential requests, the required page data alone takes 600ms to fetch.

Tack on view rendering, whatever data processing or formatting the presentation layer needs, and some amount of JavaScript, and this page becomes objectively slow.

Enter `EM-Synchrony`

As with most problems, someone smart has had it before and probably done something about it. Luckily, this particular problem has been addressed by Ilya Grigorik’s EM-Synchrony library.

Grigorik’s library leverages the use of Ruby Fibers to parallelize* code execution.

*Note: Because of the Global Interpreter Lock (GIL) present in MRI, true parallelization is not possible.

The real benefit of using EM-Synchrony is the way it handles scheduling the underlying Fibers. A Ruby Fiber is basically a Thread that is not automatically scheduled by the Ruby VM. This means that it is up to the programmer to let the Fiber know when it should relinquish control to another Fiber.
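To make that hand-off concrete, here is a minimal sketch that uses only Ruby's built-in Fiber class, with no EM-Synchrony involved. The log array and its messages are invented for illustration; the point is that the Fiber pauses at Fiber.yield and continues only when something explicitly resumes it.

```ruby
log = []

worker = Fiber.new do
  log << "worker: request sent"
  Fiber.yield # relinquish control to whoever resumed this Fiber
  log << "worker: response handled"
end

worker.resume # runs the block up to Fiber.yield
log << "main: free to do other work"
worker.resume # picks up right after Fiber.yield

puts log
# worker: request sent
# main: free to do other work
# worker: response handled
```

In this sketch the programmer does the scheduling by hand. EM-Synchrony's contribution is doing those resume calls automatically whenever the awaited I/O completes.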

Since this example is almost completely bound by Input/Output (I/O) operations, it is a perfect candidate for EM-Synchrony.

To use EM-Synchrony effectively, existing requests must be broken up into pieces that can be accessed independently. A requests method can be created to extract the details of each request. The result of the requests method must be enumerable.

Note: As explained below, EM-Synchrony can only be used with supported HTTP clients.

class UserHistoryController < ApplicationController
  # ...
  # ...

  private

  def requests
    {
      User => {
        method: :remote_find,
        arg: @user_id,
        instance_var: :@user
      },
      FavoriteItem => {
        method: :remote_find_by_user_id,
        arg: @user_id,
        instance_var: :@favorite_items
      },
      WishlistItem => {
        method: :remote_find_by_user_id,
        arg: @user_id,
        instance_var: :@wish_list_items
      },
      TransactionItem => {
        method: :remote_find_by_user_id,
        arg: @user_id,
        instance_var: :@transaction_items
      }
    }
  end
end

For the sake of being explicit, this very redundant helper method will be used to iterate over requests and allow EM-Synchrony to process them in parallel:

class UserHistoryController < ApplicationController
  CONCURRENCY = 4

  def show
    @user_id = params.require(:user_id)

    EM.synchrony do
      EM::Synchrony::FiberIterator
        .new(requests, CONCURRENCY)
        .each do |key, request_hash|

        # Example of the call below with real values:
        #
        # key = TransactionItem
        # TransactionItem.remote_find_by_user_id(@user_id)

        result = key.send(request_hash[:method], request_hash[:arg])
        instance_variable_set(request_hash[:instance_var], result)
      end

      EM.stop
    end
  end
end

A few important lines of this solution are worth a closer look:

CONCURRENCY = 4

This line tells EM-Synchrony how many Fibers it is allowed to run at once. Since the example needed to request four remote resources, the concurrency amount is set to four.

The next line of interest is:

EM::Synchrony::FiberIterator
        .new(requests, CONCURRENCY)
        .each do |key, request_hash|

Here is where the actual work is done. After meticulously crafting the structure of the requests hash, the FiberIterator will pick the next element from the list and give it to a Fiber.
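The effect of the concurrency limit can be illustrated without EventMachine. The sketch below is a plain-Ruby analogue, not how FiberIterator is implemented: run_with_limit, the job names, and the round-robin loop that stands in for the reactor are all hypothetical. Each job calls Fiber.yield at the point where a real request would be waiting on the network.

```ruby
require "fiber" # needed for Fiber#alive? on Ruby versions before 3.0

# Runs every job in its own Fiber, keeping at most `limit` Fibers in
# flight. Returns the highest number of Fibers that were in flight at once.
def run_with_limit(jobs, limit)
  pending = jobs.dup
  active  = []
  peak    = 0

  until pending.empty? && active.empty?
    # Hand the next elements of the list to new Fibers, up to the limit.
    free_slots = limit - active.size
    pending.shift(free_slots).each do |job|
      active << Fiber.new { job.call }
    end
    peak = [peak, active.size].max

    # Give every in-flight Fiber a turn, then forget the finished ones.
    active.each(&:resume)
    active.select!(&:alive?)
  end

  peak
end

results = {}
jobs = %w[user favorites wishlist transactions orders].map do |name|
  lambda do
    Fiber.yield # pretend to wait for the HTTP response
    results[name] = "#{name} loaded"
  end
end

peak = run_with_limit(jobs, 2)
puts "peak concurrency: #{peak}, jobs finished: #{results.size}"
# peak concurrency: 2, jobs finished: 5
```

Five jobs and a limit of two are used here only so that the cap is visible. With four requests and a limit of four, as in the controller, every request would be in flight at the same time.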

Finally, the line that stops EventMachine cannot go unnoticed:

EM.stop

This method has next to no documentation but appears in every EventMachine example, and for good reason: it stops the reactor loop so that the EM.synchrony block can return. Without it, the action would never finish.

And that is all there is to it! All four HTTP requests can now be run in parallel, resulting in a major speed up and a much happier user experience.
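As a rough, idealized estimate of that speed up, the 150ms worst case from earlier can be plugged into a few lines of arithmetic. The figures below assume every request takes the full 150ms and ignore scheduling overhead and any slowdown from the API handling four requests at once, so the concurrent number is a lower bound rather than a measurement.

```ruby
latencies_ms = {
  "User"            => 150,
  "FavoriteItem"    => 150,
  "WishlistItem"    => 150,
  "TransactionItem" => 150
}

# Sequential: each request starts only after the previous one returns.
serial_ms = latencies_ms.values.sum

# Concurrent: all requests wait at the same time, so the slowest one
# determines the total.
concurrent_ms = latencies_ms.values.max

puts "serial: #{serial_ms}ms, concurrent: ~#{concurrent_ms}ms"
# serial: 600ms, concurrent: ~150ms
```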

Caveats

A few considerations must be made in order to use EM-Synchrony effectively.

A compliant HTTP library must be used.
- As em-http-client’s README notes, the requests must be made with a compliant HTTP library. Faraday is my favorite compliant library and it has worked great for me so far. Unfortunately, the very popular HTTParty library is not compliant and therefore no one may “Party Hard” when using EM-Synchrony and HTTParty.
One size does not fit all.
- Unlike the described example, not all code can be saved by EM-Synchrony. If a request depends on the result of another request, sharing those results between Fibers is neither simple, nor a good idea.
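One way to live with the second caveat is to split the work into stages: resolve the dependency on its own first, then run only the truly independent requests concurrently. The sketch below is a hypothetical illustration of that idea. The lambdas are stand-ins for remote calls, the email lookup is invented, and plain threads are used in place of the Fiber-based iterator to keep the example self-contained.

```ruby
# Hypothetical stand-ins; each lambda represents one HTTP request.
find_user_by_email = ->(email) { { id: 42, email: email } }
favorites_for      = ->(user_id) { ["favorite of #{user_id}"] }
wishlist_for       = ->(user_id) { ["wish of #{user_id}"] }

# Stage 1: everything else needs the user's id, so this runs alone.
user = find_user_by_email.call("jane@example.com")

# Stage 2: these requests depend on user[:id] but not on each other,
# so they are safe to run side by side.
threads = {
  favorites: Thread.new { favorites_for.call(user[:id]) },
  wishlist:  Thread.new { wishlist_for.call(user[:id]) }
}

# Thread#value waits for the thread and returns the block's result.
history = threads.transform_values(&:value)

p history
# {:favorites=>["favorite of 42"], :wishlist=>["wish of 42"]}  (Ruby 3.4+ prints this as {favorites: [...], wishlist: [...]})
```

No result is shared between concurrent workers here; the only shared value is the user, which is fully resolved before the second stage starts.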
