How To Parallelize Ruby HTTP Requests
10 Jan 2016
It turns out that managing web requests is quite important when doing web development. A web application backed by an external or internal API can issue a lot of requests when rendering a seemingly simple web page. How those requests are made and in what order is very important. With improper parallelization, an end user’s entire experience can go from delightful to horrific in a matter of seconds.
The Build Up
A basic Ruby on Rails application might have the following features:
- A User can create a favorite of an item, creating a FavoriteItem.
- A User can add an item to their wishlist, creating a WishlistItem.
- A User can buy an item, creating a TransactionItem.
Each model this system uses is backed by an API. User, TransactionItem, WishlistItem, and FavoriteItem models all require a remote HTTP request for their information.
As a web application experiences growth, this structure is not uncommon. The same API might back a mobile app, a website, and any internal tooling that helps the company with its day-to-day affairs.
The API that this application uses works in a two phase manner:
- A user can be requested by their id. /users/:id returns a User corresponding to the given id.
- Each supporting model is requested with the same user_id. /users/:id/favorite_items will return an array of FavoriteItems for the specified User.
The contract of this API is, for all intents and purposes, non-negotiable. In-lined data and other request-saving patterns are not available; the client must use the API as provided.
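The two endpoint shapes can be captured in a pair of small path helpers. This is only a sketch: the ApiPaths module and its method names are hypothetical and are not part of the application described here.

```ruby
# Hypothetical helpers; the module and method names are illustrative only.
module ApiPaths
  # Phase one: a single user, e.g. "/users/42".
  def self.user(id)
    "/users/#{id}"
  end

  # Phase two: a supporting collection for that same user,
  # e.g. "/users/42/favorite_items".
  def self.user_collection(id, collection)
    "#{user(id)}/#{collection}"
  end
end

user_path      = ApiPaths.user(42)
favorites_path = ApiPaths.user_collection(42, :favorite_items)

puts user_path       # /users/42
puts favorites_path  # /users/42/favorite_items
```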
Sequential Approach
Within this example application, the most request-intensive page is the User's history page. The history page consists of everything the user has done: Items a user has added to their favorites, resulting in FavoriteItems; Items bought by the user, resulting in TransactionItems; and Items added to a user’s wish list, resulting in WishlistItems.
To complement the (ex/in)ternal API, two helper methods exist on each model: remote_find, which accepts an id or array of ids and returns the matching models, and remote_find_by_user_id, which accepts a user_id and returns an array of ids for the corresponding model.
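To make the contract of the two helpers concrete, here is a minimal sketch that follows the description above. It is an assumption-heavy stand-in: the RECORDS hash replaces the remote API, the records are made up, and the real helpers would issue HTTP requests instead.

```ruby
# In-memory stand-in for the remote API; the records below are invented.
class FavoriteItem
  RECORDS = {
    1 => { id: 1, user_id: 7, item: "lamp" },
    2 => { id: 2, user_id: 7, item: "desk" },
    3 => { id: 3, user_id: 9, item: "chair" }
  }.freeze

  # Accepts one id or an array of ids and returns the matching records
  # (a single record for a single id, an array for an array of ids).
  def self.remote_find(id_or_ids)
    found = Array(id_or_ids).map { |id| RECORDS[id] }.compact
    id_or_ids.is_a?(Array) ? found : found.first
  end

  # Accepts a user_id and returns the ids of that user's records.
  def self.remote_find_by_user_id(user_id)
    RECORDS.values.select { |r| r[:user_id] == user_id }.map { |r| r[:id] }
  end
end

ids   = FavoriteItem.remote_find_by_user_id(7)  # => [1, 2]
items = FavoriteItem.remote_find(ids)           # the two records for user 7
```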
Leveraging the API and these two helper methods, a serial version of a User's pertinent data could look something like:
class UserHistoryController < ApplicationController
  def show
    # Each call blocks on a remote HTTP request before the next one starts.
    @user = User.remote_find(params.require(:user_id))
    @favorite_items =
      FavoriteItem.remote_find_by_user_id(@user.id)
    @wish_list_items =
      WishlistItem.remote_find_by_user_id(@user.id)
    @transaction_items =
      TransactionItem.remote_find_by_user_id(@user.id)
  end
end
However slow it may be, this code will indeed fulfill its duty. First the User is found, then all the supporting information about a User's history is retrieved one by one from the API.
In a worst-case scenario, assume the API answers an individual request in about 150ms. With 4 sequential requests, the required page elements alone will take 600ms.
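The arithmetic can be spelled out in a few lines of Ruby. The sketch uses the 150ms worst-case figure from the text and ignores any overhead beyond the requests themselves; the "overlapped" number is the theoretical best case if all four requests could wait on the network at the same time.

```ruby
# Worst-case latency per request in milliseconds (figure taken from the text).
latencies_ms = {
  user: 150,
  favorite_items: 150,
  wish_list_items: 150,
  transaction_items: 150
}

# Sequential: each request waits for the previous one, so the costs add up.
sequential_ms = latencies_ms.values.reduce(:+)

# Fully overlapped: the total is bounded by the slowest single request.
overlapped_ms = latencies_ms.values.max

puts "sequential: #{sequential_ms}ms, overlapped: #{overlapped_ms}ms"
# prints "sequential: 600ms, overlapped: 150ms"
```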
Tack on view rendering, whatever data processing or formatting needs to be done at the presentation layer, and finally some amount of JavaScript, and this page becomes objectively slow.
Enter EM-Synchrony
As with most problems, someone smart has had it before and probably done something about it. Luckily, this particular problem has been addressed by Ilya Grigorik’s EM-Synchrony library.
Grigorik’s library leverages Ruby Fibers to parallelize* code execution.
*Note: Because of the Global Interpreter Lock (GIL) present in MRI, true parallelization is not possible. The requests overlap only while they are waiting on I/O.
The real benefit of using EM-Synchrony is the way it handles scheduling the underlying Fibers. A Ruby Fiber is basically a Thread that is not automatically scheduled by the Ruby VM. This means that it is up to the programmer to let the Fiber know when it should relinquish control to another Fiber.
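A small example of that manual scheduling, using only Ruby's core Fiber class (no EM-Synchrony involved): the Fiber runs until it calls Fiber.yield, and nothing happens to it again until the caller explicitly resumes it.

```ruby
log = []

worker = Fiber.new do
  log << "worker: step 1"
  Fiber.yield               # relinquish control back to whoever resumed us
  log << "worker: step 2"
end

log << "main: before"
worker.resume               # runs the Fiber up to the Fiber.yield
log << "main: between"
worker.resume               # continues after the Fiber.yield until the block ends
log << "main: after"

puts log
# main: before
# worker: step 1
# main: between
# worker: step 2
# main: after
```

A library such as EM-Synchrony takes over the job of deciding when to call resume, typically when the I/O a Fiber was waiting on has completed.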
Since this example is almost completely bound by Input/Output (I/O) operations, it is a perfect candidate for EM-Synchrony.
To use EM-Synchrony effectively, existing requests must be broken up into pieces that can be accessed independently. A requests method can be created to extract the details surrounding each request. The result of the requests method must be an enumerable.
Note: As explained below, EM-Synchrony can only be used with supported HTTP clients.
class UserHistoryController < ApplicationController
  # ...

  private

  # Maps each model class to the class method to call, its argument,
  # and the instance variable that should receive the result.
  def requests
    {
      User => {
        method: :remote_find,
        arg: @user_id,
        instance_var: :@user
      },
      FavoriteItem => {
        method: :remote_find_by_user_id,
        arg: @user_id,
        instance_var: :@favorite_items
      },
      WishlistItem => {
        method: :remote_find_by_user_id,
        arg: @user_id,
        instance_var: :@wish_list_items
      },
      TransactionItem => {
        method: :remote_find_by_user_id,
        arg: @user_id,
        instance_var: :@transaction_items
      }
    }
  end
end
For the sake of being explicit, this very redundant helper method will be used to iterate over requests and allow EM-Synchrony to process them in parallel:
class UserHistoryController < ApplicationController
  CONCURRENCY = 4

  def show
    @user_id = params.require(:user_id)

    EM.synchrony do
      EM::Synchrony::FiberIterator
        .new(requests, CONCURRENCY)
        .each do |key, request_hash|
          # Example of below with real values:
          #
          # key = TransactionItem
          # TransactionItem.remote_find_by_user_id(@user_id)
          result = key.send(request_hash[:method], request_hash[:arg])
          instance_variable_set(request_hash[:instance_var], result)
        end

      EM.stop
    end
  end
end
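Stripped of EventMachine, the body of the each block is ordinary Ruby metaprogramming: call the class method named in the entry, then store the result in the named instance variable. The sketch below runs that dispatch serially so it can be tried without any gems. The stub models, their return values, and the HistoryPage class are hypothetical stand-ins, not part of the application.

```ruby
# Stub models standing in for the API-backed classes; return values are invented.
class User
  def self.remote_find(id)
    { id: id, name: "example user" }
  end
end

class FavoriteItem
  def self.remote_find_by_user_id(_user_id)
    [101, 102]
  end
end

class HistoryPage
  attr_reader :user, :favorite_items

  def initialize(user_id)
    @user_id = user_id
  end

  # Same shape as the requests hash above, trimmed to two entries.
  def requests
    {
      User         => { method: :remote_find,            arg: @user_id, instance_var: :@user },
      FavoriteItem => { method: :remote_find_by_user_id, arg: @user_id, instance_var: :@favorite_items }
    }
  end

  # Serial version of the loop body: one request after another, no Fibers.
  def load_all
    requests.each do |model, request_hash|
      result = model.public_send(request_hash[:method], request_hash[:arg])
      instance_variable_set(request_hash[:instance_var], result)
    end
    self
  end
end

page = HistoryPage.new(7).load_all
puts page.user[:id]                 # 7
puts page.favorite_items.inspect    # [101, 102]
```

The controller above performs the same dispatch, except that each iteration runs inside its own Fiber.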
A few important lines of this solution are worth a closer look:
CONCURRENCY = 4
This line tells EM-Synchrony how many Fibers it is allowed to run at once. Since the example needed to request four remote resources, the concurrency amount is set to four.
The next line of interest is:
EM::Synchrony::FiberIterator
.new(requests, CONCURRENCY)
.each do |key, request_hash|
Here is where the actual work is done. With the requests hash meticulously structured, the FiberIterator picks the next element from the list and gives it to a Fiber.
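The idea of handing elements to Fibers under a concurrency cap can be imitated with core Ruby alone. This is a toy scheduler written for illustration, not how FiberIterator is implemented: a bare Fiber.yield stands in for waiting on the network, and the loop pretends that responses arrive in the order the requests were started. The job names are made up.

```ruby
CONCURRENCY = 2
pending = %w[user favorites wish_list transactions]
running = []
log     = []

until pending.empty? && running.empty?
  # Fill any free slots: hand the next element to a fresh Fiber.
  while running.size < CONCURRENCY && !pending.empty?
    fiber = Fiber.new do |job|
      log << "start #{job}"
      Fiber.yield              # stand-in for waiting on the network
      log << "finish #{job}"
    end
    fiber.resume(pending.shift) # runs until the Fiber.yield above
    running << fiber
  end

  # Pretend the oldest request's response arrived; resuming lets it finish.
  running.shift.resume
end

puts log
# start user
# start favorites
# finish user
# start wish_list
# finish favorites
# start transactions
# finish wish_list
# finish transactions
```

At no point are more than CONCURRENCY jobs in flight, which is the role the second argument to FiberIterator.new plays in the controller.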
Finally, the line that stops Event Machine cannot go unnoticed.
EM.stop
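This method is barely explained anywhere but appears in every example of using Event Machine, and for good reason: it shuts down the event loop so that the EM.synchrony block returns and the controller action can finish. Without it, the request would hang.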
And that is all there is to it! All four HTTP requests can now be run in parallel, resulting in a major speed up and a much happier user experience.
Caveats
A few considerations must be made in order to use EM-Synchrony effectively.
- A compliant HTTP library must be used.
  - As em-http-client’s README states, a compliant HTTP library must be used when making the requests. Faraday is my favorite compliant library and it has worked great for me so far. Unfortunately, the very popular HTTParty library is not compliant and therefore no one may “Party Hard” when using EM-Synchrony and HTTParty.
- One size does not fit all.
  - Unlike the described example, not all code can be saved by EM-Synchrony. If a request depends on the result of another request, sharing those results between Fibers is neither simple nor a good idea.