Things to Consider when Metaprogramming in Ruby

10 Apr 2016

Metaprogramming in Ruby is a polarizing topic. Its most common purpose is to let code alter itself at runtime, for example by defining methods on the fly. Metaprogramming can make code terser and more flexible, but it is not without cost. As with most things of value, metaprogramming is not free.

Undoubtedly, there is a time and place for metaprogramming, but it is important to be aware of the concessions a metaprogrammed solution requires.

Code Discovery and Readability

One problem with metaprogramming solutions is that they obstruct code discovery. When joining a new project, or simply re-familiarizing oneself with an existing one, tracing code execution in a text editor can be quite difficult if the method definitions never appear in the source.

For example, we can assume that a User class exists with a set of metaprogrammed methods:

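class User
  # Assumes a reader exists for each attribute
  attr_accessor :password, :email, :first_name, :last_name

  [
    :password,
    :email,
    :first_name,
    :last_name
  ].each do |attribute|
    # e.g. has_password? is true when password is set
    define_method(:"has_#{attribute}?") do
      !public_send(attribute).nil?
    end
  end
end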

Although a little contrived, this code defines a set of simple convenience predicates on a User class. It is easily extended to cover additional attributes without writing a full method definition per attribute.

However, these methods cannot be found using grep, The Silver Searcher, or other “find all” tools. Since the name has_password? never appears literally in the source, searching for it turns up nothing.
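Even when grep comes up empty, Ruby's own reflection can still find metaprogrammed methods in a running process, such as a console session. The sketch below assumes the User class from the example, with attr_accessor added so the readers exist. Module#instance_methods(false) lists methods defined directly on the class, and UnboundMethod#source_location reports the file and line of the block passed to define_method.

```ruby
class User
  # Assumption: plain accessors exist for each attribute
  attr_accessor :password, :email, :first_name, :last_name

  [:password, :email, :first_name, :last_name].each do |attribute|
    # Generated predicate: true when the attribute has a value
    define_method(:"has_#{attribute}?") do
      !public_send(attribute).nil?
    end
  end
end

# Methods defined directly on User (inherited ones excluded), filtered to the generated predicates
User.instance_methods(false).grep(/\Ahas_/).sort
# => [:has_email?, :has_first_name?, :has_last_name?, :has_password?]

# [file, line] of the define_method block, i.e. where has_password? really comes from
file, line = User.instance_method(:has_password?).source_location
puts "has_password? is defined at #{file}:#{line}"
```

This only helps once the code is loaded, so it complements a searchable comment rather than replacing one.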

A Workaround

To combat this issue, some developers choose to write a comment listing the defined method names above the metaprogramming block. This simple solution can greatly help the readability of the code:

class User
  # Assumes a reader exists for each attribute
  attr_accessor :password, :email, :first_name, :last_name

  # Defines has_password?, has_email?, has_first_name?,
  # and has_last_name?
  [
    :password,
    :email,
    :first_name,
    :last_name
  ].each do |attribute|
    define_method(:"has_#{attribute}?") do
      !public_send(attribute).nil?
    end
  end
end

Performance

Depending on the number of times a piece of code is executed, performance considerations can be extremely important. “Hot code” is a term for code that is called frequently during an application’s request cycle. Since not all code is created equal, understanding the performance implications of different metaprogramming approaches is imperative when writing or modifying hot code.

The Setup

An example application needs to handle incoming data at scale. Upon receiving the data, it must make it accessible to the rest of the application. Myriad options exist to solve this problem, but we can assume that only a few are feasible for this Ruby codebase.

The incoming data looks like the following:

{
  "user": {
    "name": "Some User",
    "phones": [
      "818-555-5555",
      "415-555-5555"
    ],
    "email": "email@whatever.com",
    "birthday": "12-12-1900"
  }
}

Note: In the following examples this data is referred to as incoming_data, and we can assume it has already been decoded from JSON into a Ruby Hash.
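For readers who want to run the examples, here is a minimal sketch of producing incoming_data with Ruby's standard json library, assuming the payload arrives as a raw JSON string (the variable name raw is hypothetical).

```ruby
require 'json'

# The raw payload as it might arrive over the wire (same shape as the example above)
raw = <<~JSON
  {
    "user": {
      "name": "Some User",
      "phones": ["818-555-5555", "415-555-5555"],
      "email": "email@whatever.com",
      "birthday": "12-12-1900"
    }
  }
JSON

# JSON.parse produces string keys by default, hence incoming_data['user'] below
incoming_data = JSON.parse(raw)

incoming_data['user']['email'] # => "email@whatever.com"
```

Passing symbolize_names: true to JSON.parse would produce symbol keys instead, in which case the examples would need incoming_data[:user].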

1. All Methods

One way to accept and integrate this incoming data would be to create a class that maps every attribute under the 'user' key to a method:

class UserMetaMethods
  def initialize(hash)
    hash.each_pair do |key, value|
      self.class.send(:attr_accessor, key)
      self.send(:"#{key}=", value)
    end
  end
end

user = UserMetaMethods.new(incoming_data['user'])
user.email
# => "email@whatever.com"

This solution makes accessing the incoming data very consumer friendly. Every attribute appears as a method for which respond_to? returns true, and each attribute is backed by a corresponding instance variable.
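One consequence of calling attr_accessor on self.class is that the accessors are defined on UserMetaMethods itself, not on the individual object, so every instance answers to every attribute any instance has ever received. The sketch below repeats the class from above and uses two hypothetical payloads to show this.

```ruby
class UserMetaMethods
  def initialize(hash)
    hash.each_pair do |key, value|
      # Defines the reader and writer on the class, shared by all instances
      self.class.send(:attr_accessor, key)
      send(:"#{key}=", value)
    end
  end
end

first  = UserMetaMethods.new('email' => 'email@whatever.com')
second = UserMetaMethods.new('name' => 'Some User')

first.instance_variables   # => [:@email]
second.respond_to?(:email) # => true, although second never received an email
second.email               # => nil
first.respond_to?(:name)   # => true
```

Each new key also changes the class at runtime, which matters for the performance discussion later on.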

2. `method_missing`

A survey of metaprogramming solutions would not be complete without one that uses method_missing. With method_missing, a call to a method that does not exist can be intercepted and answered with data the original caller does not know about.

class UserMethodMissing
  def initialize(hash)
    @hash = hash
  end

  def method_missing(method_name, *arguments, &block)
    key = method_name.to_s
    if @hash.key?(key)
      @hash[key]
    else
      super
    end
  end

  def respond_to_missing?(method_name, include_private = false)
    @hash.key?(method_name.to_s) || super
  end
end

user = UserMethodMissing.new(incoming_data['user'])
user.email
# => "email@whatever.com"

The respond_to_missing? method is also defined so that respond_to? returns true for keys present in the Hash, and so that Object#method can return a Method object for them.
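To see what respond_to_missing? adds, compare a class equivalent to the one above with a hypothetical UserMethodMissingOnly that overrides only method_missing. Both answer email, but only the class that also defines respond_to_missing? reports the method through respond_to? and Object#method.

```ruby
# Hypothetical variant: overrides method_missing but NOT respond_to_missing?
class UserMethodMissingOnly
  def initialize(hash)
    @hash = hash
  end

  def method_missing(method_name, *arguments, &block)
    key = method_name.to_s
    @hash.key?(key) ? @hash[key] : super
  end
end

# Equivalent to the class from the example above, which defines both hooks
class UserMethodMissing
  def initialize(hash)
    @hash = hash
  end

  def method_missing(method_name, *arguments, &block)
    key = method_name.to_s
    @hash.key?(key) ? @hash[key] : super
  end

  def respond_to_missing?(method_name, include_private = false)
    @hash.key?(method_name.to_s) || super
  end
end

data = { 'email' => 'email@whatever.com' }

only = UserMethodMissingOnly.new(data)
only.email               # => "email@whatever.com", the call itself still works
only.respond_to?(:email) # => false
# only.method(:email)    # raises NameError

full = UserMethodMissing.new(data)
full.respond_to?(:email) # => true
full.method(:email).call # => "email@whatever.com"
```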

Note: Similar patterns are used in popular libraries such as OpenStruct and Hashie to achieve comparable results.

3. “Regular” Object

As a control, a regular Ruby object can be created with specific attributes defined:

class UserRegular
  attr_reader :name,
              :phones,
              :email,
              :birthday

  def initialize(hash)
    @name = hash['name']
    @phones = hash['phones']
    @email = hash['email']
    @birthday = hash['birthday']
  end
end

user = UserRegular.new(incoming_data['user'])
user.email
# => "email@whatever.com"

An immediate downside to this approach is that if the contract of the external service changes, this object may not be initialized with all pertinent data: any attribute the class does not explicitly copy is silently dropped.
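One way to soften that downside, sketched below with a hypothetical UserRegularStrict class and KNOWN_KEYS constant, is to use Hash#fetch so a missing key fails loudly with a KeyError, and to record any keys the class does not recognize so new fields from the service can be noticed (for example, by logging them).

```ruby
class UserRegularStrict
  # Hypothetical list of the attributes this class knows how to handle
  KNOWN_KEYS = %w[name phones email birthday].freeze

  attr_reader :name, :phones, :email, :birthday, :unknown_keys

  def initialize(hash)
    # fetch raises KeyError if a required key disappears from the payload
    @name     = hash.fetch('name')
    @phones   = hash.fetch('phones')
    @email    = hash.fetch('email')
    @birthday = hash.fetch('birthday')
    # Keys the class does not know about, e.g. after the service adds a field
    @unknown_keys = hash.keys - KNOWN_KEYS
  end
end

user = UserRegularStrict.new(
  'name' => 'Some User', 'phones' => [], 'email' => 'email@whatever.com',
  'birthday' => '12-12-1900', 'nickname' => 'su'
)
user.unknown_keys # => ["nickname"]
```

This keeps the speed of plain instance variables while making contract changes visible instead of silent.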

4. A `Hash`

No additional code is required for this approach; a consumer would simply use the Ruby Hash that results from parsing the received JSON:

incoming_data['user']['email']
# => "email@whatever.com"

This is not a metaprogramming solution, but it is still a valid way to handle the incoming data. A plain Hash does not offer the flexibility of the other solutions, but it makes a great baseline for performance testing.
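Plain Hash access has its own trade-off when the shape of the data changes. A short sketch using a trimmed-down copy of the incoming data: chained [] calls raise NoMethodError if an outer key is missing, Hash#dig (Ruby 2.3 and later) returns nil instead, and Hash#fetch raises a KeyError. The missing_user payload is hypothetical.

```ruby
incoming_data = {
  'user' => {
    'name'  => 'Some User',
    'email' => 'email@whatever.com'
  }
}

incoming_data['user']['email']     # => "email@whatever.com"
incoming_data.dig('user', 'email') # => "email@whatever.com"

missing_user = {} # hypothetical payload without a 'user' key

missing_user.dig('user', 'email')  # => nil
# missing_user['user']['email']    # raises NoMethodError, since missing_user['user'] is nil
# missing_user.fetch('user')       # raises KeyError
```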

How They Compare

Finally, the exciting part: potentially relevant performance benchmarks.

The library used to measure how each of these solutions performs is benchmark-ips (loaded with require 'benchmark/ips').

This library makes it simple to define different implementation sections and then compare them:

require 'benchmark/ips'

Benchmark.ips do |x|
  x.report('UserMetaMethods') do
    1000.times do
      u = UserMetaMethods.new(incoming_data['user'])
      u.email
    end
  end

  x.report('UserMethodMissing') do
    1000.times do
      u = UserMethodMissing.new(incoming_data['user'])
      u.email
    end
  end

  x.report('UserRegular') do
    1000.times do
      u = UserRegular.new(incoming_data['user'])
      u.email
    end
  end

  x.report('Hash') do
    1000.times do
      u = Hash(incoming_data['user'])
      u['email']
    end
  end

  x.compare!
end

Each report corresponds to a solution described above. The Hash report does not need to initialize a new object; for consistency’s sake, everything under the 'user' key is passed through Kernel#Hash, although Kernel#Hash returns its argument unchanged when that argument is already a Hash, so no new Hash is actually allocated.

Running this code results in:

Calculating -------------------------------------
   UserMetaMethods     79.294  (± 2.5%) i/s -    399.000
 UserMethodMissing      1.531k (± 1.2%) i/s -      7.791k
       UserRegular    913.295  (± 1.4%) i/s -      4.628k
              Hash      3.141k (± 1.0%) i/s -     15.860k

Comparison:
              Hash:     3141.2 i/s
 UserMethodMissing:     1530.5 i/s - 2.05x slower
       UserRegular:      913.3 i/s - 3.44x slower
   UserMetaMethods:       79.3 i/s - 39.61x slower

Wow! Aside from simply using a Hash, the method_missing implementation is the fastest. Quick, everyone go change all the code to use method_missing! No. Stop. Do not do that.

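While it was faster than the UserRegular implementation here, it is not without its drawbacks. A method_missing solution certainly has value in a variety of situations, but a single microbenchmark should not persuade anyone to rewrite their code for the “speed up”. Note also that each iteration reads only one attribute: UserRegular pays up front to copy four values into instance variables, while UserMethodMissing stores only the Hash and looks values up lazily, so a workload that reads every attribute many times could rank the two differently.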

What about existing libraries with behaviour similar to method_missing, such as Hashie and OpenStruct?

To add them to the existing benchmark, corresponding classes must first be defined:

require 'ostruct'

class UserOpenStruct < OpenStruct
end

require 'hashie'

class UserMash < Hashie::Mash
end

Then two new report calls can add them to the existing benchmark:

Benchmark.ips do |x|
  # ...

  x.report('OpenStruct') do
    1000.times do
      u = UserOpenStruct.new(incoming_data['user'])
      u.email
    end
  end

  x.report('UserMash') do
    1000.times do
      u = UserMash.new(incoming_data['user'])
      u.email
    end
  end

  # ...
end

The results of these two additions are a bit of a surprise:

Calculating -------------------------------------
   UserMetaMethods     79.050  (± 2.5%) i/s -    399.000
 UserMethodMissing      1.537k (± 1.3%) i/s -      7.752k
       UserRegular    914.824  (± 1.4%) i/s -      4.576k
        OpenStruct     49.954  (± 6.0%) i/s -    250.000
          UserMash    194.411  (± 1.5%) i/s -    988.000
              Hash      3.140k (± 0.9%) i/s -     15.759k

Comparison:
              Hash:     3140.1 i/s
 UserMethodMissing:     1536.9 i/s - 2.04x slower
       UserRegular:      914.8 i/s - 3.43x slower
          UserMash:      194.4 i/s - 16.15x slower
   UserMetaMethods:       79.0 i/s - 39.72x slower
        OpenStruct:       50.0 i/s - 62.86x slower

Despite OpenStruct and Hashie seeming very similar to our homegrown method_missing solution, both yielded worse results. However, like other metaprogramming solutions, both OpenStruct and Hashie make up for this speed deficiency with flexibility.

If this were a real problem in a production application, OpenStruct and Hashie could certainly both be viable solutions. Unless the code path using these libraries were scorching hot, their performance might not be a factor.
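As a rough illustration of the flexibility OpenStruct offers in exchange for its speed, the sketch below (using ostruct from Ruby's standard distribution and part of the example data) shows behaviour the homegrown UserMethodMissing class lacks: unknown attributes read as nil instead of raising, new attributes can be assigned after construction, and the object converts back to a Hash. The nickname attribute is hypothetical.

```ruby
require 'ostruct'

# String keys are converted to symbols internally
user = OpenStruct.new('name' => 'Some User', 'email' => 'email@whatever.com')

user.email      # => "email@whatever.com"
user.nickname   # => nil, an unknown attribute is not an error

# Attributes can be added after construction
user.nickname = 'su'
user.nickname   # => "su"
user[:nickname] # => "su", hash-style access also works

# Back to a plain Hash with symbol keys
user.to_h.keys  # => [:name, :email, :nickname]
```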

Why the Slowdown?

The slowness of some metaprogramming solutions is partly due to Ruby's inline method cache. At each call site, the inline method cache stores the result of a previous method lookup so that Ruby can avoid a costly lookup on every call. Metaprogramming interferes with this built-in cache by invalidating it.

Every time a method is defined on a class (for example, when a class is reopened to add a method), parts of the inline method cache key change, resulting in a cache miss and a fresh method lookup. Metaprogrammed code (especially code that runs on every instantiation, as UserMetaMethods does by calling attr_accessor inside initialize) does not benefit from inline method caching the way “normal” code does. For much more information, a man much smarter than I wrote a great article explaining Ruby inline method caching in detail.

Use the Right Tool for the Job

When weighing a list of options, no single data point is sufficient to declare one option superior to all others. A benchmark should be treated as one data point that adds depth to a comparison, not one that decides it. After all, who cares how slow a piece of code is if it is never run?

Metaprogramming is a very powerful tool in the Ruby language that should be wielded with care. Like anything, overusing metaprogramming can lead to unmaintainable code. That sort of code might be great for job security, but it can also be slower and harder for coworkers to read and maintain.

When used correctly and in appropriate circumstances, metaprogramming can be a great asset. The trick is knowing when to use it and when to refrain. Sometimes just using a Hash might be the best solution.
