How to Remove a Column with Zero Downtime in Ruby on Rails
07 Feb 2016For a production Ruby on Rails application, uptime is paramount. Altering the structure of an application’s persistence layer is an operation that competes directly with uptime. Specifically, removing a table column within a relational database causes issues with the ActiveRecord
ORM (the default relational mapping within a Ruby on Rails application).
However, this particular pain point has been removed as of Ruby on Rails 4.0.0, saving developers a lot of headache and greatly reducing the need for structural change coordination.
Old and Busted
To demonstrate the problem, a simple Ruby on Rails 3.2 application is created with a User
model:
class User < ActiveRecord::Base
end
Supporting the User
model is a PostgreSQL database table created with a migration:
class CreateUsers < ActiveRecord::Migration
def change
create_table :users do |t|
t.string :first_name
t.string :last_name
t.string :email
t.timestamps null: false
end
end
end
To ensure this application has as close to 100% uptime as possible, it can be assumed that it is behind some kind of pre-loader. As in, when code is deployed, existing requests are given time to complete before new requests are shepherded over to the new version of the code.
In a Ruby on Rails 3.2 application, a problem will arise when a column is removed from the database and the ORM does not have time to restart. In this case, even the pre-loader will not save an application from throwing an error about the missing column.
To emulate this problem, a rails console is run in the production
environment and a User
is created:
RACK_ENV=production rails console
> User.create(first_name: 'test', last_name: 'test')
# => #<User id: 1, first_name: "test", last_name: "test", email nil, created_at: "2016-02-07 21:03:26", updated_at: "2016-02-07 21:03:26">
In parallel, the psql
command line client is used to connect to the same production
database used by this server, all still running locally.
Within this psql
prompt, the email
column from the users
table will be removed while the production console is still running:
psql app_production
psql=# ALTER TABLE users DROP COLUMN email;
ALTER TABLE
Jumping back to the RACK_ENV=production rails console
, another User
creation attempt results in an error:
> User.create(first_name: 'test2', last_name: 'test2')
# => PG::UndefinedColumn: ERROR: column "email" of relation "users" does not exist
Kaboom! Rails attempts to save a new record to the users
table and populate the email
column even though it was never specified to do so. Seeing as this is not new behaviour, surely someone has come up with a solution to this problem that does not require downtime.
Yesterday’s Jam
As it turns out, a few different blog posts have been written about the solution to this very problem.
The TL;DR of these posts is that a multiple phase solution is required under the given circumstances:
1. Write code that `rejects` the column to remove
2. Deploy
3. Write migration to remove the column
4. Deploy + Migrate
5. Write code to remove the column `rejection`
6. Deploy
Phew! Three deploys and presumably three code reviews later, the column is finally removed. Assuming a system is built with continuous deployment guarded by a continuous integration suite, this whole process could take the better part of a work day in order to accomplish such a simple task.
Even worse, if an application has scheduled weekly deploys where only one deployment can happen per week, then this column will linger for three whole weeks. That is just absurd, what kind of barbarians can live this way?
New Hotness
Luckily, the incredible people who maintain and create features for the Ruby on Rails framework saw this behaviour and decided to correct it. As of a very influential commit during the beta release of Ruby on Rails 4.0.0, the framework does not aggressively attempt to populate attributes not specified during creation or update.
To prove this, a new Ruby on Rails 4.0.0 application is made with the exact same structure as above.
Then, the same production console is created along with a User
:
RACK_ENV=production rails console
> User.create(first_name: 'test', last_name: 'test')
# => #<User id: 1, first_name: "test", last_name: "test", email nil, created_at: "2016-02-07 22:09:49", updated_at: "2016-02-07 22:09:49">
Again, in parallel, a psql
client is initialized for the production
environment database.
Within this prompt, the email
column on the users
table is removed:
psql app_production
psql=# ALTER TABLE users DROP COLUMN email;
ALTER TABLE
And finally, a new user is created in the still running RACK_ENV=production rails console
:
> User.create(first_name: 'test2', last_name: 'test2')
# => #<User id: 2, first_name: "test2", last_name: "test2", email nil, created_at: "2016-02-07 22:13:18", updated_at: "2016-02-07 22:13:18">
Unlike before, the second insertion into the users
table worked wonderfully, despite the email
column’s removal.
As shown, the in memory representation of the User
class still has the email
attribute. The User
object’s representation has not been reloaded, it simply just does not care about the email
column when writing to the database.
The fact that the ORM does not specifically reload when writing to a table points out an interesting detail. It means that for this solution to fully work, the email
column must not be referenced at all.
With this new found behaviour, the previous six step incantation comprised of three separate deployments can be reduced to three steps in total and just one deployment:
1. Remove references to column and write migration
2. Deploy
3. Once requests completely use new code, migrate
Wunderbar! The reject
and deploy pattern is no longer necessary. Further more, development teams suffering within weekly scheduled deploys can achieve their desired results within a single iteration.
What Changed?
Unfortunately, the amazing change responsible for this vast improvement regarding Ruby on Rails column removal did not receive the attention is deserves. Many blog posts and guides were written about how to circumvent the older 3.2 behaviour, but when looking for information about this modern implementation, resources were scarce.
After reading through a very large amount of change logs and coming up empty, a bit of previous wisdom was utilized in order to track down the commit responsible.
The first step is to identify the offending code in 3.2. This way, the 4.0.0 version of the code can be analyzed, the discrepancy identified and the git revision responsible brought to light.
After some binding.pry
insertions and utilization of the caller
array, the offending code was traced to lib/active_record/attribute_methods.rb:244 (formatted for readability):
def arel_attributes_values(include_primary_key = true,
include_readonly_attributes = true,
attribute_names = @attributes.keys)
attrs = {}
klass = self.class
arel_table = klass.arel_table
attribute_names.each do |name|
if (column = column_for_attribute(name)) &&
(include_primary_key || !column.primary)
if include_readonly_attributes ||
!self.class.readonly_attributes.include?(name)
value =
if klass.serialized_attributes.include?(name)
@attributes[name].serialized_value
else
# FIXME: we need @attributes to be used consistently.
# If the values stored in @attributes were already type
# casted, this code could be simplified
read_attribute(name)
end
attrs[arel_table[name]] = value
end
end
end
attrs
end
While a little lengthy, the most obvious thing to note about this method is that its signature is comprised of entirely optional variables. Most notably, the final optional parameter, attribute_names
, defaults to @attributes.keys
which is the list of attributes for the given model.
In this case, the result of @attributes.keys
is the complete list of keys for a User
: ['id', 'first_name', 'last_name', 'email', 'updated_at', 'created_at']
. Most notably, the email
attribute is present, regardless of the fact that it was not given a value to insert. This means that by default, any create
call will write to the email
column without hesitation (just as previously observed).
While default behaviour can emit a code smell, it is only dangerous if it is used unknowingly. Since the parameter to determine attribute_names
is available, chances are the caller of this method does indeed specify them, right?
Unfortunately, no:
def create
attributes_values = arel_attributes_values(!id.nil?)
new_id = self.class.unscoped.insert attributes_values
self.id ||= new_id if self.class.primary_key
IdentityMap.add(self) if IdentityMap.enabled?
@new_record = false
id
end
Here, in lib/active_record/persistence.rb:364 lies the method responsible for calling arel_attributes_values
. Since it does not specify the list of attributes, the default behaviour prevails and all known attributes are returned.
The Big Fix
On a glorious Friday afternoon (probably) Jon Leighton made the commit which changed how people thought about column removal in Ruby on Rails.
The change enabled the concept of partial_writes
. This pattern used the exiting changed
hash present in every ActiveRecord
model. The changed
hash is manipulated as a model has attributes assigned to it.
In lib/active_record/attribute_methods/dirty.rb:77, partial_writes
is used:
def create_record(*)
partial_writes? ? super(keys_for_partial_write) : super
end
# Serialized attributes should always be written in case they've been
# changed in place.
def keys_for_partial_write
changed | (attributes.keys & self.class.serialized_attributes.keys)
end
Before delegating to the same persistence.rb#create
method, only the attributes present in the changed
hash (along with any serialized attributes) will be used in subsequent insert queries.
Included in the February 2013 beta release of Ruby on Rails 4.0.0, this code enables a much more pleasant experience than the older work around. In fact, the author even pointed out this behaviour in his commit:
When inserting new records, only the fields which have been changed from the defaults will actually be included in the INSERT statement. The other fields will be populated by the database.
This is more efficient, and also means that it will be safe to remove database columns without getting subsequent errors in running app processes (so long as the code in those processes doesn’t contain any references to the removed column).
Unshackled
While this patch to the Ruby on Rails framework may not make every situation easier, it certainly reduces both cognitive and implementation overhead for many developers.
For more automated systems utilizing continuous deployment, two deployments may still be necessary since code running on old processes (within some sort of pre loader) will not be able to handle the column removal.
The deployment process for such a system may look more like:
1. Remove references to column
2. Deploy
3. Write migration to drop column
4. Deploy and migrate
Databases that power web applications are powerful and oddly fragile beasts. Manipulating these beasts while maintaining superb end user experiences requires one to constantly be learning. Hopefully with this behaviour, one more thing can be crossed off the check list of “things to remember when manipulating the database” and we can all just get back to work.