Most migrations do very little and run very quickly. Add a column or two; run in a second or two. Other migrations need to do much more, but we still want them to run just as quickly. And, if they’re going to take a little longer, we want to know why.
In Ace Ventura, as Ace is about to search a drained dolphin tank for clues of the crime, he says to those nearby, “If I’m not back in five minutes, just wait longer.” While good for a laugh, that’s not good “advice” from a migration. Here’s a migration that reminds me of Ace:
A fairly typical migration, right? Add a column and give it a value for all existing records. But, look at the output from running it:
12 minutes?! That makes for a long feedback cycle, not to mention a long deploy. What’s more, that 12 minutes silently passes between the add_column
call and the end of the migration. Another developer who pulls down this migration and runs it is going to wonder what’s taking so long.
Be Talkative
Whether this migration runs for 12 minutes or 12 seconds, we can be open about what it’s doing by using say
or say_with_time
.
Use say
to display a message about what’s being done in the block that you pass to it and Use say_with_time
to also display the elapsed time after the block executes.
We’ll use say_with_time
:
Here’s the output:
That’s much more informative, but it isn’t any faster.
I Feel the Need…the Need for Speed
Like Maverick and Goose from Top Gun, we feel the need for speed. In Rails 2.3.x find_each
finds records in batches of one thousand and loads them into memory. Though there are no guarantees, hopefully garbage collection will run and free that memory.
Since we’re working with several hundred thousand records in this migration, using find_each
is a great way to avoid exhausting memory. We’re not just finding records, though. We’re also updating them—one at a time—with the same value. Oh.
If you have any experience with SQL, you know that it operates on sets of records. ActiveRecord
enables us to work that way when we need to with methods like update_all
, which constructs one SQL UPDATE
statement and sends it to the database without loading the affected records.
Update all records without having to find and load any of them first? Sounds perfect. Let’s try it:
And the result?
128x
faster! I can wait 6s
for feedback or as part of a deploy. We’re not quite done, though.
Dotting the i’s and Crossing the t’s
In our migration, we add a column and then use its ActiveRecord model to set its value for all existing records. To be sure we have immediate access to this new column via the model, we can call reset_column_information
. This method tells ActiveRecord
to re-read the model’s column information from its backing database table.
I don’t think I’ve ever seen a migration fail due to stale column information, but I wouldn’t chance it, since Murphy’s Law will probably only be proven true during a production deploy.
Review
-
Use
say
orsay_with_time
to be talkative about what migrations are doing when they do more than usual or take longer than a few seconds. - Ensure migrations do their work quickly for rapid feedback in development and when deploying.
-
Use
update_all
and related methods let you work closer to SQL metal when you need their speed and efficiency of working with large sets of data.