Most migrations do very little and run very quickly. Add a column or two; run in a second or two. Other migrations need to do much more, but we still want them to run just as quickly. And, if they’re going to take a little longer, we want to know why.
In Ace Ventura, as Ace is about to search a drained dolphin tank for clues of the crime, he says to those nearby, “If I’m not back in five minutes, just wait longer.” While good for a laugh, that’s not good “advice” from a migration. Here’s a migration that reminds me of Ace:
A fairly typical migration, right? Add a column and give it a value for all existing records. But, look at the output from running it:
12 minutes?! That makes for a long feedback cycle, not to mention a long deploy. What’s more, those 12 minutes pass silently between the add_column call and the end of the migration. Another developer who pulls down this migration and runs it is going to wonder what’s taking so long.
Use say to display a message about what’s being done, and use say_with_time to also display the elapsed time after the block that you pass to it executes.
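To make the behavior of these two helpers concrete, here is a minimal sketch of what they do. It is a plain-Ruby stand-in, not the Rails source: the TalkativeMigration class, the message text, and the pretend row count are all hypothetical, chosen so the example runs without Rails or a database.

```ruby
# Hypothetical stand-in for the reporting helpers on ActiveRecord::Migration,
# written so the sketch runs without Rails or a database.
class TalkativeMigration
  # Print a message; sub-items are indented under the message they belong to.
  def say(message, subitem = false)
    puts "#{subitem ? '   ->' : '--'} #{message}"
  end

  # Print the message, run the block, then report the elapsed wall-clock time.
  # If the block returns an Integer, report it as a row count as well.
  def say_with_time(message)
    say(message)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    say(format("%.4fs", elapsed), true)
    say("#{result} rows", true) if result.is_a?(Integer)
    result
  end
end

migration = TalkativeMigration.new
updated = migration.say_with_time("Setting status on existing users") do
  sleep 0.01 # stands in for the slow data update
  3          # pretend three rows were touched
end
```

In a real migration you would call say_with_time directly, wrapping the slow step in the block so that the message appears before the work starts and the timing appears when it finishes.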
Here’s the output:
That’s much more informative, but it isn’t any faster.
I Feel the Need…the Need for Speed
Like Maverick and Goose from Top Gun, we feel the need for speed. In Rails 2.3.x, find_each finds records in batches of one thousand and loads them into memory. Though there are no guarantees, hopefully garbage collection will run between batches and free that memory.
Since we’re working with several hundred thousand records in this migration, using find_each is a great way to avoid exhausting memory. We’re not just finding records, though. We’re also updating them, one at a time, with the same value. Oh.
If you have any experience with SQL, you know that it operates on sets of records. ActiveRecord enables us to work that way when we need to with methods like update_all, which constructs one SQL UPDATE statement and sends it to the database without loading the affected records.
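The difference between the two approaches can be sketched with a toy model that simply counts statements. Everything here is hypothetical: ToyTable is an in-memory stand-in rather than ActiveRecord, the status column and the 5,000-row size are made up, and real timings depend on much more than the statement count.

```ruby
# Toy in-memory "table" (hypothetical, not ActiveRecord) that counts how many
# statements a client sends, to contrast the two update strategies.
class ToyTable
  attr_reader :rows, :statements

  def initialize(size)
    @rows = Array.new(size) { |i| { id: i + 1, status: nil } }
    @statements = 0
  end

  # Models find_each: one SELECT per batch, yielding loaded rows one by one.
  def find_each(batch_size: 1000)
    @rows.each_slice(batch_size) do |batch|
      @statements += 1 # one SELECT for this batch
      batch.each { |row| yield row }
    end
  end

  # Models saving a single record: one UPDATE per row.
  def update_row(row, attrs)
    @statements += 1 # one UPDATE for this row
    row.merge!(attrs)
  end

  # Models update_all: one UPDATE for the whole set, nothing loaded first.
  def update_all(attrs)
    @statements += 1 # one UPDATE for every row
    @rows.each { |row| row.merge!(attrs) }
    @rows.size
  end
end

one_by_one = ToyTable.new(5_000)
one_by_one.find_each { |row| one_by_one.update_row(row, status: "active") }

set_based = ToyTable.new(5_000)
set_based.update_all(status: "active")

puts "one at a time: #{one_by_one.statements} statements"
puts "update_all:    #{set_based.statements} statement"
```

Both tables end up with the same data, but the first strategy sends a statement for every row plus one per batch, while the second sends a single statement. Keep in mind that the real update_all also skips model instantiation, validations, and callbacks, so it is only appropriate when you don't need them.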
Update all records without having to find and load any of them first? Sounds perfect. Let’s try it:
And the result?
x faster! I can wait 6s for feedback or as part of a deploy. We’re not quite done, though.
Dotting the i’s and Crossing the t’s
In our migration, we add a column and then use its ActiveRecord model to set its value for all existing records. To be sure we have immediate access to this new column via the model, we can call reset_column_information. This method tells ActiveRecord to re-read the model’s column information from its backing database table.
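Why stale column information can happen at all is easier to see with a small model of the caching involved. This is a sketch under assumptions: ToyModel is a hypothetical plain-Ruby class, not ActiveRecord, and the column names are invented for illustration.

```ruby
# Toy model (hypothetical, not ActiveRecord) of a class that reads its column
# list from the table once and then keeps using the cached copy.
class ToyModel
  def initialize(table_columns)
    @table_columns = table_columns # stands in for the database table
    @cached = nil
  end

  # Read the column list on first use, then reuse the cached answer.
  def column_names
    @cached ||= @table_columns.dup
  end

  # Forget the cached list so the next call re-reads the table.
  def reset_column_information
    @cached = nil
  end
end

table = ["id", "name"]
users = ToyModel.new(table)

users.column_names             # the model is touched before the schema change
table << "status"              # the new column is added to the table
stale = users.column_names     # still the old, cached list
users.reset_column_information
fresh = users.column_names     # re-read, now includes the new column
```

In this model the cache only goes stale if the model is used before the column is added. If the first use comes afterwards, the list is already current, which may be why the problem shows up so rarely in practice.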
I don’t think I’ve ever seen a migration fail due to stale column information, but I wouldn’t chance it, since Murphy’s Law will probably only be proven true during a production deploy.
- Use say_with_time to be talkative about what migrations are doing when they do more than usual or take longer than a few seconds.
- Ensure migrations do their work quickly for rapid feedback in development and when deploying.
- update_all and related methods let you work closer to the SQL metal when you need speed and efficiency with large sets of data.