ruby
Good Database Migration Practices for Your Ruby on Rails App using Strong Migrations
One great feature that comes with modern web frameworks is the ability to manage database schema migrations. However, schema migrations are not 100% safe and remain a recurring cause of issues within projects I have encountered over the last 15 years.
This article will review the issues surrounding poorly managed schema migrations and then look into Strong Migrations, a gem that can help you avoid most problems. Finally, we will discuss a few good practices around database management.
Let's get started!
Issues with Schema Migrations in Ruby on Rails
Schema migrations are changes to a table or database schema within an RDBMS: adding, renaming, removing, and updating a table (or a column within another table), index, or view. Some schema migrations are inherently risky (such as removing a column or table). Generally, all carry a risk, especially alongside any related code changes.
Schema migrations are also made worse with larger tables. Some changes, like updating columns and altering indexes, take longer to apply and might imply database locks, causing performance issues.
We also have to talk about how migrations are brought to production via related code changes. Migrations that add, update, or remove a schema element (table or column) and are used in the code are the most problematic. This is because many developers don't build good practices into those cases early enough. The typical scenario is as follows:
- A developer adds a Model and related table.
- They merge the code change.
- The code change gets deployed, but the migration hasn't run yet.
- The application starts and crashes.
The reason for the crash is simple: as an application starts, Ruby on Rails will load the schema but won't find the new table in the database.
How Can We Fix Problems with Schema Migrations?
Looking at the above example, the solution is to run a database migration before an application starts. The deployment tooling should take care of that. However, in the case of a column or table removal, a migration must run after an application starts. So the easiest thing is to separate the change into two deployments in the correct order and make sure the tooling or team runs the migration when appropriate. Sometimes, we may have to split the change into more deployments.
Let's take the cases we listed before: adding, removing, or updating a column.
- In the first case of adding a column, we need to deploy and run a migration before the code change that will start to use it.
- In the second case (removing a column), it's the opposite: we need to remove code that's using the column before removing the column itself.
- Updating a column is trickier. Such a change will enforce a lock. And it's even worse if you change the column type. This is the suitable strategy:
- Create a new column.
- Write to both old and new columns.
- Backfill existing data from the old column into the new one.
- Move the reads to the new column.
- Stop writing to the old column.
- Drop the old column.
These three strategies should be used for any project, especially once a database fills up and users rely on the product.
Yet, just like linters remove some cognitive load by taking care of this kind of toil, it's best to rely on a tool to automate this.
Here's where the Strong Migrations library comes into play. It adds a layer of protection around migrations to ensure you can avoid issues upon deployment in any environment.
Getting Help from Strong Migrations
The folks at Instacart have open-sourced many libraries. Among them sits Strong Migrations, aimed at "catching unsafe migrations in development". Once installed, it will:
- Detect potentially dangerous operations.
- Prevent them from running by default, and tell you why.
- Provide instructions on safer ways to do things.
A migration will be deemed dangerous if it's likely to cause the issues we discussed earlier: locks and potential errors due to timing. Check out the Strong Migrations README on GitHub for more information.
We will now cover a few cases here.
Installation
This is straightforward: use bundle add strong_migrations
, followed by bundle install
, and rails generate strong_migrations:install
. The last command will create an initializer to configure a few things when the application starts.
Adding a Column or a Table
There is no help from Strong Migrations on this one, but remember it's good practice to separate a table addition from the code using it. Please ensure a schema change happens before any code using it runs and expects to find it.
Renaming a Column
Here's where we start to see what Strong Migrations can do for us. Taking the example of a migration that renames a column from "address" to "location", we get the following message when we try to run it locally in our development environment:
So, yes, the Strong Migrations gem does not replace you. Instead, it reminds you that what you are doing is dangerous and that you should not do it. It also tells you how you should do things instead.
Removing a Column
Now, let's say that we have gone through the first five steps of the list that Strong Migrations generated for us, and it's time to remove the old column.
Is it as simple as creating a migration and using the remove_column
method? No, it's not.
Again, Strong Migrations will print the following text:
So we know what to add to the model and how to change our schema migration. It's tailored to your use case, including the table's name and column. Once that is done, the migration will go through without an issue.
Backfilling Data
The remaining piece on the database strategy side is backfilling data in new columns from old ones. While relying on the Data Migrate gem might be tempting, we can also backfill data in a regular migration. A safe backfilling migration will consider a table's size, the task's complexity, and avoid transactions.
By default, Ruby on Rails uses a transaction around each migration. If we include backfilling in the migration doing the schema change, this might take some time and potentially cancel our whole schema migration.
So, instead, let's use a separate migration, avoid using a transaction, batch the process, and throttle the work.
Notes:
unscoped
ensures that we don't rely on the "default scope" defined in the model to work with all rows.in_batches
forces the block to run in batches of 1000 entries.sleep
gives some reprieve to the database in between batches.
An alternative method is to use a script or a rake task to handle this. Whatever solution you pick, remember that backfilling a column or table with data might be pretty intensive for the database. Estimate how much data will be handled and consider the techniques used in the above example, if needed.
Adding Strong Migrations to an Existing Ruby on Rails Project
Adding Strong Migrations to an existing project is a good idea. However, you might run into issues when setting up a new project instance on a workstation or within a new environment. A failsafe will kick in and prevent any prior migrations from running.
So, what shall we do? You don't have to go through all the migrations and alter them accordingly. Instead, you should follow the Ruby on Rails guide and clean up old migrations.
I've not seen many projects follow that advice, but it's the "good thing" to do: db/schema.rb
or db/structure.sql
holds a snapshot of the current database schema and can be used to load it. Tasks such as db:setup
and db:prepare
(and ones based on either of them) use that snapshot to load schema in a database. Only pending migrations are run; old ones are barely a line in one table, a mere memory.
You can remove old migration files if you need them: git (or Mercurial, or Svn) will keep a trace of them. Once a migration has been applied to all existing environments, it can be deleted as the schema.rb
(or structure.sql
) file will serve for new environments.
Here are the steps you should follow:
- Before introducing Strong Migrations, ensure all environments are up to date in terms of migrations.
- Remove old migration files.
- Add and install the Strong Migrations gem.
- Start to create new migrations.
Doing so will prevent you from going through years of migrations to set them in line with Strong Migrations. It will also lighten the burden of setting up a new environment.
You can then devise a periodic removal of migration files, keeping the last 5, 10, or any from the previous month, for example.
Some Additional Advice on Migrations
Let's finish up by sharing some general tips on migrations when it comes to:
- Production environments and databases
- Avoiding downtime
- Backups
Production Environments and Databases
Considering your production environment and its database is key when creating and running any migration. A developer environment is rarely bursting at the seams when it comes to data. Consequently, most migrations will run smoothly with barely any waiting time. Production environments have a lot more data that's more diverse. This will tend to complicate things when it comes to writing and applying a migration.
You should follow strategies like backfilling. Batching and throttling are also good practices when handling large volumes of data. It's never too early to start using those to develop good habits. Once a migration is out there and running, we should monitor performance metrics: response time and slow queries. Compare these metrics with previous trends, and look into unexpected changes.
Avoiding Downtime
One reason that developers tend to be scared about making deployments around the end of the day or the week is the fear of the unknown.
By following the good practices pointed out by Strong Migrations, we can reduce the amount of risks and unknowns, thus giving us more confidence in our ability to deploy at any time.
Backups
If things go wrong, don't panic; you should have a backup ready. Most managed solutions available for PostgreSQL, MySQL, and similar RDBMS offer backups at regular intervals. Some also provide point-in-time recovery, allowing us to rewind to better times.
Wrapping Up
In this post, we first explored some potential issues related to schema migrations in Rails, before seeing how we can fix these issues ourselves. We then introduced Strong Migrations. As we have seen, Strong Migrations ensures that risky changes such as removals and changes to a schema element are done correctly to avoid risk.
Finally, we touched on a few additional tips regarding migrations.
Happy coding!
P.S. If you'd like to read Ruby Magic posts as soon as they get off the press, subscribe to our Ruby Magic newsletter and never miss a single post!