How did the client get there?
Before diving into the details, let's try to understand how an app might end up in this state. We start with a simple users
table. After a few weeks, we need to be able to determine the last sign in time so we add users.last_sign_in_at
. Then we need to know the user's name. We add first_name
and last_name
. Twitter handle? Another column. GitHub profile? Phone number? After a few months the table becomes mind-boggling.
What's wrong with this?
A table that large indicates several problems:
User
has multiple unrelated responsibilities. This makes it more difficult to understand, change and test.- Exchanging data between the app and the database requires extra bandwidth.
- The app needs more memory to store the bulky model.
The app fetched User
on every request for authentication and authorisation purposes but usually used only a handful of columns. Fixing the problem would improve both the design and performance.
Extracting a table
We can solve the problem by extracting seldom-used columns to a new table (or tables). For example, we can extract profile information (first_name
, etc.) into profiles
with the following steps:
- Create
profiles
with columns duplicating profile-related columns inusers
. - Add
profile_id
tousers
. Set it toNULL
for now. - For each row in
users
, insert a row toprofiles
that duplicates profile-related columns. - Point
profile_id
of the corresponding row inusers
to the row inserted in 3. - Do not make
users.profile_id
non-NULL
. The app isn't aware of its existence yet so it'd break.
We need to replace references to users.first_name
with profiles.first_name
and so on. If we're extracting just a few columns with a handful of references then I recommend we do this manually. But as soon as we catch ourselves thinking "Oh, no. This is the worst job ever!" we should look for an alternative.
Don't neglect the problem. A part of the code that everyone avoids will deteriorate further and suffer from even greater inattention. The easiest way to break the vicious circle is to start small.
Read on, if you're curios how my client solved the problem.
Fixing the code one line at a time
The most incremental approach is fixing one reference to the old column at a time. Let's focus on moving first_name
from users
to profiles
.
First, create Profile
with:
Then add a reference from users
to profiles
and copy users.first_name
to profiles
:
Because it forces each user to have exactly one profile, a reference from users
to profiles
is preferable to the opposite reference.
With the database structure in place, we can delegate first_name
from User
to Profile
. My client had several requirements:
- Accessors should use the associated
Profile
. They should also log where the deprecated accessor was called from. - Saving
User
should automatically saveProfile
in order to avoid breaking code using the deprecated accessors. User#first_name_changed?
and otherActiveModel::Dirty
methods should still work.
This means User
should look like this:
After these changes, the app works the same but may be a bit slower because of the extra references to Profile
(if performance becomes an issue just use a tool like AppSignal). The code logs all references to the legacy attributes, even ungreppable ones (e.g. user[attr] = ...
or user.send("#{attr}=", ...)
) so we'll be able to locate all of them even when grep
is unhelpful.
With this infrastructure in place, we can commit to fixing one reference to users.first_name
on a regular schedule, e.g. every morning (to start the day with a quick win) or around noon (to work on something easier after a focused morning). This commitment is essential because our goal is to lessen mental barriers to fixing the issue. Leaving the code above in place without taking action will impoverish the app even further.
After removing all deprecated references (and confirming with grep
and logs) we can finally drop users.first_name
:
We should also get rid of the code added to User
as it's no longer necessary.
Limitations
The method might apply to your case but keep in mind some of its limitations:
- It doesn't handle bulk queries like
User.update_all
. - It doesn't handle raw SQL queries.
- It may break monkey-patches (remember that dependencies might introduce them too).
User
andProfile
may be out of sync, ifprofiles.first_name
is updated butusers.first_name
is not.
You might be able to overcome some of them. For example, you might keep the models in sync with a service object or a callback on Profile
. Or if you use PostgreSQL you might consider using a materialized view in the interim.
That's it!
The most important lesson of the article is do not avoid code that smells but tackle it head on instead. If the task is overwhelming then work iteratively on a regular schedule. The article presented a method to consider when extracting a table is difficult. If you can't apply it then look for something else. If you have no idea how then just drop me a line. I'll try to help. Don't let your bits rot.