Logo of AppSignal

Menu

How to tame your growing users table

Greg Navis on

This is a guest article by Greg Navis. In his consulting work, Greg is exposed to different design decisions, trade-offs, problems, and bugs. One of his clients had an interesting problem with his users table. It had over 80 columns which is a sign of bad design and worsens performance. Extracting a table (or multiple tables) would be an obvious solution but because the code was large it would be tedious and error-prone. In this article, he’ll show you how he helped the client to find a manageable fix.

How did the client get there?

Before diving into the details, let’s try to understand how an app might end up in this state. We start with a simple users table. After a few weeks, we need to be able to determine the last sign in time so we add users.last_sign_in_at. Then we need to know the user’s name. We add first_name and last_name. Twitter handle? Another column. GitHub profile? Phone number? After a few months the table becomes mind-boggling.

What’s wrong with this?

A table that large indicates several problems:

  1. User has multiple unrelated responsibilities. This makes it more difficult to understand, change and test.
  2. Exchanging data between the app and the database requires extra bandwidth.
  3. The app needs more memory to store the bulky model.

The app fetched User on every request for authentication and authorisation purposes but usually used only a handful of columns. Fixing the problem would improve both the design and performance.

Extracting a table

We can solve the problem by extracting seldom-used columns to a new table (or tables). For example, we can extract profile information (first_name, etc.) into profiles with the following steps:

  1. Create profiles with columns duplicating profile-related columns in users.
  2. Add profile_id to users. Set it to NULL for now.
  3. For each row in users, insert a row to profiles that duplicates profile-related columns.
  4. Point profile_id of the corresponding row in users to the row inserted in 3.
  5. Do not make users.profile_id non-NULL. The app isn’t aware of its existence yet so it’d break.

We need to replace references to users.first_name with profiles.first_name and so on. If we’re extracting just a few columns with a handful of references then I recommend we do this manually. But as soon as we catch ourselves thinking “Oh, no. This is the worst job ever!” we should look for an alternative.

Don’t neglect the problem. A part of the code that everyone avoids will deteriorate further and suffer from even greater inattention. The easiest way to break the vicious circle is to start small.

Read on, if you’re curios how my client solved the problem.

Fixing the code one line at a time

The most incremental approach is fixing one reference to the old column at a time. Let’s focus on moving first_name from users to profiles.

First, create Profile with:

1
rails generate model Profile first_name:string

Then add a reference from users to profiles and copy users.first_name to profiles:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
class ExtractUsersFirstNameToProfiles < ActiveRecord::Migration
  # Redefine the models to break dependency on production code. We need
  # vanilla models without callbacks, etc. Also, removing a model in the future
  # might break the migration.
  class User < ActiveRecord::Base; end
  class Profile < ActiveRecord::Base; end

  def up
    add_reference :users, :profile, index: true, unique: true, foreign_key: true

    User.find_each do |user|
      profile = Profile.create!(first_name: user.first_name)
      user.update!(profile_id: profile.id)
    end

    change_column_null :users, :profile_id, false
  end

  def down
    remove_reference :users, :profile
  end
end

Because it forces each user to have exactly one profile, a reference from users to profiles is preferable to the opposite reference.

With the database structure in place, we can delegate first_name from User to Profile. My client had several requirements:

  1. Accessors should use the associated Profile. They should also log where the deprecated accessor was called from.
  2. Saving User should automatically save Profile in order to avoid breaking code using the deprecated accessors.
  3. User#first_name_changed? and other ActiveModel::Dirty methods should still work.

This means User should look like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
class User < ActiveRecord::Base
  # We need autosave as the client code might be unaware of
  # Profile#first_name and still reference User#first_name.
  belongs_to :profile, autosave: true

  def first_name
    log_backtrace(:first_name)
    profile.first_name
  end

  def first_name=(new_first_name)
    log_backtrace(:first_name)

    # Call super so that User#first_name_changed? and similar still work as
    # expected.
    super

    profile.first_name = new_first_name
  end

  private

  def log_backtrace(name)
    filtered_backtrace = caller.select do |item|
      item.start_with?(Rails.root.to_s)
    end
    Rails.logger.warn(<<-END)
A reference to an obsolete attribute #{name} at:
#{filtered_backtrace.join("\n")}
END
  end
end

After these changes, the app works the same but may be a bit slower because of the extra references to Profile (if performance becomes an issue just use a tool like AppSignal). The code logs all references to the legacy attributes, even ungreppable ones (e.g. user[attr] = ... or user.send("#{attr}=", ...)) so we’ll be able to locate all of them even when grep is unhelpful.

With this infrastructure in place, we can commit to fixing one reference to users.first_name on a regular schedule, e.g. every morning (to start the day with a quick win) or around noon (to work on something easier after a focused morning). This commitment is essential because our goal is to lessen mental barriers to fixing the issue. Leaving the code above in place without taking action will impoverish the app even further.

After removing all deprecated references (and confirming with grep and logs) we can finally drop users.first_name:

1
2
3
4
5
class RemoveUsersFirstName < ActiveRecord::Migration
  def change
    remove_column :users, :first_name, :string
  end
end

We should also get rid of the code added to User as it’s no longer necessary.

Limitations

The method might apply to your case but keep in mind some of its limitations:

You might be able to overcome some of them. For example, you might keep the models in sync with a service object or a callback on Profile. Or if you use PostgreSQL you might consider using a materialized view in the interim.

That’s it!

The most important lesson of the article is do not avoid code that smells but tackle it head on instead. If the task is overwhelming then work iteratively on a regular schedule. The article presented a method to consider when extracting a table is difficult. If you can’t apply it then look for something else. If you have no idea how then just drop me a line. I’ll try to help. Don’t let your bits rot.

10 latest articles

Go back

Subscribe to

Ruby Magic

Magicians never share their secrets. But we do. Sign up for our Ruby Magic email series and receive deep insights about garbage collection, memory allocation, concurrency and much more.