Web applications usually rely heavily on databases, for the most part. And as applications grow, databases grow too. We keep scaling web servers and background workers to keep up with the heavy load. But eventually, the database needs to keep up with all the new connections from these processes.
One way to tackle this is to grow a database with an app using vertical scaling. This means adding more CPU power and memory to the database server. But this is usually slow. You might have to copy all your data to the new server and then put the application down to change the database servers that it communicates with.
This is also usually a one-way operation. You can't keep adding/removing CPU power or memory from the database server based on your load.
This post will cover some alternative methods to fine-tune and scale a database under heavy load and improve the performance of your Rails app. We will focus on:
- Splitting schema and data between multiple databases
- Using read-only replica databases
Let's get going!
Disclaimer: Scaling Isn't Always Needed
As always, with any post about performance optimization and scaling, I would like to put up a standard disclaimer: make sure you have an issue before trying to solve it.
For example, on one of our apps in production, we tackled scaling and optimization only when we started processing more than 3 million background jobs a day.
This will, of course, be different for different apps. But usually, a good indicator of whether you need to optimize is if your database is always running at high CPU or memory usage.
A Separate Database for Your Rails Application
Sometimes, simply using a separate database server for a part of your app is a good solution. For example, your app might do two different things that don’t have a big overlap. Alternatively, maybe there's one very database-heavy feature, but it is only used rarely or by a small section of users. You should go for a separate database server if — and only if — there is a clear distinction between the parts of your app that will use different databases.
For example, a separate database is useful when you're audit logging in a high-frequency app (or have other very high-volume data that isn't necessarily accessed frequently). Let's see how we can set this up.
First, set up the database.yml
to configure the second database. Let’s call the second database log
. We'll add a log_default
entry for the secondary database with a common configuration across all environments:
log_default: &log_default host: localhost port: <%= ENV["DB_PORT"] || 5432 %> adapter: postgresql encoding: unicode user: <%= ENV["DB_USER"] || "postgres" %> password: <%= ENV["DB_PASSWORD"] || "postgres" %> migrations_paths: db/log_migrate
The important bit here is the migrations_paths
option that tells Rails where to find migrations related to this database.
Then, update database.yml
to configure access to the log database for development
, test
, and production
environments:
development: primary: <<: *default database: <%= ENV["DB_NAME"] %> log: <<: *log_default database: <%= ENV["LOG_DB_NAME"] %> test: primary: <<: *default database: my_app_test log: <<: *log_default database: my_app_log_test production: primary: adapter: postgresql encoding: unicode url: <%= ENV["DATABASE_URL"] %> log: adapter: postgresql encoding: unicode migrations_paths: db/log_migrate url: <%= ENV["DATABASE_LOG_URL"] %>
As you can see, we move the configuration for each environment one level deeper. The configuration for our primary database lives inside the primary
key for each environment. The second database's configuration (that we call log
) lives inside the log
key. primary
is a special keyword that tells Rails to use this database as the default.
The next step is to set up ActiveRecord
to use this new database. We first create a base class that establishes a connection to this database (instead of a primary one) using establish_connection
:
class LogRecord < ActiveRecord::Base self.abstract_class = true establish_connection :log end
And then, we inherit from LogRecord
instead of ApplicationRecord
for all the records that should access the log
DB:
class AuditLog < LogRecord end
Note that you keep using the rails generators for generating models and migrations by appending the --database log
option to target the second database. The migration task now automatically migrates both databases, but if you need to only migrate one, you can use db:migrate:primary
or db:migrate:log
tasks.
This works great if there is a clear distinction between two parts of the app. But what if you don’t have a clear idea of the database that's creating issues? There are still a couple of options left. Let's check them out.
Using Read-Only Replicas for Rails
Our second scaling option is to access data from read-only replicas while still writing to the primary database. This can improve performance for users who only browse parts of the app without performing any writing operations. Depending on your app's use case, this can be 80% of your users or 10% of them. So, evaluate the behavior of your users or your application use case before going down this route.
For an app with a majority of read-only users (like Twitter, for example), you can gain huge improvements in performance. New read-only replicas can be added/removed at will without affecting the primary database, which opens up options for auto-scalability.
Let's see how to set this up.
Setting up a Read-Only Replica
As usual, we will start with modifications to the database.yml
config to include the replica:
production: primary: adapter: postgresql encoding: unicode url: <%= ENV["DATABASE_URL"] %> primary_replica: adapter: postgresql encoding: unicode url: <%= ENV["DATABASE_REPLICA_URL"] %>
Again, the primary
key is a special key indicating that this database is the default. We can use any other key for the replica database configuration. Let’s choose primary_replica
for sanity.
Then update the ApplicationRecord
to configure a connection to multiple databases:
class ApplicationRecord < ActiveRecord::Base self.abstract_class = true connects_to database: { writing: :primary, reading: :primary_replica } if Rails.env.production? end
Note that we add configuration for a production
environment only in the above example. If you want to, you can set it up for all environments, after modifying your development setup to support replicas.
Finally, to use the read-only replica, you need to tell Rails when to use the primary database and when to use the replica. Rails provides a basic implementation for automatic role-switching based on the HTTP verb of the request out of the box. To enable it, add the following to your production.rb
:
# Use Read-Only Databases on GET requests config.active_record.database_selector = { delay: 2.seconds } config.active_record.database_resolver = ActiveRecord::Middleware::DatabaseSelector::Resolver config.active_record.database_resolver_context = ActiveRecord::Middleware::DatabaseSelector::Resolver::Session
This tells Rails to use the replica for all GET
or HEAD
requests and the primary one otherwise. Notice the option that sets the delay to two seconds? That tells Rails to keep using the primary database for GET
and HEAD
requests if it is within two seconds of a write request in the same session.
Gotchas
Does all of this sound too simple? Well, we aren’t done yet. Depending on how your application is structured, you need to keep a few points in mind.
Firstly, ensure you do not write anything in a read-only action like index
or show
. If there are legitimate reasons for writing during those operations, you can manually switch the connection:
ActiveRecord::Base.connected_to(role: :writing) do # Your code here end
A good example where this can be useful is if you are managing sessions in a database. If you do that, make sure you use the writing
role. For example, you can override the save_record
method in your session with authlogic
:
class UserSession < Authlogic::Session::Base # Your session configuration def save_record(alternate_record = nil) ActiveRecord::Base.connected_to(role: :writing) do super end end end
Another use case for automatic role-switching is when an app exposes a GraphQL API. All operations on a GraphQL API are POST
, but conventionally, all queries
are read-only, and mutations
are read-write. In this case, to allow Rails to select the correct database, we can use a custom tracer in the schema.
First, create a tracer that switches roles based on the GraphQL document:
module Tracers class DatabaseRoleTracer EVENT_NAME = "execute_multiplex" # Execute on read only database if the operation has only queries. def trace(event, data) return yield unless Rails.env.production? return yield unless event == EVENT_NAME multiplex = data[:multiplex] ActiveRecord::Base.connected_to(role: role(multiplex)) do yield end end private def role(multiplex) if multiplex.queries.all?(&:query?) ActiveRecord::Base.reading_role else ActiveRecord::Base.writing_role end end end end
And then use the tracer
in your GraphQL Schema:
class MyAppSchema < GraphQL::Schema tracer Tracers::DatabaseRoleTracer.new # Your code here end
Let’s get to the final alternative. What can you do if your application still has a huge load on your primary database? The answer: database sharding.
Database Sharding in Your Rails Application
This is the most complex of all the methods for scaling databases discussed in this post. So reach for it only if you need it. It will add considerable complexity to your application and needs to be done right to provide any real benefit.
There are two strategies to shard your database:
- Distribute different tables on different nodes (vertical sharding).
- Have the same schema across all nodes, but distribute data depending on certain parameters (horizontal sharding).
Vertical sharding is similar to our “multiple databases” strategy in terms of setup and pros and cons, so we will not discuss it again.
Let's discuss horizontal sharding and where it could be helpful.
One example is when you have a SaaS platform where a lot of data is associated with each user, but there is no overlap between the data of two users. In this case, splitting the data across nodes based on the user's id is the most logical way to go. Every signed-in user will have to access only a single database, so we won't have to reach out to multiple databases at the same time.
Setting up Horizontal Sharding
Let’s start with the configuration inside database.yml
again:
production: primary: adapter: postgresql encoding: unicode url: <%= ENV["DATABASE_URL"] %> primary_replica: adapter: postgresql encoding: unicode url: <%= ENV["DATABASE_REPLICA_URL"] %> primary_shard_one: adapter: postgresql encoding: unicode url: <%= ENV["DATABASE_SHARD_ONE_URL"] %> primary_shard_one_replica: adapter: postgresql encoding: unicode url: <%= ENV["DATABASE_SHARD_ONE_REPLICA_URL"] %>
Then modify the ApplicationRecord
to connect to different shards:
class ApplicationRecord < ActiveRecord::Base self.abstract_class = true connects_to shards: { default: { writing: :primary, reading: :primary_replica }, shard_one: { writing: :primary_shard_one, reading: :primary_shard_one_replica } } end
Now we can use the shard
option with ActiveRecord::Base.connected_to
to switch shards. Like the automatic database role selector, Rails also provides an automatic shard switcher that can be activated inside production.rb
like this:
Rails.application.configure do config.active_record.shard_selector = { lock: true } config.active_record.shard_resolver = ->(request) { Tenant.find_by!(host: request.host).shard } end
The shard_resolver
lambda is the most interesting part. The above implementation relies on the assumption that our application is accessed from different domains/subdomains and distinguishes between the shards. Modify it to fit your application needs (you might need to access cookies to identify a user before switching the shard).
As with vertical scaling, this strategy is usually a one-way street. Once you shard a database, it is very hard to “un-shard” it (since several databases could have the same ids for different objects). But this lays down the foundation to really scale your app to millions or billions of users when fitting everything on the same server is not an option.
If you want to read more about database sharding, see this write-up by Linode.
Measure Your Rails Database Performance with AppSignal
It can be tricky to keep an eye on the performance of your database without any other tools. Using AppSignal, you can easily track how your databases perform. See our AppSignal for Ruby page for more information.
You can also check out our post Test and Optimize Your Ruby on Rails Database Performance for more information on the metrics you can measure in AppSignal and how a dashboard might look.
Wrap Up
In this post, we discussed three major strategies to scale your database servers horizontally:
- Splitting schema between multiple databases
- Using read-only replica databases
- Splitting data between multiple databases
All strategies have their pros and cons depending on your application use case and scale.
Rails has really upped its database game in its last few releases. This post has shown how easy it is to set up a Rails app to use multiple databases.
As always, if your application allows it, I suggest starting small with optimizations. The easiest optimization to start with is a read-only replica, then only move on when you need more scale.
Happy scaling!
P.S. If you'd like to read Ruby Magic posts as soon as they get off the press, subscribe to our Ruby Magic newsletter and never miss a single post!