
One of the most exciting additions to Rails 8 is undoubtedly Solid Queue, a new library for processing background jobs.
You might not think it's that big of a deal. After all, there are plenty of other queuing systems out there. If you work with Rails, you'll likely know about Sidekiq and Resque — both are exceptionally performant and reliable. There is also GoodJob and the venerable DelayedJob. With all those options available, do we really need another queuing system?
Let's find out together. In this two-part series, we'll dig deep into Solid Queue's internals, discover what makes it unique, and learn more about why it was created in the first place.
Why Solid Queue for Ruby on Rails?
Since Rails 7, the team at 37signals has been on a quest to reduce the operational overhead needed to launch a new Rails application. As part of this, they made SQLite the new default database for Rails apps, even in production. Furthermore, they started an effort to eliminate additional infrastructure dependencies to take full advantage of this new default.
37signals had used Resque until then, and Resque requires Redis to function. So does Sidekiq, for that matter. To get rid of Redis, they had to create a queuing system that relies only on your database — and that queuing system turned out to be Solid Queue.
So that's its main selling point: No additional dependencies; just use your database. Very nice! However, as with any queuing system — and especially one that is the new Rails default — Solid Queue needs to satisfy some stringent requirements.
It must provide all the features Rails developers are used to from other background job systems. As the Rails default, it must support all databases that Rails works with. Obviously, it needs to satisfy standard safety requirements — as in, it must never, ever lose jobs. Last but not least, it must be fast enough to be a viable option for large production systems.
That's quite a tall order! So, how does Solid Queue address all those requirements?
Solid Queue From The Top
There are many details to consider, but let's start with a high-level architectural overview. You need to be aware of two significant components: Jobs and Workers.
`Job` is an ActiveRecord model, and what the user interacts with. Note that that's not necessarily true for other ActiveJob backends — it's just how Solid Queue implements background jobs. If you need to create a new background job, this is the class that you inherit from. `Job` also defines methods that enable you to enqueue work, such as `Job.perform_later`.
```ruby
# app/jobs/my_job.rb
class MyJob < ApplicationJob
  queue_as :default

  def perform
    # Do something later
  end
end
```
Workers, as the name suggests, are the elements that perform the actual work. They are generally not created directly by the programmer, but spawned automatically based on how you configure your application. For example, to have your application spawn two workers, one listening to all queues and one to two specific queues, you'd use the following configuration file:
```yaml
# config/queue.yml
production:
  workers:
    - queues: "*"
    - queues: [default, critical]
```
Workers are spawned as processes, running in the background, waiting for jobs to be assigned to them. As you may have guessed, your database is the missing link between jobs and workers. Whenever Solid Queue does anything, one database table or another is involved. Solid Queue does a lot of things, so it needs a lot of tables.
```ruby
# lib/generators/solid_queue/install/templates/db/queue_schema.rb
ActiveRecord::Schema[7.1].define(version: 1) do
  create_table "solid_queue_jobs", force: :cascade do |t|
    # ...
  end

  create_table "solid_queue_ready_executions", force: :cascade do |t|
    # ...
  end

  create_table "solid_queue_scheduled_executions", force: :cascade do |t|
    # ...
  end

  create_table "solid_queue_claimed_executions", force: :cascade do |t|
    # ...
  end

  create_table "solid_queue_blocked_executions", force: :cascade do |t|
    # ...
  end

  create_table "solid_queue_failed_executions", force: :cascade do |t|
    # ...
  end

  # Lots more tables below...
end
```
The Life and Death of a Solid Queue Job
To understand what all those tables do and how they relate to the various features of Solid Queue, let's look at the life cycle of a job. When a user enqueues a job to be executed later — let's say `MyJob` — a record is created in the `solid_queue_jobs` table. The record contains all the data required to execute the job — its arguments, its name, the queue it is put in, and so forth. If the job is enqueued to run as soon as possible (rather than scheduled to run at some later point in time), an additional record is written to `solid_queue_ready_executions`.
For example, running `MyJob.perform_later` results in the following SQL:
```sql
INSERT INTO "solid_queue_jobs"
  ("queue_name", "class_name", "arguments", "priority", "active_job_id",
   "scheduled_at", "finished_at", "concurrency_key", "created_at", "updated_at")
VALUES
  ('default', 'MyJob', '{"job_class": "MyJob", ...}', 0, '...',
   '2024-12-01 14:00:00', NULL, NULL, '2024-12-01 14:00:00', '2024-12-01 14:00:00')
RETURNING "id";

INSERT INTO "solid_queue_ready_executions"
  ("job_id", "queue_name", "priority", "created_at")
VALUES
  (1, 'default', 0, '2024-12-01 14:00:00')
RETURNING "id";
```
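Scheduled jobs take the other branch: instead of a ready execution, a row lands in `solid_queue_scheduled_executions`, and due rows are later moved over to the ready table. Here is a tiny, hypothetical Ruby sketch of that enqueue-time decision — not Solid Queue's actual code, just the branching logic described above:

```ruby
# Toy sketch of the enqueue-time branch (NOT Solid Queue's real
# implementation): a job due now gets a ready execution, while a job
# scheduled for the future gets a scheduled execution instead.
def execution_table_for(scheduled_at, now: Time.now)
  if scheduled_at <= now
    :solid_queue_ready_executions
  else
    :solid_queue_scheduled_executions
  end
end

now = Time.now
execution_table_for(now, now: now)        # => :solid_queue_ready_executions
execution_table_for(now + 3600, now: now) # => :solid_queue_scheduled_executions
```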
Your workers poll this table for new records. A worker process that finds a new record will first claim it by writing another record to the `solid_queue_claimed_executions` table — we'll learn why that is necessary later. Only then will the worker actually execute the job. Below is some heavily edited code to illustrate what is happening (much more is happening in the actual code). If you are curious about the nitty-gritty details, I highly recommend you check out the original source code.
```ruby
class Worker
  def run
    loop do
      break if shutting_down?

      unless poll > 0
        # The polling interval is configurable and defaults to 0.1 seconds
        sleep(polling_interval)
      end
    end
  end

  def poll
    # Claim jobs, then execute the claimed jobs.
    claim_executions.then do |executions|
      executions.each do |execution|
        # Actually execute the job
      end
    end
  end

  def claim_executions
    # Query the ready executions table and claim a job for execution.
    with_polling_volume do
      SolidQueue::ReadyExecution.claim
    end
  end
end
```
Once a worker finishes a job, it removes the corresponding records from the `solid_queue_ready_executions` and `solid_queue_claimed_executions` tables and marks the job itself as finished. That's all there is to it — just polling some tables, creating and removing records. Not so tricky, right? And it wouldn't be, if there weren't critical non-functional requirements to consider, too.
On Performance
To achieve production-ready performance, Solid Queue uses some ingenious database design. You may have wondered why workers poll `solid_queue_ready_executions` rather than `solid_queue_jobs`. The additional table seems redundant at first glance.
Consider that `solid_queue_jobs` may contain thousands or even millions of records, and querying that pile of data takes time. In comparison, `solid_queue_ready_executions` is tiny, as it only contains records for jobs that must be executed right now! That leads to some serious speedup.
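To make the point concrete, here is a toy Ruby illustration with made-up numbers:

```ruby
# Made-up numbers for illustration: in a big jobs table, most records
# are already finished (or scheduled for later) and irrelevant to a poll.
jobs = Array.new(10_000) { |i| { id: i, finished: i < 9_990 } }

# solid_queue_ready_executions only mirrors the jobs that are runnable now.
ready = jobs.reject { |j| j[:finished] }.map { |j| { job_id: j[:id] } }

# A poll against the dedicated ready table scans 10 rows, not 10,000.
ready.size # => 10
```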
The introduction of additional tables also simplifies queries. Workers only use two different queries for polling. They either poll all queues or specific ones. That, in turn, allows for some nice covering indices.
```sql
SELECT "job_id" FROM "solid_queue_ready_executions"
WHERE "queue_name" = 'default'
ORDER BY "priority" ASC, "job_id" ASC
LIMIT 4
FOR UPDATE
SKIP LOCKED;
```
```ruby
# Indices for polling solid_queue_ready_executions
create_table "solid_queue_ready_executions", force: :cascade do |t|
  t.index ["priority", "job_id"], name: "index_solid_queue_poll_all"
  t.index ["queue_name", "priority", "job_id"], name: "index_solid_queue_poll_by_queue"
end
```
All that still wouldn't be enough to achieve truly outstanding performance. Traditionally, queuing systems that rely on polling tables have had a significant problem. One worker would block all others while querying and updating the polling table.
Let's take a look at why. Consider the following query:
```sql
SELECT id FROM jobs
WHERE queue = 'default' AND claimed = 0
ORDER BY priority, id
LIMIT 2
FOR UPDATE;
```
The `FOR UPDATE` clause locks the rows selected by the query. This is necessary to avoid nasty race conditions, such as multiple workers grabbing the same job. But it also means that any other worker running the same query has to wait until those row locks are released. The polling table becomes a bottleneck that hinders rapid job execution.
Luckily, modern databases (PostgreSQL >= 9.5, MySQL >= 8.0) solve this problem. The `SKIP LOCKED` modifier tells the database to simply skip over rows that are already locked by another transaction, instead of waiting for them. The rest of the table remains free to be polled concurrently.
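The behavior is easiest to feel with an in-memory analogy — not real database locking, just plain Ruby mutexes standing in for row locks, where `Mutex#try_lock` plays the role of `SKIP LOCKED` by returning immediately rather than blocking:

```ruby
# An in-memory analogy (NOT real database behavior): each row carries a
# lock, and a claiming worker skips rows whose lock it cannot acquire
# immediately, instead of waiting for them.
Row = Struct.new(:id, :lock)
rows = (1..5).map { |i| Row.new(i, Mutex.new) }

# Simulate a second worker that has already claimed row 2.
holder = Thread.new { rows[1].lock.synchronize { sleep } }
sleep 0.01 until rows[1].lock.locked?

def claim(rows, limit)
  claimed = []
  rows.each do |row|
    break if claimed.size == limit
    # Mutex#try_lock returns immediately instead of blocking --
    # that's the SKIP LOCKED part of the analogy.
    claimed << row if row.lock.try_lock
  end
  claimed
end

claimed_ids = claim(rows, 2).map(&:id)
claimed_ids # => [1, 3] -- row 2 is skipped, not waited on
holder.kill
```

Without the `try_lock` (the `SKIP LOCKED`), the second worker would sit and wait for row 2 instead of moving on to row 3.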
SQLite does not support `SKIP LOCKED`, so worker processes must queue up. In most cases, this shouldn't be an issue: SQLite runs in-process and writes to a local file, so writes are fast. Even so, this is a limitation that you should be aware of.
Whether you're using SQLite or another database, AppSignal provides Solid Queue performance monitoring out of the box! We'll talk more about this in part two of this series.
Safety First
We've spent some time discussing `solid_queue_ready_executions`, but another table is instrumental in ensuring that Solid Queue functions reliably. A key requirement of any queuing system is that any job being enqueued is executed at least once. In other words, jobs must never be lost — we already alluded to this in the introduction.
Without additional safety measures, jobs could easily be lost. Imagine that a worker starts working on a job and, in doing so, records a claim on it. Of course, this is necessary to avoid multiple workers running the same job simultaneously.
Now imagine that this worker process suddenly dies without finishing execution. Your machine might crash, or the OS may kill the worker for consuming too much memory — accidents happen, you know. The job it claimed will remain stuck forever because no other worker can grab it. Thus, it will never be executed, and your users will be sad and angry. The end.
That is, unless we add additional safety measures. Solid Queue solves this problem by introducing yet more tables — `solid_queue_claimed_executions` and `solid_queue_processes`.
```ruby
ActiveRecord::Schema[7.1].define(version: 1) do
  create_table "solid_queue_claimed_executions", force: :cascade do |t|
    t.bigint "job_id", null: false
    t.bigint "process_id"
    # ...
  end

  create_table "solid_queue_processes", force: :cascade do |t|
    t.datetime "last_heartbeat_at", null: false
    t.integer "pid", null: false
    # ...
  end

  # ...
end
```
We've already mentioned `solid_queue_claimed_executions`. Let's look at what happens when a worker claims a job. For one, the worker removes the corresponding record from the `solid_queue_ready_executions` table, so no other worker can pick the job up. Additionally, a record is created in `solid_queue_claimed_executions`. This record contains the `job_id` of the job being claimed and the id of the worker process that makes the claim.
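As a toy in-memory version of that claim step (not the real code, just the two-table bookkeeping described above):

```ruby
# Toy in-memory version of claiming (NOT the real implementation): the
# ready execution disappears, and a claimed execution tagged with the
# claiming process takes its place.
def claim!(ready_executions, claimed_executions, process_id)
  execution = ready_executions.shift or return nil
  claimed_executions << { job_id: execution[:job_id], process_id: process_id }
  execution[:job_id]
end

ready   = [{ job_id: 7 }, { job_id: 8 }]
claimed = []

claimed_job = claim!(ready, claimed, 99)
claimed_job # => 7
ready       # => [{ job_id: 8 }]
claimed     # => [{ job_id: 7, process_id: 99 }]
```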
So, what is the `solid_queue_processes` table good for? Any worker process will create and periodically update a record in this table by setting `last_heartbeat_at`. Of course, that alone wouldn't solve our problem.
We need another process to keep track of running processes: the so-called supervisor. This process runs in the background and periodically checks `solid_queue_processes`. A record with a `last_heartbeat_at` older than a threshold — which defaults to 5 minutes — indicates that the corresponding worker has met a tragic fate.
If such a record is found, the supervisor jumps into action. First, it removes the record from `solid_queue_processes`. Then, it marks any jobs previously claimed by the now-deceased worker as up-for-grabs. Thus, other workers can claim them, avoiding the stuck-job situation.
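The supervisor's check boils down to a stale-heartbeat scan. Here's a hypothetical, in-memory sketch of that logic (struct names and the helper are made up for illustration; only the 5-minute default mirrors the real configuration):

```ruby
# A hypothetical in-memory sketch of the supervisor's check (NOT the real
# implementation): find processes whose heartbeat is stale and collect
# the jobs they had claimed, so those jobs can be released.
ProcessRecord    = Struct.new(:id, :last_heartbeat_at)
ClaimedExecution = Struct.new(:job_id, :process_id)

HEARTBEAT_THRESHOLD = 5 * 60 # seconds, mirroring the 5-minute default

def prune_dead_processes(processes, claimed, now: Time.now)
  dead_ids = processes
    .select { |p| now - p.last_heartbeat_at > HEARTBEAT_THRESHOLD }
    .map(&:id)
  released = claimed.select { |c| dead_ids.include?(c.process_id) }.map(&:job_id)
  [dead_ids, released]
end

now = Time.now
processes = [
  ProcessRecord.new(1, now - 10),  # heartbeat 10 seconds ago: healthy
  ProcessRecord.new(2, now - 600)  # no heartbeat for 10 minutes: dead
]
claimed = [ClaimedExecution.new(42, 2), ClaimedExecution.new(43, 1)]

dead, released = prune_dead_processes(processes, claimed, now: now)
dead     # => [2]
released # => [42] -- job 42 goes back up for grabs
```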
More to Discover in Solid Queue
In this post, we covered a fair bit of Solid Queue's internals. We looked at its high-level architecture and how its most essential feature — enqueuing and executing a job — works under the hood. We also learned about the critical role of `FOR UPDATE SKIP LOCKED` in performance. Finally, we learned how the supervisor process helps avoid stuck jobs.
But there is more to discover. Solid Queue offers many more features we haven't touched on, such as scheduling recurring and sequential jobs. Stay tuned as we continue our deep dive in part two of this series.

Hans-Jörg Schnedlitz
Our guest author Hans is a Rails engineer from Vienna, Austria. He spends most of his time coding or reading about coding, and sometimes even writes about it on his blog! When he's not sitting in front of a screen, you'll probably find him outside, climbing some mountain.
