Erlang & Elixir are ready for asynchronous work right off the bat. Generally speaking, background job systems aren't needed as much as in other ecosystems, but they still have their place for particular use cases.
This post goes through a few best practices I often try to think of in advance when writing background jobs, so that I don't hit some of the pain points that have hurt me multiple times in the past.
If you've ever deployed a new task, only to find out that it's gone rogue with a bug that makes it misbehave (e.g. sending way too many emails, way too quickly), you'll recognize some of these pain points.
Flavours
Elixir already gives you the ability to schedule asynchronous work pretty easily. Something as simple as this already covers a lot:
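(A quick sketch; `do_some_heavy_work/0` stands in for whatever your app actually needs to do.)

```elixir
# Fire-and-forget: run the work in another process without blocking the caller.
# In a real app, you'd usually start this under a Task.Supervisor instead.
Task.start(fn ->
  do_some_heavy_work()
end)

# Or, when you need the result back later:
task = Task.async(fn -> do_some_heavy_work() end)
result = Task.await(task, :timer.seconds(30))
```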
You might need something a bit more powerful, either just for convenience (having some tooling & monitoring around that task), or because you need something like periodic jobs. Again, all this can be achieved with something like a GenServer:
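(A rough sketch; the one-minute interval and the `do_work/0` body are placeholders.)

```elixir
defmodule MyApp.PeriodicJob do
  use GenServer

  @poll_interval :timer.minutes(1)

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(state) do
    # Schedule the first run.
    Process.send_after(self(), :poll, @poll_interval)
    {:ok, state}
  end

  @impl true
  def handle_info(:poll, state) do
    do_work()

    # Only schedule the next run once this one has finished,
    # so two executions never overlap.
    Process.send_after(self(), :poll, @poll_interval)
    {:noreply, state}
  end

  defp do_work do
    # The actual job logic goes here.
    :ok
  end
end
```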
You can also use a job queuing library such as Quantum. If you come from Ruby land and are used to libraries such as Sidekiq, you might be more familiar with something like this:
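(A minimal Quantum setup sketch; module names and the schedule are illustrative, the scheduler still needs to be added to your supervision tree, and it's worth double-checking the options against the Quantum version you're using.)

```elixir
# lib/my_app/scheduler.ex
defmodule MyApp.Scheduler do
  use Quantum, otp_app: :my_app
end

# config/config.exs
config :my_app, MyApp.Scheduler,
  jobs: [
    # Run MyApp.ExampleJob.run/0 every minute.
    [schedule: "* * * * *", task: {MyApp.ExampleJob, :run, []}]
  ]
```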
Some may argue that since Erlang/OTP already provides the infrastructure for creating these processes, packages such as Quantum are not necessary. However, the structure created by them can end up being more intuitive, especially if you're not that familiar with OTP. This might be the case for someone coming from Ruby or other such communities.
How to Structure Background Jobs
Let's now get into a few tips that will help you keep your jobs ready to deal with potential future problems!
Most of them are preventive measures, because these are all background processes: they don't respond to an HTTP request and they run without any intervention, so debugging can be hard if you don't take some precautions.
Let's consider a small example that sends confirmation emails to users who haven't received one yet:
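(A rough sketch; `User`, `Repo`, and `Mailer` stand in for your own modules.)

```elixir
defmodule MyApp.ConfirmationEmailJob do
  import Ecto.Query

  alias MyApp.{Mailer, Repo, User}

  def run do
    User
    |> where(confirmation_email_sent: false)
    |> Repo.all()
    |> Enum.each(&send_confirmation_email/1)
  end

  defp send_confirmation_email(user) do
    # Deliver the email, then flag the user so they aren't picked up again.
    Mailer.deliver_confirmation_email(user)

    user
    |> Ecto.Changeset.change(confirmation_email_sent: true)
    |> Repo.update!()
  end
end
```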
1. Put in a Kill Switch
This is one of those mistakes I'll never make again since it has hurt me so many times.
Let's say you've created a background job, tested, deployed, and configured it to run periodically and send some emails.
It hits production, and you soon notice that something's wrong. The same 100 people are being spammed with emails every minute. You messed up the `get_next_batch/1` function, and it always goes over the same batch of users.
It's a developer's horror story. You need to fix it (or kill it) quickly, but all that time waiting for a new release to get online is physically painful.
So, avoid that: you can plug in some persistent system that allows you to quickly toggle the job on/off. A good suggestion would be to use a feature flag package, such as FunWithFlags.
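With FunWithFlags, that could be as simple as checking a flag at the top of the job (the flag name here is illustrative):

```elixir
def run do
  # Kill switch: do nothing unless the flag is enabled. From a remote console,
  # FunWithFlags.enable(:confirmation_emails) and
  # FunWithFlags.disable(:confirmation_emails) flip it instantly.
  if FunWithFlags.enabled?(:confirmation_emails) do
    User
    |> where(confirmation_email_sent: false)
    |> Repo.all()
    |> Enum.each(&send_confirmation_email/1)
  else
    :noop
  end
end
```

Now, when the job goes rogue, turning it off is a one-liner in a remote console instead of an emergency deploy.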
2. Always Batch Your Jobs by Default
It's easy to miss this one on a first draft: you're just trying to get something online quickly. But it becomes important when you're working on a very resource-intensive job, or simply when your list of records to process grows too quickly, since a single unbatched run can hurt your performance.
Doing `User |> where(confirmation_email_sent: false) |> Repo.all()` can be dangerous if there's potential for that to yield too many results. You may end up consuming too many resources for something that could be done in smaller batches, keeping your system a lot more stable:
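(A sketch of the batched version of the job from earlier; the batch size is arbitrary, and the kill-switch check is omitted for brevity.)

```elixir
@batch_size 100

def run do
  @batch_size
  |> get_next_batch()
  |> Enum.each(&send_confirmation_email/1)
end

defp get_next_batch(batch_size) do
  # Only fetch a limited batch; the next scheduled run picks up the rest.
  User
  |> where(confirmation_email_sent: false)
  |> limit(^batch_size)
  |> Repo.all()
end
```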
Whatever job queue mechanism you plug this worker into, it will end up being called frequently, so there's no need to hurry: you can safely process smaller batches, one at a time.
3. Avoid Overlaps
This is kind of related to the previous point, but it's a concern that goes beyond performance.
If you program a job to run every minute and a single execution can last longer than that, you risk cascading performance problems or, even worse, race conditions, where the first and second executions both try to process the same set of data and conflict with each other in the process.
This is obviously dependent on what your exact business logic is, but as a general rule, it's best to be defensive here.
If you use a GenServer approach like the one showcased above, this is solved automatically: instead of scheduling jobs every minute, you can use `Process.send_after(self(), :poll, delay)` to only schedule the next run after the current one has finished, avoiding overlap.
When using Quantum, you can also set the `overlap: false` option on a job to prevent this automatically:
```elixir
config :my_app, MyApp.Scheduler,
  jobs: [
    first: [
      # every hour
      schedule: "0 * * * *",
      task: {MyApp.ExampleJob, :run, []},
      overlap: false
    ]
  ]
```
This, by the way, might already be reason enough to consider using a package rather than just plain Elixir.
4. Plug in a Manual Mode
If your job is processing a batch of records, it's useful to plug in some public functions that allow you to manually process specific records. This can serve two purposes:
- Better ability to debug the job
- Ability to do a few manual runs before enabling the global job (by toggling the feature flag discussed above)
A sample structure could look like this:
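(A rough sketch; the locking goes through a custom `MyApp.Lock` module that we'll look at next, and the kill-switch check is again omitted for brevity.)

```elixir
defmodule MyApp.ConfirmationEmailJob do
  # ...same aliases, @batch_size, and private helpers as before...

  # The regular job goes through the lock...
  def run do
    MyApp.Lock.acquire("confirmation_emails", fn ->
      @batch_size
      |> get_next_batch()
      |> Enum.each(&send_confirmation_email/1)
    end)
  end

  # ...and so do manual runs, for a single user or a hand-picked batch of them.
  def run_manually(%User{} = user), do: run_manually([user])

  def run_manually(users) when is_list(users) do
    MyApp.Lock.acquire("confirmation_emails", fn ->
      Enum.each(users, &send_confirmation_email/1)
    end)
  end
end
```

This lets you exercise the job from an IEx session against a handful of users before flipping the feature flag on for everyone.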
In this case, we're creating a `run_manually/1` public function that can receive either a single user or a batch of them, and performs the same logic as the automatic job would.
One important detail here is to again avoid a race condition, which in this case is handled with a custom `Lock` module that uses the `redis_mutex` package to prevent potential issues:
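(A minimal sketch of that `Lock` module. I'm assuming redis_mutex's `use RedisMutex` / `with_lock` macro here, so double-check the exact API against the version you install.)

```elixir
defmodule MyApp.Lock do
  # Assumption: `use RedisMutex` injects a `with_lock` macro that takes a lock
  # key and a do-block, and uses Redis to block other callers of the same key.
  use RedisMutex

  @doc "Runs `fun` while holding a Redis-backed lock identified by `key`."
  def acquire(key, fun) do
    with_lock(key) do
      fun.()
    end
  end
end
```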
The lock, which is invoked both on manual runs as well as the regular background job, ensures that you won't cause any unintentional conflicts if you try to do a manual run at the same time the job is doing the same processing. It also happens to solve the overlap problem discussed previously in this post.
Conclusion
All these tips come from something that I bumped into in the past, usually related to production bugs or user complaints. So I hope some of them help you avoid the same mistakes. Let me know if you have any further thoughts! 👋
P.S. If you'd like to read Elixir Alchemy posts as soon as they get off the press, subscribe to our Elixir Alchemy newsletter and never miss a single post!