Best Practices for Background Jobs in Elixir

Miguel Palhas

Erlang & Elixir are ready for asynchronous work right off the bat. Generally speaking, background job systems aren't needed as much as in other ecosystems, but they still have their place for particular use cases.

This post goes through a few best practices I often try to think of in advance when writing background jobs, so that I don't hit some of the pain points that have hurt me multiple times in the past.

If you've ever deployed a new task, only to find out that it had gone rogue with a bug that caused it to misbehave (e.g. sending way too many emails, way too quickly), you'll recognize some of these pain points.

Flavours

Elixir already gives you the ability to schedule asynchronous work pretty easily. Something as simple as this already covers a lot:

```elixir
Task.async(fn ->
  # some heavy lifting
end)
```

You might need something a bit more powerful, either just for convenience (having some tooling & monitoring around that task), or because you need something like periodic jobs. Again, all this can be achieved with something like a GenServer:

```elixir
defmodule PeriodicJob do
  use GenServer

  @period 60_000

  def init(state) do
    Process.send_after(self(), :poll, @period)
    {:ok, state}
  end

  def handle_info(:poll, state) do
    # some heavy lifting
    Process.send_after(self(), :poll, @period)
    {:noreply, state}
  end
end
```

You can also use a job scheduling library such as Quantum. If you come from Ruby land and are used to libraries such as Sidekiq, you might be more familiar with something like this:

```elixir
#
# lib/my_app/scheduler.ex
#
defmodule MyApp.Scheduler do
  use Quantum.Scheduler, otp_app: :my_app
end

#
# config/config.exs
#
config :my_app, MyApp.Scheduler,
  jobs: [
    first: [
      # every hour
      schedule: "0 * * * *",
      task: {MyApp.ExampleJob, :run, []}
    ],
    second: [
      # every minute
      schedule: "* * * * *",
      task: {MyApp.AnotherExampleJob, :run, []}
    ]
  ]
```

Some may argue that since Erlang/OTP already provides the infrastructure for creating these processes, packages such as Quantum are not necessary. However, the structure they provide can end up being more intuitive, especially if you're not that familiar with OTP, as might be the case for someone coming from Ruby or similar communities.

How to Structure Background Jobs

Let's now get into a few tips that will help you keep your jobs ready to deal with potential future problems!

Most of them are preventive measures, because all of these are background processes: they're not responding to an HTTP request and they run without any intervention, so debugging can be hard if you don't take some precautions.

Let's consider a small example that sends confirmation emails to users that haven't received one yet:

```elixir
defmodule MyApp.ExampleJob do
  import Ecto.Query

  def run do
    get_users()
    |> Enum.each(fn user ->
      # send a single email to the user
    end)
  end

  defp get_users do
    MyApp.User
    |> where(confirmation_email_sent: false)
    |> MyApp.Repo.all()
  end
end
```

1. Put in a Kill Switch

This is one of those mistakes I'll never make again since it has hurt me so many times.

Let's say you've created a background job, tested, deployed, and configured it to run periodically and send some emails.

It hits production, and you soon notice that something's wrong. The same 100 people are being spammed with emails every minute. You messed up the get_next_batch/1 function, and it always goes over the same batch of users. It's a developer's horror story. You need to fix it (or kill it) quickly, but all that time waiting for a new release to get online is physically painful.

So, avoid that:

```elixir
defmodule MyApp.ExampleJob do
  def run do
    if enabled?() do
      # ...
    end
  end

  defp enabled? do
    # check a Redis flag, a database record, or anything really
  end
end
```

You can plug in some persistent system that allows you to quickly toggle the job on/off. A good suggestion would be to use a feature flag package, such as FunWithFlags.
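With FunWithFlags, for instance, enabled?/0 can be little more than a flag check. Here's a minimal sketch; the :example_job flag name is just an illustration, not something defined elsewhere in this post:

```elixir
defmodule MyApp.ExampleJob do
  def run do
    # bail out early unless the kill switch is flipped on
    if enabled?() do
      # ... the actual batch processing
    end
  end

  # The :example_job flag name is a hypothetical choice for this sketch.
  defp enabled? do
    FunWithFlags.enabled?(:example_job)
  end
end
```

When the job does go rogue, killing it from a remote console is then a one-liner: FunWithFlags.disable(:example_job).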

2. Always Batch Your Jobs by Default

It's easy to miss this one on a first draft when you're just trying to quickly get something online. But batching becomes important if you're working on a very resource-intensive job, or simply if your list of records to process grows too quickly, since processing everything at once can hurt your performance.

Doing User |> where(confirmation_email_sent: false) |> Repo.all() can be dangerous if there's potential for that to yield too many results. You may end up consuming too many resources for something that could be done in smaller batches, keeping your system a lot more stable:

```elixir
defmodule MyApp.ExampleJob do
  import Ecto.Query

  @batch_size 100

  defp get_users do
    MyApp.User
    |> where(confirmation_email_sent: false)
    |> limit(^@batch_size)
    |> MyApp.Repo.all()
  end
end
```

Whatever job queue mechanism you plug this worker into, it will end up being called frequently, so there's no need to hurry: processing smaller batches one at a time will eventually work through every record.
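One detail worth spelling out: small batches only cover everything if each processed record is marked as done, so the next run's query picks up a fresh batch. Here's a minimal sketch of that idea, assuming the confirmation_email_sent field from the query above and a standard Ecto changeset update; the actual email delivery is left as a comment:

```elixir
defmodule MyApp.ExampleJob do
  import Ecto.Query

  @batch_size 100

  def run do
    get_users()
    |> Enum.each(fn user ->
      # send the confirmation email to this user here, then flip the flag
      # so the next run's query no longer returns them
      user
      |> Ecto.Changeset.change(confirmation_email_sent: true)
      |> MyApp.Repo.update!()
    end)
  end

  defp get_users do
    MyApp.User
    |> where(confirmation_email_sent: false)
    |> limit(^@batch_size)
    |> MyApp.Repo.all()
  end
end
```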

3. Avoid Overlaps

This is kind of related to the previous point, but it's a concern that goes beyond performance.

If you program a job to run every minute and a single execution has the potential to last longer than that, you risk cascading performance problems or, even worse, race conditions, where the first and second executions both try to process the same set of data and conflict with each other.

This is obviously dependent on what your exact business logic is, but as a general rule, it's best to be defensive here.

If you use a GenServer approach like the one showcased above, this is solved automatically: instead of scheduling jobs every minute, you use Process.send_after(self(), :poll, delay) to only schedule the next run after the current one has finished, avoiding overlap.

When using Quantum, you can also add an overlap: false option to a job to automatically prevent this:

```elixir
config :my_app, MyApp.Scheduler,
  jobs: [
    first: [
      # every hour
      schedule: "0 * * * *",
      task: {MyApp.ExampleJob, :run, []},
      overlap: false
    ]
  ]
```

This, by the way, might already be reason enough to consider using a package rather than just plain Elixir.

4. Plug in a Manual Mode

If your job is processing a batch of records, it's useful to plug in some public functions that allow you to manually process specific records. This can serve two purposes:

  • Better ability to debug the job
  • Ability to do a few manual runs before enabling the global job (by toggling the feature flag discussed above)

A sample structure could look like this:

```elixir
defmodule MyApp.ExampleJob do
  import MyApp.Lock

  alias MyApp.User

  def run do
    lock("example_job", fn ->
      get_users()
      |> Enum.each(&process_user/1)
    end)
  end

  def run_manually(users) when is_list(users) do
    lock("example_job", fn ->
      users
      |> Enum.each(&process_user/1)
    end)
  end

  def run_manually(user), do: run_manually([user])

  def process_user(%User{} = user) do
    # process a single user
  end
end
```

In this case, we're creating a public run_manually/1 function that can receive either a single user or a batch of them and that performs the same logic as the automatic job.
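With that in place, a manual run from an iex -S mix session (or a remote console) is just a function call. The user ID and the hand-picked batch below are made up for illustration:

```elixir
# Process one specific user by hand (the ID is just an example):
user = MyApp.Repo.get!(MyApp.User, 42)
MyApp.ExampleJob.run_manually(user)

# Or hand-pick a small batch and push it through the same code path:
import Ecto.Query

MyApp.User
|> where(confirmation_email_sent: false)
|> limit(5)
|> MyApp.Repo.all()
|> MyApp.ExampleJob.run_manually()
```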

One important detail here is to, once again, avoid race conditions. In this case, that's done with a custom Lock module that uses the redis_mutex package to prevent potential issues:

```elixir
defmodule MyApp.Lock do
  use RedisMutex
  require Logger

  def lock(lock_name, fun) do
    with_lock(lock_name, 60_000) do
      fun.()
    end
  rescue
    _e in RedisMutex.Error ->
      Logger.debug("#{lock_name}: another process is already running")
  end
end
```

The lock, which is invoked both on manual runs and on the regular background job, ensures that you won't cause any unintentional conflicts if you try to do a manual run at the same time the job is doing the same processing. It also happens to solve the overlap problem discussed previously in this post.

Conclusion

All of these tips come from problems I bumped into in the past, usually production bugs or user complaints. I hope some of them help you avoid the same mistakes. Let me know if you have any further thoughts! 👋

P.S. If you'd like to read Elixir Alchemy posts as soon as they get off the press, subscribe to our Elixir Alchemy newsletter and never miss a single post!

Guest author Miguel is a professional over-engineer at Portuguese-based Subvisual. He works mostly with Ruby, Elixir, DevOps, and Rust. He likes building fancy keyboards and playing excessive amounts of online chess.
