Using Supervisors to Organize Your Elixir Application

Ilya Averyanov

In the previous chapter of this series, we looked at hot code reloading in Elixir and why we should use GenServer to implement long-running processes.

But to organize a whole application, we need one more building block — supervisors. Let's take a look at supervisors in detail.

Defining an OTP Application

According to the documentation:

In OTP, application denotes a component implementing some specific functionality, that can be started and stopped as a unit, and that can be reused in other systems. This module interacts with application controller, a process started at every Erlang runtime system.

This module contains functions for controlling applications (for example, starting and stopping applications), and functions to access information about applications (for example, configuration parameters).

In other words, an application is a kind of package that contains reusable modules and has a name, a version, specific dependencies, and so on.

Mix creates a new application when we run:

shell
mix new our_new_app

But there is one crucial difference that distinguishes OTP applications from packages in other languages. OTP applications can be started and stopped and have their own running entities.
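To make the "started and stopped as a unit" idea concrete, here is a small sketch that can be run as a plain script. It uses the `:logger` application that ships with Elixir purely as a convenient stand-in for any OTP application; the `running?` helper is a name introduced for this illustration only.

```elixir
# :logger is used only as an example of an OTP application
# that can be started and stopped as a unit.
{:ok, _started} = Application.ensure_all_started(:logger)

# Applications carry metadata such as a version and a dependency list.
IO.inspect(Application.spec(:logger, :vsn), label: "version")
IO.inspect(Application.spec(:logger, :applications), label: "depends on")

# Hypothetical helper: is the given application currently running?
running? = fn app ->
  List.keymember?(Application.started_applications(), app, 0)
end

IO.inspect(running?.(:logger), label: "running after start")

:ok = Application.stop(:logger)
IO.inspect(running?.(:logger), label: "running after stop")
```

The same calls work for your own application once it has a supervision tree to start.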

You can usually create such applications in Elixir with the command:

shell
mix new our_new_app --sup
cd our_new_app

This creates an additional file for us: lib/our_new_app/application.ex. It implements the so-called Application behaviour. Its primary purpose is to implement the start/2 function, which should start a supervision tree.
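The runtime learns which module to call through the `mod:` key in mix.exs. A project generated with `--sup` typically contains something like the sketch below; the version strings and the exact set of project options depend on your Elixir version and are placeholders here, not the output of a specific generator run.

```elixir
# mix.exs (sketch; version numbers are placeholders)
defmodule OurNewApp.MixProject do
  use Mix.Project

  def project do
    [
      app: :our_new_app,
      version: "0.1.0",
      elixir: "~> 1.11",
      deps: []
    ]
  end

  def application do
    [
      extra_applications: [:logger],
      # Tells OTP to call OurNewApp.Application.start/2 when the app starts
      mod: {OurNewApp.Application, []}
    ]
  end
end
```

Without the `mod:` entry, the application is still a valid package, but it has no processes of its own to start.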

What Are Supervision Trees in Elixir?

So what is a supervision tree? I use the following analogy: a running OTP system is like a whole OS with its own lightweight processes. They start, work, and terminate. As in a real OS, we need a tool that helps us:

  • Start the system in the correct order
  • Handle abnormal situations when a process dies due to some errors
  • Stop the system correctly.

In a real OS, we have systemd (on some Linux distributions) or launchd (on macOS). In OTP, there are supervisors and the Supervisor module.

We can organize our processes in the following way using supervisors:

Supervision Tree

Leaf processes in this scheme are generally GenServer or similar processes.

As in systemd, if a process fails, we can choose to do nothing. Alternatively, we can restart the process again and again until it completes normally, either on its own or together with its sibling processes.
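The difference between restarting a process alone and restarting it together with its siblings corresponds to the supervisor strategies `:one_for_one` and `:one_for_all`. The following sketch uses plain `Agent` processes as stand-in workers; the `RestartDemo` module and its helpers are names invented for this illustration.

```elixir
defmodule RestartDemo do
  # Returns a map of child id => pid for the given supervisor.
  def pids(sup) do
    for {id, pid, _type, _modules} <- Supervisor.which_children(sup),
        into: %{},
        do: {id, pid}
  end

  # Polls until child `id` runs under a pid different from `old_pid`.
  def await_restart(sup, id, old_pid, attempts \\ 100) when attempts > 0 do
    case pids(sup) do
      %{^id => pid} when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        await_restart(sup, id, old_pid, attempts - 1)
    end
  end
end

children = [
  Supervisor.child_spec({Agent, fn -> :a end}, id: :a),
  Supervisor.child_spec({Agent, fn -> :b end}, id: :b)
]

# :one_for_one restarts only the child that died.
{:ok, one_for_one} = Supervisor.start_link(children, strategy: :one_for_one)
pids_before = RestartDemo.pids(one_for_one)
Process.exit(pids_before.a, :kill)
RestartDemo.await_restart(one_for_one, :a, pids_before.a)
sibling_kept = RestartDemo.pids(one_for_one).b == pids_before.b
IO.inspect(sibling_kept, label: "one_for_one keeps the sibling")

# :one_for_all restarts every child when one of them dies.
{:ok, one_for_all} = Supervisor.start_link(children, strategy: :one_for_all)
pids_before_all = RestartDemo.pids(one_for_all)
Process.exit(pids_before_all.a, :kill)
RestartDemo.await_restart(one_for_all, :a, pids_before_all.a)
sibling_restarted = RestartDemo.pids(one_for_all).b != pids_before_all.b
IO.inspect(sibling_restarted, label: "one_for_all restarts the sibling")
```

The polling helper is only there because restarts happen asynchronously from the caller's point of view.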

Earlier, in Erlang, it was tricky to build supervision trees, but Elixir helps us a lot with this.

It's also worth noting that there is some good documentation available about supervisors.

Connecting GenServers to a Supervision Tree

Let's again look at what mix created for us in lib/our_new_app/application.ex:

elixir
def start(_type, _args) do
  children = [
    # Starts a worker by calling: OurNewApp.Worker.start_link(arg)
    # {OurNewApp.Worker, arg}
  ]

  # See https://hexdocs.pm/elixir/Supervisor.html
  # for other strategies and supported options
  opts = [strategy: :one_for_one, name: OurNewApp.Supervisor]
  Supervisor.start_link(children, opts)
end

It starts a supervisor and clearly shows how to run our worker as a child.

Let's do that. Our sample process will increment a given number periodically and report its current value on demand.

First, create a GenServer in lib/our_new_app/counter.ex:

elixir
defmodule OurNewApp.Counter do
  use GenServer

  require Logger

  @interval 100

  def start_link(start_from, opts \\ []) do
    GenServer.start_link(__MODULE__, start_from, opts)
  end

  def get(pid) do
    GenServer.call(pid, :get)
  end

  def init(start_from) do
    st = %{
      current: start_from,
      timer: :erlang.start_timer(@interval, self(), :tick)
    }

    {:ok, st}
  end

  def handle_call(:get, _from, st) do
    {:reply, st.current, st}
  end

  def handle_info({:timeout, _timer_ref, :tick}, st) do
    new_timer = :erlang.start_timer(@interval, self(), :tick)
    :erlang.cancel_timer(st.timer)

    {:noreply, %{st | current: st.current + 1, timer: new_timer}}
  end
end

This server increments a given number every 100ms and can report its state via OurNewApp.Counter.get/1:

shell
iex -S mix
...
iex(1)> {:ok, pid} = OurNewApp.Counter.start_link(10000)
{:ok, #PID<0.182.0>}
iex(2)> OurNewApp.Counter.get(pid)
10136
iex(3)> OurNewApp.Counter.get(pid)
10146
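The counter's tick relies on `:erlang.start_timer/3`, which delivers a `{:timeout, ref, message}` tuple to the target process once the interval elapses. A minimal sketch of that mechanism outside a GenServer, with shortened intervals chosen only for the example:

```elixir
# Arm a timer: after 50 ms this process receives {:timeout, ref, :tick}.
ref = :erlang.start_timer(50, self(), :tick)

received =
  receive do
    {:timeout, ^ref, :tick} -> :tick
  after
    1_000 -> :nothing
  end

IO.inspect(received, label: "received")

# Cancelling a timer that has already fired returns false.
already_fired = :erlang.cancel_timer(ref)
IO.inspect(already_fired, label: "cancel fired timer")

# Cancelling a pending timer returns the remaining time in milliseconds.
pending = :erlang.start_timer(60_000, self(), :tick)
remaining = :erlang.cancel_timer(pending)
IO.inspect(is_integer(remaining), label: "cancel pending timer returns ms")
```

This is why calling `cancel_timer` on the timer that has just fired, as the counter does, is harmless.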

Now let's integrate our server as a child. Update the start/2 function in lib/our_new_app/application.ex to the following:

elixir
def start(_type, _args) do
  children = [
    {OurNewApp.Counter, 10000}
  ]

  opts = [strategy: :one_for_one, name: OurNewApp.Supervisor]
  Supervisor.start_link(children, opts)
end

We see that our process starts automatically:

shell
iex -S mix
...
iex(1)> [{_, pid, _, _}] = Supervisor.which_children(OurNewApp.Supervisor)
[{OurNewApp.Counter, #PID<0.141.0>, :worker, [OurNewApp.Counter]}]
iex(2)> OurNewApp.Counter.get(pid)
10119
iex(3)> Process.exit(pid, :shutdown)
true
iex(4)> Supervisor.which_children(OurNewApp.Supervisor)
[{OurNewApp.Counter, #PID<0.146.0>, :worker, [OurNewApp.Counter]}]

We queried the supervisor's children with Supervisor.which_children/1. We also see that our counter process restarted after we stopped it.
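The `{OurNewApp.Counter, 10000}` tuple works as a child because `use GenServer` generates a `child_spec/1` function, which the supervisor calls to find out how to start the process. The sketch below shows this with a stripped-down module; `ChildSpecDemo.Counter` is a hypothetical stand-in for the real counter.

```elixir
defmodule ChildSpecDemo.Counter do
  use GenServer

  def start_link(start_from), do: GenServer.start_link(__MODULE__, start_from)

  @impl true
  def init(start_from), do: {:ok, start_from}
end

# The generated child_spec/1 names the module as the id and
# passes the argument on to start_link/1.
default_spec = ChildSpecDemo.Counter.child_spec(10000)
IO.inspect(default_spec, label: "default spec")

# Supervisor.child_spec/2 builds the same spec and lets us override fields.
custom_spec = Supervisor.child_spec({ChildSpecDemo.Counter, 10000}, id: :my_counter)
IO.inspect(custom_spec, label: "overridden id")
```

Because the default id is the module name, two children of the same module under one supervisor need an explicit `id` override.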

Our process tree now looks like this:

Supervision Tree with Counter

Adding GenServers to Custom Supervisors

Now let's make a special supervisor for our counter processes. Later, we'll see why we may want to do that. Our supervision tree will look like this:

Supervision Tree with Counters and their own supervisor

First, we should make a callback module for our new special supervisor. Let's add lib/our_new_app/counter_sup.ex with the following content:

elixir
defmodule OurNewApp.CounterSup do
  use Supervisor

  def start_link(start_numbers) do
    Supervisor.start_link(__MODULE__, start_numbers, name: __MODULE__)
  end

  @impl true
  def init(start_numbers) do
    children =
      for start_number <- start_numbers do
        # We can't just use `{OurNewApp.Counter, start_number}`
        # because we need different ids for children
        Supervisor.child_spec({OurNewApp.Counter, start_number}, id: start_number)
      end

    Supervisor.init(children, strategy: :one_for_one)
  end
end

We must also update children for the main application supervisor in lib/our_new_app/application.ex:

elixir
def start(_type, _args) do
  children = [
    {OurNewApp.CounterSup, [10000, 20000]}
  ]

  opts = [strategy: :one_for_one, name: OurNewApp.Supervisor]
  Supervisor.start_link(children, opts)
end

Let's see what we get:

shell
iex -S mix
...
iex(1)> Supervisor.which_children(OurNewApp.Supervisor)
[{OurNewApp.CounterSup, #PID<0.161.0>, :supervisor, [OurNewApp.CounterSup]}]
iex(2)> Supervisor.which_children(OurNewApp.CounterSup)
[
  {20000, #PID<0.163.0>, :worker, [OurNewApp.Counter]},
  {10000, #PID<0.162.0>, :worker, [OurNewApp.Counter]}
]

That's just what we need: OurNewApp.Supervisor has OurNewApp.CounterSup as its child and OurNewApp.CounterSup has two OurNewApp.Counter children.

Many developers consider custom supervisors tricky and avoid using them. So let's do some simple exercises to get more acquainted with them.

First, we'll add a third counter to our counter supervisor at runtime:

shell
iex(3)> new_child_spec = Supervisor.child_spec({OurNewApp.Counter, 30000}, id: 30000)
%{id: 30000, start: {OurNewApp.Counter, :start_link, [30000]}}
iex(4)> Supervisor.start_child(OurNewApp.CounterSup, new_child_spec)
{:ok, #PID<0.169.0>}
iex(5)> Supervisor.which_children(OurNewApp.CounterSup)
[
  {30000, #PID<0.169.0>, :worker, [OurNewApp.Counter]},
  {20000, #PID<0.163.0>, :worker, [OurNewApp.Counter]},
  {10000, #PID<0.162.0>, :worker, [OurNewApp.Counter]}
]

That was easy! With Supervisor.delete_child/2, Supervisor.restart_child/2, etc., we can easily manipulate the supervisor's children.
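As a rough sketch of how these calls interact, the following script walks one child through its whole life cycle. It uses an `Agent` as a stand-in worker and the id `:demo`, both chosen only for this illustration.

```elixir
{:ok, sup} = Supervisor.start_link([], strategy: :one_for_one)
spec = Supervisor.child_spec({Agent, fn -> 0 end}, id: :demo)

# Start the child; a second start with the same id is rejected.
{:ok, pid} = Supervisor.start_child(sup, spec)
{:error, {:already_started, ^pid}} = Supervisor.start_child(sup, spec)

# A running child cannot be deleted; it has to be terminated first.
{:error, :running} = Supervisor.delete_child(sup, :demo)
:ok = Supervisor.terminate_child(sup, :demo)

# The spec stays registered, so the child can be restarted by id.
[{:demo, :undefined, :worker, _modules}] = Supervisor.which_children(sup)
{:ok, new_pid} = Supervisor.restart_child(sup, :demo)
IO.inspect(new_pid != pid, label: "restarted under a new pid")

# Terminate and delete to remove the child completely.
:ok = Supervisor.terminate_child(sup, :demo)
:ok = Supervisor.delete_child(sup, :demo)
IO.inspect(Supervisor.which_children(sup), label: "children after delete")
```

Each pattern match doubles as a check, so the script fails loudly if a step behaves differently than described.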

Secondly, instead of adding one worker to the existing tree, let's try adding a subtree with its own children (without a special module for the subtree supervisor):

shell
iex(6)> children_specs = for n <- [10000, 20000, 30000], do: Supervisor.child_spec({OurNewApp.Counter, n}, id: n)
[
  %{id: 10000, start: {OurNewApp.Counter, :start_link, [10000]}},
  %{id: 20000, start: {OurNewApp.Counter, :start_link, [20000]}},
  %{id: 30000, start: {OurNewApp.Counter, :start_link, [30000]}}
]
iex(7)> hand_crafted_sup_spec = %{
...(7)>   id: :hand_crafted_sup,
...(7)>   start: {Supervisor, :start_link, [children_specs, [strategy: :one_for_one]]},
...(7)>   type: :supervisor,
...(7)>   restart: :permanent,
...(7)>   shutdown: 5000
...(7)> }
...
iex(8)> Supervisor.start_child(OurNewApp.Supervisor, hand_crafted_sup_spec)
{:ok, #PID<0.204.0>}
iex(9)> Supervisor.which_children(OurNewApp.Supervisor)
[
  {:hand_crafted_sup, #PID<0.204.0>, :supervisor, [Supervisor]},
  {OurNewApp.CounterSup, #PID<0.161.0>, :supervisor, [OurNewApp.CounterSup]}
]

The following took place:

  • We constructed hand_crafted_sup_spec, a child spec whose start function is Supervisor.start_link/2
  • We told our main supervisor to start a child with this spec
  • The main supervisor called Supervisor.start_link/2 with children_specs as a parameter
  • The new supervisor started the counters from children_specs.

We could do this in another way: tell our main supervisor to launch an empty child supervisor, then add counters one by one to this child supervisor.
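That alternative can be sketched as follows. To keep the script self-contained, it starts its own parent supervisor and uses `Agent` processes instead of counters; the `:empty_sup` id is likewise just an illustrative name.

```elixir
{:ok, main_sup} = Supervisor.start_link([], strategy: :one_for_one)

# A child supervisor that starts with no children of its own.
empty_sup_spec = %{
  id: :empty_sup,
  start: {Supervisor, :start_link, [[], [strategy: :one_for_one]]},
  type: :supervisor
}

{:ok, child_sup} = Supervisor.start_child(main_sup, empty_sup_spec)

# Add workers to the child supervisor one by one.
for n <- [10000, 20000, 30000] do
  spec = Supervisor.child_spec({Agent, fn -> n end}, id: n)
  {:ok, _pid} = Supervisor.start_child(child_sup, spec)
end

IO.inspect(Supervisor.which_children(main_sup), label: "main supervisor")
IO.inspect(Supervisor.which_children(child_sup), label: "child supervisor")
```

The resulting tree has the same shape as the one built from a prepared list of child specs.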

The process tree at the end of the experiment should look like this:

Supervision Tree with handcrafted supervisor

Examples of Custom Supervisor Usage

Let's see what happens if we terminate our app.

First, add some logging to lib/our_new_app/counter.ex:

elixir
def terminate(reason, st) do
  Logger.info("terminating with #{inspect(reason)}, counter is #{st.current}")
end

Also enable the :trap_exit flag for our counters, so that we can handle process termination — see terminate callback documentation:

elixir
def init(start_from) do
  Process.flag(:trap_exit, true)

  st = %{
  ...
end

Now, if we stop our application in the iex session, we see:

shell
iex -S mix
...
iex(1)> Application.stop(:our_new_app)
19:35:43.544 [info] terminating with :shutdown, counter is 20049
19:35:43.548 [info] terminating with :shutdown, counter is 10050
19:35:43.548 [info] Application our_new_app exited: :stopped
:ok
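The role of the :trap_exit flag can be checked in isolation. In the sketch below, two otherwise identical workers run under one supervisor, and only the one that traps exits gets a chance to run terminate/2 when the supervisor shuts down. `TrapDemo.Worker` and the message it sends are hypothetical names for this example.

```elixir
defmodule TrapDemo.Worker do
  use GenServer

  def start_link({owner, trap?}) do
    GenServer.start_link(__MODULE__, {owner, trap?})
  end

  @impl true
  def init({owner, trap?}) do
    Process.flag(:trap_exit, trap?)
    {:ok, {owner, trap?}}
  end

  @impl true
  def terminate(reason, {owner, trap?}) do
    # Report back to the process that started the experiment.
    send(owner, {:terminated, trap?, reason})
  end
end

children = [
  Supervisor.child_spec({TrapDemo.Worker, {self(), true}}, id: :trapping),
  Supervisor.child_spec({TrapDemo.Worker, {self(), false}}, id: :not_trapping)
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
:ok = Supervisor.stop(sup)

trapping_result =
  receive do
    {:terminated, true, reason} -> reason
  after
    500 -> :no_message
  end

not_trapping_result =
  receive do
    {:terminated, false, reason} -> reason
  after
    500 -> :no_message
  end

IO.inspect(trapping_result, label: "trapping worker")
IO.inspect(not_trapping_result, label: "non-trapping worker")
```

The non-trapping worker is killed by the supervisor's exit signal before it can run any cleanup code.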

Imagine that we have to implement a graceful shutdown. The condition of gracefulness is to count up until we reach numbers divisible by 10 (10, 20, 30, etc) before shutdown.

Of course, in our simple example, we may just send ticks to count to the nearest number divisible by 10 in terminate.

Instead, imagine that these events are external and emulate some metrics that we would prefer to aggregate consistently.

First, let's add the possibility of a graceful stop to the OurNewApp.Counter module:

elixir
defmodule OurNewApp.Counter do
  use GenServer

  require Logger

  @interval 100

  def start_link(start_from) do
    GenServer.start_link(__MODULE__, start_from)
  end

  def get(pid) do
    GenServer.call(pid, :get)
  end

  def stop_gracefully(pid) do
    GenServer.call(pid, :stop_gracefully)
  end

  def init(start_from) do
    Process.flag(:trap_exit, true)

    st = %{
      current: start_from,
      timer: :erlang.start_timer(@interval, self(), :tick),
      terminator: nil
    }

    {:ok, st}
  end

  def handle_call(:get, _from, st) do
    {:reply, st.current, st}
  end

  def handle_call(:stop_gracefully, from, st) do
    if st.terminator do
      {:reply, :already_stopping, st}
    else
      {:noreply, %{st | terminator: from}}
    end
  end

  def handle_info({:timeout, _timer_ref, :tick}, st) do
    :erlang.cancel_timer(st.timer)

    new_current = st.current + 1

    if st.terminator && rem(new_current, 10) == 0 do
      # we are terminating
      GenServer.reply(st.terminator, :ok)
      {:stop, :normal, %{st | current: new_current, timer: nil}}
    else
      new_timer = :erlang.start_timer(@interval, self(), :tick)
      {:noreply, %{st | current: new_current, timer: new_timer}}
    end
  end

  def terminate(reason, st) do
    Logger.info("terminating with #{inspect(reason)}, counter is #{st.current}")
  end
end

Here we:

  • Add a terminator field to the state that keeps the address of the party that wants to stop the server
  • Set this field in the stop_gracefully handler
  • Continue counting until we get to a number divisible by 10
  • Respond to the terminating party and stop the server upon obtaining this number.
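
The key technique in these steps is the deferred reply: a handle_call/3 clause returns `{:noreply, state}` while keeping `from`, and the answer is sent later with GenServer.reply/2. Here is that pattern on its own, in a hypothetical `ReplyDemo.Gate` server that makes callers wait until it is opened.

```elixir
defmodule ReplyDemo.Gate do
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, nil)
  def wait(pid), do: GenServer.call(pid, :wait)
  def open(pid), do: GenServer.cast(pid, :open)

  @impl true
  def init(nil), do: {:ok, %{open: false, waiting: []}}

  @impl true
  def handle_call(:wait, _from, %{open: true} = st) do
    # Already open: answer immediately.
    {:reply, :ok, st}
  end

  def handle_call(:wait, from, st) do
    # Not open yet: remember the caller and reply later.
    {:noreply, %{st | waiting: [from | st.waiting]}}
  end

  @impl true
  def handle_cast(:open, st) do
    Enum.each(st.waiting, &GenServer.reply(&1, :ok))
    {:noreply, %{st | open: true, waiting: []}}
  end
end

{:ok, gate} = ReplyDemo.Gate.start_link()

# The caller blocks inside GenServer.call until the gate is opened.
waiter = Task.async(fn -> ReplyDemo.Gate.wait(gate) end)
ReplyDemo.Gate.open(gate)

result = Task.await(waiter)
IO.inspect(result, label: "deferred reply")
```

The counter uses the same idea, except that the event that triggers the reply is a tick landing on a number divisible by 10.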

Let's see how that works for a single process:

shell
iex -S mix
Erlang/OTP 23 [erts-11.0.4] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [hipe]

iex(1)> {:ok, pid} = OurNewApp.Counter.start_link(10000)
{:ok, #PID<0.167.0>}
iex(2)> OurNewApp.Counter.stop_gracefully(pid)
:ok
iex(3)>
20:03:13.061 [info] terminating with :normal, counter is 10120

Everything works fine for one process. But what will stop all the counters gracefully when the application shuts down? As we see in the OTP docs, the prep_stop/1 callback of OurNewApp.Application is called (if it exists) before the application stops.

Let's add the desired functionality:

elixir
@impl true
def prep_stop(st) do
  stop_tasks =
    for {_, pid, _, _} <- Supervisor.which_children(OurNewApp.CounterSup) do
      Task.async(fn ->
        :ok = OurNewApp.Counter.stop_gracefully(pid)
      end)
    end

  Task.await_many(stop_tasks)
  st
end
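Running each stop request in its own task means the callback waits roughly as long as the slowest counter, not the sum of all of them. The sketch below illustrates this fan-out with a hypothetical `slow_stop` function that simply sleeps; the 2-second timeout passed to Task.await_many/2 is an arbitrary value for the example (the default is 5 seconds).

```elixir
# Stand-in for a stop request that takes a while to complete.
slow_stop = fn id ->
  Process.sleep(100)
  {:stopped, id}
end

{elapsed_us, results} =
  :timer.tc(fn ->
    [1, 2, 3, 4]
    |> Enum.map(fn id -> Task.async(fn -> slow_stop.(id) end) end)
    |> Task.await_many(2_000)
  end)

# Results come back in the same order as the tasks were created.
IO.inspect(results, label: "results")
IO.inspect(div(elapsed_us, 1000), label: "elapsed ms")
```

If any task fails to finish within the timeout, Task.await_many/2 exits, so the timeout should comfortably exceed the longest expected graceful stop.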

We also set the :restart option to :transient in OurNewApp.CounterSup so that our counters do not restart after a graceful shutdown:

elixir
Supervisor.child_spec({OurNewApp.Counter, start_number},
  id: start_number,
  restart: :transient
)
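A :transient child is restarted only after an abnormal exit. The following sketch checks both cases with an `Agent` standing in for a counter; the `TransientDemo` polling helper is an invented name and exists only because the supervisor reacts asynchronously.

```elixir
defmodule TransientDemo do
  # Polls until the zero-arity function `check` returns true.
  def await(check, attempts \\ 100) when attempts > 0 do
    if check.() do
      :ok
    else
      Process.sleep(10)
      await(check, attempts - 1)
    end
  end
end

spec =
  Supervisor.child_spec({Agent, fn -> 0 end},
    id: :transient_child,
    restart: :transient
  )

{:ok, sup} = Supervisor.start_link([spec], strategy: :one_for_one)

child_pid = fn ->
  [{:transient_child, pid, _type, _modules}] = Supervisor.which_children(sup)
  pid
end

# Abnormal exit: the supervisor starts a replacement.
first_pid = child_pid.()
Process.exit(first_pid, :kill)
TransientDemo.await(fn -> is_pid(child_pid.()) and child_pid.() != first_pid end)
second_pid = child_pid.()
IO.inspect(second_pid != first_pid, label: "restarted after kill")

# Normal exit: the child stays down.
:ok = Agent.stop(second_pid)
TransientDemo.await(fn -> Supervisor.count_children(sup).active == 0 end)
IO.inspect(Supervisor.count_children(sup).active, label: "active after normal stop")
```

With the default :permanent setting, the second case would also trigger a restart, which is exactly what we want to avoid after a graceful stop.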

Try to stop the app:

shell
iex -S mix
iex(1)> Application.stop(:our_new_app)
20:24:02.958 [info] terminating with :normal, counter is 10260
20:24:02.958 [info] terminating with :normal, counter is 20260
20:24:02.962 [info] Application our_new_app exited: :stopped
:ok

Now we only stop at numbers divisible by 10.

Using a special supervisor for our counters makes it possible to "find" all the instances quickly and operate on them. This is extremely important when applications go through start/stop stages.

Wrap-up

I hope you've managed to wrap your head around supervisors in Elixir, and have found this article helpful, alongside the previous article on hot code reloading.

There is another, even more complicated stage in an application life cycle: application code upgrades. We'll leave that for the third and final part of this series.

Until then, enjoy coding!

Ilya Averyanov

Our guest author Ilya is an Elixir/Erlang/Python developer and a tech leader at [FunBox](https://funbox.ru/). His main occupation is bootstrapping new projects from both human and technological perspectives. Feel free to reach out to him for interesting discussions or consultancy.
