Using Supervisors to Organize Your Elixir Application

Ilya Averyanov

In the previous chapter of this series, we looked at hot code reloading in Elixir and why we should use GenServer to implement long-running processes.

But to organize a whole application, we need one more building block — supervisors. Let’s take a look at supervisors in detail.

Defining an OTP Application

According to the documentation:

In OTP, application denotes a component implementing some specific functionality, that can be started and stopped as a unit, and that can be reused in other systems. This module interacts with application controller, a process started at every Erlang runtime system.

This module contains functions for controlling applications (for example, starting and stopping applications), and functions to access information about applications (for example, configuration parameters).

In other words, an application is a kind of package that contains reusable modules and has a name, a version, specific dependencies, and so on.

Mix creates a new application when we run:

mix new our_new_app

But there is one crucial difference that distinguishes OTP applications from packages in other languages. OTP applications can be started and stopped and have their own running entities.

You can usually create such applications in Elixir with the command:

mix new our_new_app --sup
cd our_new_app

This creates an additional file for us: lib/our_new_app/application.ex. It implements the so-called application behavior. Its primary purpose is to implement the start/2 function, which should start a supervision tree.
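Mix also records this callback module in mix.exs, which is how the runtime knows what to call when the application starts. The fragment below is a sketch of the relevant part of a generated mix.exs; the values in project/0 are illustrative placeholders and will differ on your machine:

```elixir
defmodule OurNewApp.MixProject do
  use Mix.Project

  # Illustrative values; Mix fills these in based on your Elixir version.
  def project do
    [app: :our_new_app, version: "0.1.0", deps: []]
  end

  def application do
    [
      extra_applications: [:logger],
      # The `mod` entry names the callback module and the argument
      # that is passed to its start/2 function.
      mod: {OurNewApp.Application, []}
    ]
  end
end
```

Without the mod entry, the application is a plain library: it can be loaded, but it has no supervision tree of its own.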

What Are Supervision Trees in Elixir?

So what is a supervision tree? I use the following analogy: a running OTP system is like a whole OS with its own lightweight processes. They start, work, and terminate. As in a real OS, we need a tool that helps us start these processes, monitor them, and react when they fail.

In a real OS, we have systemd (on some Linux distributions) or launchd on macOS. In OTP, there are supervisors and the Supervisor module.

We can organize our processes in the following way using supervisors:

Supervision Tree

Leaf processes in this scheme are generally GenServer or similar processes.

As with systemd, if a process fails, we can choose to do nothing. We can also restart the process, repeatedly if needed, until it completes normally, either on its own or together with its sibling processes.
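These choices map onto the :restart value of a child specification (:permanent, :transient, or :temporary), while restarting siblings together is controlled by the supervisor strategy (:one_for_all or :rest_for_one instead of :one_for_one). Here is a minimal sketch that can be run as a script. It uses Agent processes as stand-in workers, and the child ids are made up for illustration:

```elixir
# Three stand-in workers that differ only in their restart value.
children = [
  Supervisor.child_spec({Agent, fn -> 0 end}, id: :permanent_child, restart: :permanent),
  Supervisor.child_spec({Agent, fn -> 0 end}, id: :transient_child, restart: :transient),
  Supervisor.child_spec({Agent, fn -> 0 end}, id: :temporary_child, restart: :temporary)
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

# Helper that maps child ids to their current pids.
pids_by_id = fn ->
  Map.new(Supervisor.which_children(sup), fn {id, pid, _type, _modules} -> {id, pid} end)
end

pids_before = pids_by_id.()

# Kill every child abnormally, then give the supervisor a moment to react.
Enum.each(pids_before, fn {_id, pid} -> Process.exit(pid, :kill) end)
Process.sleep(100)

pids_after = pids_by_id.()

# :permanent and :transient children come back with new pids after a crash;
# the :temporary child is not restarted and its spec is dropped.
IO.inspect(Map.keys(pids_after), label: "children after crash")
```

A :transient child behaves like a :permanent one here because it was killed; it is only left alone when it exits normally.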

Earlier, in Erlang, it was tricky to build supervision trees, but Elixir helps us a lot with this.

It’s also worth noting that there is some good documentation available about supervisors.

Connecting GenServers to a Supervision Tree

Let’s again look at what mix created for us in lib/our_new_app/application.ex:

  def start(_type, _args) do
    children = [
      # Starts a worker by calling: OurNewApp.Worker.start_link(arg)
      # {OurNewApp.Worker, arg}
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: OurNewApp.Supervisor]
    Supervisor.start_link(children, opts)
  end

It starts a supervisor and clearly shows how to run our worker as a child.

Let’s do that. Our sample process will increment a given number periodically and report its state on demand.

First, create a GenServer in lib/our_new_app/counter.ex:

defmodule OurNewApp.Counter do
  use GenServer
  require Logger

  @interval 100

  def start_link(start_from, opts \\ []) do
    GenServer.start_link(__MODULE__, start_from, opts)
  end

  def get(pid) do
    GenServer.call(pid, :get)
  end

  def init(start_from) do
    st = %{
      current: start_from,
      timer: :erlang.start_timer(@interval, self(), :tick)
    }

    {:ok, st}
  end

  def handle_call(:get, _from, st) do
    {:reply, st.current, st}
  end

  def handle_info({:timeout, _timer_ref, :tick}, st) do
    new_timer = :erlang.start_timer(@interval, self(), :tick)
    :erlang.cancel_timer(st.timer)

    {:noreply, %{st | current: st.current + 1, timer: new_timer}}
  end
end

This server increments a given number every 100ms and can report its state via OurNewApp.Counter.get/1:

iex -S mix
...
iex(1)> {:ok, pid} = OurNewApp.Counter.start_link(10000)
{:ok, #PID<0.182.0>}
iex(2)> OurNewApp.Counter.get(pid)
10136
iex(3)> OurNewApp.Counter.get(pid)
10146

Now let’s integrate our server as a child. Update the start/2 function in lib/our_new_app/application.ex to the following:

  def start(_type, _args) do
    children = [
      {OurNewApp.Counter, 10000}
    ]

    opts = [strategy: :one_for_one, name: OurNewApp.Supervisor]
    Supervisor.start_link(children, opts)
  end

We see that our process starts automatically:

iex -S mix
...
iex(1)> [{_, pid, _, _}] = Supervisor.which_children(OurNewApp.Supervisor)
[{OurNewApp.Counter, #PID<0.141.0>, :worker, [OurNewApp.Counter]}]
iex(2)> OurNewApp.Counter.get(pid)
10119
iex(3)> Process.exit(pid, :shutdown)
true
iex(4)> Supervisor.which_children(OurNewApp.Supervisor)
[{OurNewApp.Counter, #PID<0.146.0>, :worker, [OurNewApp.Counter]}]

We queried the supervisor’s children with Supervisor.which_children/1. We also see that our counter process restarted after we stopped it.
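The {OurNewApp.Counter, 10000} tuple in the children list is shorthand: the supervisor calls the module's child_spec/1 function, which use GenServer defines for us. The sketch below shows what that function returns, and how a hand-written spec could pass a second argument to start_link/2 to register the process under a name. Demo.Counter is a trimmed-down, hypothetical stand-in for our counter:

```elixir
defmodule Demo.Counter do
  use GenServer

  def start_link(start_from, opts \\ []) do
    GenServer.start_link(__MODULE__, start_from, opts)
  end

  @impl true
  def init(start_from), do: {:ok, start_from}
end

# What `{Demo.Counter, 10_000}` expands to inside a children list.
default_spec = Demo.Counter.child_spec(10_000)
IO.inspect(default_spec)

# A hand-written spec that also passes options to start_link/2,
# so the process can be found by name instead of by pid.
named_spec = %{
  id: Demo.Counter,
  start: {Demo.Counter, :start_link, [10_000, [name: Demo.Counter]]}
}

{:ok, _sup} = Supervisor.start_link([named_spec], strategy: :one_for_one)
IO.inspect(Process.whereis(Demo.Counter), label: "registered pid")
```

With a registered name, there is no need to dig the pid out of Supervisor.which_children/1, although a name only works when there is a single instance of the process.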

Our process tree now looks like this:

Supervision Tree with Counter

Adding GenServers to Custom Supervisors

Now let’s make a special supervisor for our counter processes. Later, we’ll see why we may want to do that. Our supervision tree will look like this:

Supervision Tree with Counters and their own supervisor

First, we should make a callback module for our new special supervisor. Let’s add lib/our_new_app/counter_sup.ex with the following content:

defmodule OurNewApp.CounterSup do
  use Supervisor

  def start_link(start_numbers) do
    Supervisor.start_link(__MODULE__, start_numbers, name: __MODULE__)
  end

  @impl true
  def init(start_numbers) do
    children =
      for start_number <- start_numbers do
        # We can't just use `{OurNewApp.Counter, start_number}`
        # because we need different ids for children

        Supervisor.child_spec({OurNewApp.Counter, start_number}, id: start_number)
      end

    Supervisor.init(children, strategy: :one_for_one)
  end
end

We must also update children for the main application supervisor in lib/our_new_app/application.ex:

  def start(_type, _args) do
    children = [
      {OurNewApp.CounterSup, [10000, 20000]}
    ]

    opts = [strategy: :one_for_one, name: OurNewApp.Supervisor]
    Supervisor.start_link(children, opts)
  end

Let’s see what we get:

iex -S mix
...
iex(1)> Supervisor.which_children(OurNewApp.Supervisor)
[{OurNewApp.CounterSup, #PID<0.161.0>, :supervisor, [OurNewApp.CounterSup]}]
iex(2)> Supervisor.which_children(OurNewApp.CounterSup)
[
  {20000, #PID<0.163.0>, :worker, [OurNewApp.Counter]},
  {10000, #PID<0.162.0>, :worker, [OurNewApp.Counter]}
]

That’s just what we need: OurNewApp.Supervisor has OurNewApp.CounterSup as its child and OurNewApp.CounterSup has two OurNewApp.Counter children.

Many developers consider custom supervisors tricky and avoid using them. So let’s do some simple exercises to get more acquainted with them.

First, we’ll add a third counter to our counter supervisor at runtime:

iex(3)> new_child_spec = Supervisor.child_spec({OurNewApp.Counter, 30000}, id: 30000)
%{id: 30000, start: {OurNewApp.Counter, :start_link, [30000]}}
iex(4)> Supervisor.start_child(OurNewApp.CounterSup, new_child_spec)
{:ok, #PID<0.169.0>}
iex(5)> Supervisor.which_children(OurNewApp.CounterSup)
[
  {30000, #PID<0.169.0>, :worker, [OurNewApp.Counter]},
  {20000, #PID<0.163.0>, :worker, [OurNewApp.Counter]},
  {10000, #PID<0.162.0>, :worker, [OurNewApp.Counter]}
]

That was easy! With Supervisor.delete_child/2, Supervisor.restart_child/2, etc., we can easily manipulate the supervisor’s children.
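As a rough illustration of those functions, here is a small script that stops, restarts, and finally removes a child. It uses an Agent as a stand-in worker, and the :counter_a id is made up for the example:

```elixir
spec = Supervisor.child_spec({Agent, fn -> 0 end}, id: :counter_a)
{:ok, sup} = Supervisor.start_link([spec], strategy: :one_for_one)

# Stop the child. Its spec stays in the supervisor, but no process is running.
:ok = Supervisor.terminate_child(sup, :counter_a)
stopped = Supervisor.which_children(sup)
IO.inspect(stopped, label: "after terminate_child")

# Start it again from the stored spec.
{:ok, restarted_pid} = Supervisor.restart_child(sup, :counter_a)

# A running child cannot be deleted; it has to be terminated first.
{:error, :running} = Supervisor.delete_child(sup, :counter_a)
:ok = Supervisor.terminate_child(sup, :counter_a)
:ok = Supervisor.delete_child(sup, :counter_a)

IO.inspect(Supervisor.which_children(sup), label: "after delete_child")
```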

Secondly, instead of adding one worker to the existing tree, let’s try adding a subtree with its own children (without a special module for the subtree supervisor):

iex(6)> children_specs = for n <- [10000, 20000, 30000], do: Supervisor.child_spec({OurNewApp.Counter, n}, id: n)
[
  %{id: 10000, start: {OurNewApp.Counter, :start_link, [10000]}},
  %{id: 20000, start: {OurNewApp.Counter, :start_link, [20000]}},
  %{id: 30000, start: {OurNewApp.Counter, :start_link, [30000]}}
]
iex(7)> hand_crafted_sup_spec = %{
...(7)>     id: :hand_crafted_sup,
...(7)>     start: {Supervisor, :start_link, [children_specs, [strategy: :one_for_one]]},
...(7)>     type: :supervisor,
...(7)>     restart: :permanent,
...(7)>     shutdown: 5000
...(7)> }
...
iex(8)> Supervisor.start_child(OurNewApp.Supervisor, hand_crafted_sup_spec)
{:ok, #PID<0.204.0>}
iex(9)> Supervisor.which_children(OurNewApp.Supervisor)
[
  {:hand_crafted_sup, #PID<0.204.0>, :supervisor, [Supervisor]},
  {OurNewApp.CounterSup, #PID<0.161.0>, :supervisor, [OurNewApp.CounterSup]}
]

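The following took place: we built child specs for three counters, wrapped them in a hand-written supervisor spec whose start function is Supervisor.start_link/2, and asked our main supervisor to start that spec as a new child.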

We could do this in another way: tell our main supervisor to launch an empty child supervisor, then add counters one by one to this child supervisor.
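That alternative could look roughly like the script below. To keep it self-contained, it starts its own top-level supervisor in place of OurNewApp.Supervisor and uses Agent processes as stand-ins for the counters:

```elixir
# Stand-in for the main application supervisor.
{:ok, main_sup} = Supervisor.start_link([], strategy: :one_for_one)

# A child supervisor that starts with no children at all.
empty_sup_spec = %{
  id: :hand_crafted_sup,
  start: {Supervisor, :start_link, [[], [strategy: :one_for_one]]},
  type: :supervisor
}

{:ok, child_sup} = Supervisor.start_child(main_sup, empty_sup_spec)

# Add the workers one by one, each under its own id.
for n <- [10_000, 20_000, 30_000] do
  spec = Supervisor.child_spec({Agent, fn -> n end}, id: n)
  {:ok, _pid} = Supervisor.start_child(child_sup, spec)
end

IO.inspect(Supervisor.count_children(child_sup), label: "child supervisor")
```

One difference worth keeping in mind: children added at runtime are not part of the spec that the main supervisor stores, so if the child supervisor itself is restarted, it comes back empty.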

The process tree at the end of the experiment should look like this:

Supervision Tree with handcrafted supervisor

Examples of Custom Supervisor Usage

Let’s see what happens if we terminate our app.

First, add some logging to lib/our_new_app/counter.ex:

  def terminate(reason, st) do
    Logger.info("terminating with #{inspect(reason)}, counter is #{st.current}")
  end

Also enable the :trap_exit flag for our counters, so that we can handle process termination — see terminate callback documentation:

  def init(start_from) do
    Process.flag(:trap_exit, true)

    st = %{
    ...
  end

Now, if we stop our application in the iex session, we see:

iex -S mix
...
iex(1)> Application.stop(:our_new_app)
19:35:43.544 [info]  terminating with :shutdown, counter is 20049
19:35:43.548 [info]  terminating with :shutdown, counter is 10050
19:35:43.548 [info]  Application our_new_app exited: :stopped
:ok

Imagine that we have to implement a graceful shutdown. The condition of gracefulness is to count up until we reach a number divisible by 10 (10, 20, 30, etc.) before shutdown.

Of course, in our simple example, we may just send ticks to count to the nearest number divisible by 10 in terminate.
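For reference, the number of missing ticks is simple arithmetic. This is only a sketch for non-negative counters, and the function name is made up:

```elixir
# How many more ticks until `current` is divisible by 10 (0 if it already is).
ticks_until_multiple_of_10 = fn current -> rem(10 - rem(current, 10), 10) end

IO.inspect(ticks_until_multiple_of_10.(20049)) # 1
IO.inspect(ticks_until_multiple_of_10.(10050)) # 0
IO.inspect(ticks_until_multiple_of_10.(10053)) # 7
```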

Instead, imagine that these events are external and emulate some metrics that we would prefer to aggregate consistently.

First, let’s add the possibility of a graceful stop to the OurNewApp.Counter module:

defmodule OurNewApp.Counter do
  use GenServer
  require Logger

  @interval 100

  def start_link(start_from) do
    GenServer.start_link(__MODULE__, start_from)
  end

  def get(pid) do
    GenServer.call(pid, :get)
  end

  def stop_gracefully(pid) do
    GenServer.call(pid, :stop_gracefully)
  end

  def init(start_from) do
    Process.flag(:trap_exit, true)

    st = %{
      current: start_from,
      timer: :erlang.start_timer(@interval, self(), :tick),
      terminator: nil
    }

    {:ok, st}
  end

  def handle_call(:get, _from, st) do
    {:reply, st.current, st}
  end

  def handle_call(:stop_gracefully, from, st) do
    if st.terminator do
      {:reply, :already_stopping, st}
    else
      {:noreply, %{st | terminator: from}}
    end
  end

  def handle_info({:timeout, _timer_ref, :tick}, st) do
    :erlang.cancel_timer(st.timer)

    new_current = st.current + 1

    if st.terminator && rem(new_current, 10) == 0 do
      # we are terminating
      GenServer.reply(st.terminator, :ok)
      {:stop, :normal, %{st | current: new_current, timer: nil}}
    else
      new_timer = :erlang.start_timer(@interval, self(), :tick)
      {:noreply, %{st | current: new_current, timer: new_timer}}
    end
  end

  def terminate(reason, st) do
    Logger.info("terminating with #{inspect(reason)}, counter is #{st.current}")
  end
end

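Here we add a stop_gracefully/1 function, remember its caller in the new terminator field of the state, and, on each tick, check whether a stop was requested and the new counter value is divisible by 10. If so, we reply to the stored caller and stop the process with reason :normal.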

Let’s see how that works for a single process:

iex -S mix
Erlang/OTP 23 [erts-11.0.4] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [hipe]

iex(1)> {:ok, pid} = OurNewApp.Counter.start_link(10000)
{:ok, #PID<0.167.0>}
iex(2)> OurNewApp.Counter.stop_gracefully(pid)
:ok
iex(3)>
20:03:13.061 [info]  terminating with :normal, counter is 10120

Everything works fine. But what will stop all counters gracefully when the application shuts down? As we see in the OTP docs, the prep_stop/1 callback (here OurNewApp.Application.prep_stop/1) is called, if it exists, before the application stops.

Let’s add the desired functionality:

  @impl true
  def prep_stop(st) do
    stop_tasks =
      for {_, pid, _, _} <- Supervisor.which_children(OurNewApp.CounterSup) do
        Task.async(fn ->
          :ok = OurNewApp.Counter.stop_gracefully(pid)
        end)
      end

    Task.await_many(stop_tasks)

    st
  end
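The Task.async/Task.await_many pair is what lets prep_stop wait for all counters in parallel rather than one after another. Task.await_many/2 also accepts a timeout in milliseconds (5000 by default) and exits the caller if any task takes longer. A stand-alone sketch of the pattern, with sleeping tasks in place of the stop_gracefully calls:

```elixir
# Each task stands in for one `stop_gracefully` call that takes a while.
tasks =
  for delay_ms <- [50, 100, 150] do
    Task.async(fn ->
      Process.sleep(delay_ms)
      :ok
    end)
  end

# Wait for all of them, allowing at most one second in total.
results = Task.await_many(tasks, 1_000)
IO.inspect(results)
```

Our counters tick every 100ms, so each one reaches a number divisible by 10 within about a second, well inside the default timeout.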

We also set the :restart option to :transient in OurNewApp.CounterSup so that our counters do not restart after graceful shutdown:

Supervisor.child_spec({OurNewApp.Counter, start_number},
  id: start_number,
  restart: :transient
)
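To see why :transient matters, here is a small stand-alone sketch in which a child exits with reason :normal, just as our counter does after a graceful stop. It uses an Agent as the worker, and the :graceful id is made up:

```elixir
spec = Supervisor.child_spec({Agent, fn -> 0 end}, id: :graceful, restart: :transient)
{:ok, sup} = Supervisor.start_link([spec], strategy: :one_for_one)

[{:graceful, pid, _type, _modules}] = Supervisor.which_children(sup)

# Agent.stop/1 stops the process with reason :normal.
:ok = Agent.stop(pid)

# Give the supervisor a moment to process the exit.
Process.sleep(50)

[{:graceful, child_after, _type, _modules}] = Supervisor.which_children(sup)

# The spec is still listed, but no process was started in its place.
IO.inspect(child_after, label: "child after normal exit")
```

With the default :permanent value, the supervisor would have started a fresh process here, which is exactly what we want to avoid during shutdown.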

Try to stop the app:

iex -S mix
iex(1)> Application.stop(:our_new_app)

20:24:02.958 [info]  terminating with :normal, counter is 10260

20:24:02.958 [info]  terminating with :normal, counter is 20260

20:24:02.962 [info]  Application our_new_app exited: :stopped
:ok

Now we only stop at numbers divisible by 10.

Using a special supervisor for our counters makes it possible to “find” all the instances quickly and operate on them. This is extremely important when applications go through start/stop stages.

Wrap-up

I hope you’ve managed to wrap your head around supervisors in Elixir, and have found this article helpful, alongside the previous article on hot code reloading.

There is another, even more complicated stage in an application life cycle: application code upgrades. We’ll leave that for the third and final part of this series.

Until then, enjoy coding!


Our guest author Ilya is an Elixir/Erlang/Python developer and a tech leader at FunBox. His main occupation is bootstrapping new projects from both human and technological perspectives. Reach out via his Twitter for interesting discussions or consultancy.
