In the previous chapter of this series, we looked at hot code reloading in Elixir and why we should use GenServer to implement long-running processes.
But to organize a whole application, we need one more building block — supervisors. Let's take a look at supervisors in detail.
Defining an OTP Application
According to the documentation:
In OTP, application denotes a component implementing some specific functionality, that can be started and stopped as a unit, and that can be reused in other systems. This module interacts with application controller, a process started at every Erlang runtime system.
This module contains functions for controlling applications (for example, starting and stopping applications), and functions to access information about applications (for example, configuration parameters).
In other words, an application is a kind of package that contains reusable modules and has a name, a version, specific dependencies, etc.
Mix creates a new application when we run:
mix new our_new_app
But there is one crucial difference that distinguishes OTP applications from packages in other languages. OTP applications can be started and stopped and have their own running entities.
You can usually create such applications in Elixir with the commands:
    mix new our_new_app --sup
    cd our_new_app
This creates an additional file for us: lib/our_new_app/application.ex. It implements the so-called application behavior. Its primary purpose is to implement the start/2 function, which should start a supervision tree.
What Are Supervision Trees in Elixir?
So what is a supervision tree? I use the following analogy: a running OTP system is like a whole OS with its own lightweight processes. They start, work, and terminate. As in a real OS, we need a tool that helps us:
- Start the system in the correct order
- Handle abnormal situations when a process dies due to some errors
- Stop the system correctly.
In a real OS, we have systemd (on many Linux distributions) or launchd on macOS. In OTP, there are supervisors and the Supervisor module.
We can organize our processes in the following way using supervisors:
Leaf processes in this scheme are generally GenServer or similar processes.
As with systemd, if a process fails, we can choose to do nothing. Alternatively, we can restart the process, either on its own or together with its sibling processes, until it completes normally.
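To make these options concrete, here is a minimal sketch that is not part of the sample project. It assumes two throwaway Agent processes as children (the ids :always and :never are made up for illustration) and shows how the :permanent and :temporary restart values behave under the :one_for_one strategy.

```elixir
# Two illustrative children; the ids :always and :never are hypothetical.
children = [
  # :permanent (the default): always restarted after it exits
  Supervisor.child_spec({Agent, fn -> 0 end}, id: :always, restart: :permanent),
  # :temporary: never restarted, whatever the exit reason
  Supervisor.child_spec({Agent, fn -> 0 end}, id: :never, restart: :temporary)
]

# :one_for_one restarts only the failed child;
# :one_for_all would restart its siblings too.
{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

# Look up a child's pid by id; returns nil if the child is not listed.
pid_of = fn id ->
  Enum.find_value(Supervisor.which_children(sup), fn {child_id, pid, _type, _mods} ->
    if child_id == id, do: pid
  end)
end

old_always = pid_of.(:always)
old_never = pid_of.(:never)

Process.exit(old_always, :kill)
Process.exit(old_never, :kill)
# Give the supervisor a moment to react (good enough for a demo).
Process.sleep(100)

new_always = pid_of.(:always)
new_never = pid_of.(:never)
IO.inspect(new_always != old_always, label: "permanent child restarted")
IO.inspect(new_never, label: "temporary child")
```

The permanent child comes back under a new pid, while the temporary child disappears from the supervisor's list of children.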
Earlier, in Erlang, it was tricky to build supervision trees, but Elixir helps us a lot with this.
It's also worth noting that there is some good documentation available about supervisors.
Connecting GenServers to a Supervision Tree
Let's again look at what mix created for us in lib/our_new_app/application.ex:
    def start(_type, _args) do
      children = [
        # Starts a worker by calling: OurNewApp.Worker.start_link(arg)
        # {OurNewApp.Worker, arg}
      ]

      # See https://hexdocs.pm/elixir/Supervisor.html
      # for other strategies and supported options
      opts = [strategy: :one_for_one, name: OurNewApp.Supervisor]
      Supervisor.start_link(children, opts)
    end
It starts a supervisor and clearly shows how to run our worker as a child.
Let's do that. Our sample process will increment a given number periodically and report its state on demand.
First, create a GenServer in lib/our_new_app/counter.ex:
    defmodule OurNewApp.Counter do
      use GenServer
      require Logger

      @interval 100

      def start_link(start_from, opts \\ []) do
        GenServer.start_link(__MODULE__, start_from, opts)
      end

      def get(pid) do
        GenServer.call(pid, :get)
      end

      def init(start_from) do
        st = %{
          current: start_from,
          timer: :erlang.start_timer(@interval, self(), :tick)
        }

        {:ok, st}
      end

      def handle_call(:get, _from, st) do
        {:reply, st.current, st}
      end

      def handle_info({:timeout, _timer_ref, :tick}, st) do
        new_timer = :erlang.start_timer(@interval, self(), :tick)
        :erlang.cancel_timer(st.timer)

        {:noreply, %{st | current: st.current + 1, timer: new_timer}}
      end
    end
This server increments a given number every 100ms and can report its state via OurNewApp.Counter.get/1:
    iex -S mix
    ...
    iex(1)> {:ok, pid} = OurNewApp.Counter.start_link(10000)
    {:ok, #PID<0.182.0>}
    iex(2)> OurNewApp.Counter.get(pid)
    10136
    iex(3)> OurNewApp.Counter.get(pid)
    10146
Now let's integrate our server as a child. Update the start/2 function in lib/our_new_app/application.ex to the following:
    def start(_type, _args) do
      children = [
        {OurNewApp.Counter, 10000}
      ]

      opts = [strategy: :one_for_one, name: OurNewApp.Supervisor]
      Supervisor.start_link(children, opts)
    end
We see that our process starts automatically:
    iex -S mix
    ...
    iex(1)> [{_, pid, _, _}] = Supervisor.which_children(OurNewApp.Supervisor)
    [{OurNewApp.Counter, #PID<0.141.0>, :worker, [OurNewApp.Counter]}]
    iex(2)> OurNewApp.Counter.get(pid)
    10119
    iex(3)> Process.exit(pid, :shutdown)
    true
    iex(4)> Supervisor.which_children(OurNewApp.Supervisor)
    [{OurNewApp.Counter, #PID<0.146.0>, :worker, [OurNewApp.Counter]}]
We queried the supervisor's children with Supervisor.which_children/1. We also see that our counter process restarted after we stopped it.
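The {Module, arg} tuple in the children list works because use GenServer defines a default child_spec/1 function, which the supervisor calls to obtain a full child specification. The sketch below shows this with a stripped-down, hypothetical module named SpecDemo rather than the real counter.

```elixir
# SpecDemo is a hypothetical stand-in for OurNewApp.Counter.
defmodule SpecDemo do
  use GenServer

  def start_link(start_from), do: GenServer.start_link(__MODULE__, start_from)

  @impl true
  def init(start_from), do: {:ok, start_from}
end

# The supervisor turns {SpecDemo, 10000} into SpecDemo.child_spec(10000).
spec = SpecDemo.child_spec(10000)
IO.inspect(spec)
# => %{id: SpecDemo, start: {SpecDemo, :start_link, [10000]}}
```

The default id is the module name, which is why two children built from the same module need explicit ids.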
Our process tree now looks like this:
Adding GenServers to Custom Supervisors
Now let's make a special supervisor for our counter processes. Later, we'll see why we may want to do that. Our supervision tree will look like this:
First, we should make a callback module for our new special supervisor. Let's add lib/our_new_app/counter_sup.ex with the following content:
    defmodule OurNewApp.CounterSup do
      use Supervisor

      def start_link(start_numbers) do
        Supervisor.start_link(__MODULE__, start_numbers, name: __MODULE__)
      end

      @impl true
      def init(start_numbers) do
        children =
          for start_number <- start_numbers do
            # We can't just use `{OurNewApp.Counter, start_number}`
            # because we need different ids for children
            Supervisor.child_spec({OurNewApp.Counter, start_number}, id: start_number)
          end

        Supervisor.init(children, strategy: :one_for_one)
      end
    end
We must also update children for the main application supervisor in lib/our_new_app/application.ex:
    def start(_type, _args) do
      children = [
        {OurNewApp.CounterSup, [10000, 20000]}
      ]

      opts = [strategy: :one_for_one, name: OurNewApp.Supervisor]
      Supervisor.start_link(children, opts)
    end
Let's see what we get:
    iex -S mix
    ...
    iex(1)> Supervisor.which_children(OurNewApp.Supervisor)
    [{OurNewApp.CounterSup, #PID<0.161.0>, :supervisor, [OurNewApp.CounterSup]}]
    iex(2)> Supervisor.which_children(OurNewApp.CounterSup)
    [
      {20000, #PID<0.163.0>, :worker, [OurNewApp.Counter]},
      {10000, #PID<0.162.0>, :worker, [OurNewApp.Counter]}
    ]
That's just what we need: OurNewApp.Supervisor has OurNewApp.CounterSup as its child and OurNewApp.CounterSup has two OurNewApp.Counter children.
Many developers consider custom supervisors tricky and avoid using them. So let's do some simple exercises to get more acquainted with them.
First, we'll add a third counter to our counter supervisor at runtime:
    iex(3)> new_child_spec = Supervisor.child_spec({OurNewApp.Counter, 30000}, id: 30000)
    %{id: 30000, start: {OurNewApp.Counter, :start_link, [30000]}}
    iex(4)> Supervisor.start_child(OurNewApp.CounterSup, new_child_spec)
    {:ok, #PID<0.169.0>}
    iex(5)> Supervisor.which_children(OurNewApp.CounterSup)
    [
      {30000, #PID<0.169.0>, :worker, [OurNewApp.Counter]},
      {20000, #PID<0.163.0>, :worker, [OurNewApp.Counter]},
      {10000, #PID<0.162.0>, :worker, [OurNewApp.Counter]}
    ]
That was easy! With Supervisor.delete_child/2, Supervisor.restart_child/2, etc., we can easily manipulate the supervisor's children.
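As a rough illustration of those functions, the following sketch starts a standalone supervisor with two Agent children (the ids :a and :b are invented for this example and stand in for the counters), then terminates, restarts and deletes them.

```elixir
# Illustrative supervisor with two Agent children; the ids are made up.
children =
  for id <- [:a, :b] do
    Supervisor.child_spec({Agent, fn -> id end}, id: id)
  end

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

# Stop child :a; its spec stays in the supervisor, marked as not running.
:ok = Supervisor.terminate_child(sup, :a)
stopped = Supervisor.which_children(sup)

# Start it again from the stored spec.
{:ok, new_pid} = Supervisor.restart_child(sup, :a)

# A spec can only be deleted after the child has been terminated;
# deleting a running child returns {:error, :running}.
{:error, :running} = Supervisor.delete_child(sup, :b)
:ok = Supervisor.terminate_child(sup, :b)
:ok = Supervisor.delete_child(sup, :b)

IO.inspect(Supervisor.count_children(sup))
```

After these calls, only child :a remains, running under a new pid.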
Secondly, instead of adding one worker to the existing tree, let's try adding a subtree with its own children (without a special module for the subtree supervisor):
    iex(6)> children_specs = for n <- [10000, 20000, 30000], do: Supervisor.child_spec({OurNewApp.Counter, n}, id: n)
    [
      %{id: 10000, start: {OurNewApp.Counter, :start_link, [10000]}},
      %{id: 20000, start: {OurNewApp.Counter, :start_link, [20000]}},
      %{id: 30000, start: {OurNewApp.Counter, :start_link, [30000]}}
    ]
    iex(7)> hand_crafted_sup_spec = %{
    ...(7)>   id: :hand_crafted_sup,
    ...(7)>   start: {Supervisor, :start_link, [children_specs, [strategy: :one_for_one]]},
    ...(7)>   type: :supervisor,
    ...(7)>   restart: :permanent,
    ...(7)>   shutdown: 5000
    ...(7)> }
    ...
    iex(8)> Supervisor.start_child(OurNewApp.Supervisor, hand_crafted_sup_spec)
    {:ok, #PID<0.204.0>}
    iex(9)> Supervisor.which_children(OurNewApp.Supervisor)
    [
      {:hand_crafted_sup, #PID<0.204.0>, :supervisor, [Supervisor]},
      {OurNewApp.CounterSup, #PID<0.161.0>, :supervisor, [OurNewApp.CounterSup]}
    ]
The following took place:
- hand_crafted_sup_spec was constructed, which starts a supervisor via Supervisor.start_link
- We told our main supervisor to start a child with this spec
- The main supervisor started the new supervisor with children_specs as a parameter
- The new supervisor started counters from children_specs.
We could do this in another way: tell our main supervisor to launch an empty child supervisor, then add counters one by one to this child supervisor.
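A minimal sketch of that alternative might look as follows. To keep it runnable on its own, it assumes a locally started stand-in for the main supervisor and uses Agent processes in place of OurNewApp.Counter; the :empty_sup id is also just an illustrative name.

```elixir
# Stand-in for the application's main supervisor (hypothetical, so the
# sketch runs on its own).
{:ok, main_sup} = Supervisor.start_link([], strategy: :one_for_one)

# A child supervisor that starts with no children at all.
empty_sup_spec = %{
  id: :empty_sup,
  start: {Supervisor, :start_link, [[], [strategy: :one_for_one]]},
  type: :supervisor
}

{:ok, child_sup} = Supervisor.start_child(main_sup, empty_sup_spec)

# Add workers one by one. Agents stand in for OurNewApp.Counter here.
for n <- [10000, 20000, 30000] do
  spec = Supervisor.child_spec({Agent, fn -> n end}, id: n)
  {:ok, _pid} = Supervisor.start_child(child_sup, spec)
end

IO.inspect(Supervisor.which_children(child_sup))
```

Both routes end with the same tree; the difference is only whether the children are listed up front or attached after the subtree supervisor is already running.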
The process tree at the end of the experiment should look like this:
Examples of Custom Supervisor Usage
Let's see what happens if we terminate our app.
First, add some logging to lib/our_new_app/counter.ex:
    def terminate(reason, st) do
      Logger.info("terminating with #{inspect(reason)}, counter is #{st.current}")
    end
Also enable the :trap_exit flag for our counters, so that we can handle process termination (see the terminate callback documentation):
    def init(start_from) do
      Process.flag(:trap_exit, true)
      st = %{
      ...
    end
Now, if we stop our application in the iex session, we see:
    iex -S mix
    ...
    iex(1)> Application.stop(:our_new_app)
    19:35:43.544 [info] terminating with :shutdown, counter is 20049
    19:35:43.548 [info] terminating with :shutdown, counter is 10050
    19:35:43.548 [info] Application our_new_app exited: :stopped
    :ok
Imagine that we have to implement a graceful shutdown. The condition of gracefulness is to count up until we reach a number divisible by 10 (10, 20, 30, etc.) before shutdown.
Of course, in our simple example, we may just send ticks to count to the nearest number divisible by 10 in terminate.
Instead, imagine that these events are external and emulate some metrics that we would prefer to aggregate consistently.
First, let's add the possibility of a graceful stop to the OurNewApp.Counter module:
    defmodule OurNewApp.Counter do
      use GenServer
      require Logger

      @interval 100

      def start_link(start_from) do
        GenServer.start_link(__MODULE__, start_from)
      end

      def get(pid) do
        GenServer.call(pid, :get)
      end

      def stop_gracefully(pid) do
        GenServer.call(pid, :stop_gracefully)
      end

      def init(start_from) do
        Process.flag(:trap_exit, true)

        st = %{
          current: start_from,
          timer: :erlang.start_timer(@interval, self(), :tick),
          terminator: nil
        }

        {:ok, st}
      end

      def handle_call(:get, _from, st) do
        {:reply, st.current, st}
      end

      def handle_call(:stop_gracefully, from, st) do
        if st.terminator do
          {:reply, :already_stopping, st}
        else
          {:noreply, %{st | terminator: from}}
        end
      end

      def handle_info({:timeout, _timer_ref, :tick}, st) do
        :erlang.cancel_timer(st.timer)

        new_current = st.current + 1

        if st.terminator && rem(new_current, 10) == 0 do
          # we are terminating
          GenServer.reply(st.terminator, :ok)
          {:stop, :normal, %{st | current: new_current, timer: nil}}
        else
          new_timer = :erlang.start_timer(@interval, self(), :tick)
          {:noreply, %{st | current: new_current, timer: new_timer}}
        end
      end

      def terminate(reason, st) do
        Logger.info("terminating with #{inspect(reason)}, counter is #{st.current}")
      end
    end
Here we:
- Add a terminator field to the state that keeps the address of the party that wants to stop the server
- Set this field in the stop_gracefully handler
- Continue counting until we get to a number divisible by 10
- Respond to the terminating party and stop the server upon obtaining this number.
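The core of this technique is a deferred reply: handle_call returns :noreply, stores the from value, and answers later with GenServer.reply/2. Stripped of the counting logic, it can be sketched as below. The DeferredReply module and its :wait and :release messages are hypothetical names used only for this illustration.

```elixir
# Hypothetical module illustrating a delayed reply; not part of the app.
defmodule DeferredReply do
  use GenServer

  def start_link(_arg \\ []), do: GenServer.start_link(__MODULE__, nil)
  def wait_for_release(pid), do: GenServer.call(pid, :wait)
  def release(pid), do: send(pid, :release)

  @impl true
  def init(nil), do: {:ok, %{caller: nil}}

  @impl true
  def handle_call(:wait, from, st) do
    # No reply yet: remember who asked and keep running.
    {:noreply, %{st | caller: from}}
  end

  @impl true
  def handle_info(:release, %{caller: from} = st) when from != nil do
    # Answer the pending call, then stop normally.
    GenServer.reply(from, :ok)
    {:stop, :normal, st}
  end
end

{:ok, pid} = DeferredReply.start_link()
# The caller blocks inside GenServer.call/2 until the server replies.
task = Task.async(fn -> DeferredReply.wait_for_release(pid) end)
# Let the :wait call arrive before we release (fine for a demo).
Process.sleep(50)
DeferredReply.release(pid)
result = Task.await(task)
IO.inspect(result)
```

The counter above uses the same mechanism, with the tick that lands on a multiple of 10 playing the role of the release message.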
Let's see how that works for a single process:
    iex -S mix
    Erlang/OTP 23 [erts-11.0.4] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [hipe]
    iex(1)> {:ok, pid} = OurNewApp.Counter.start_link(10000)
    {:ok, #PID<0.167.0>}
    iex(2)> OurNewApp.Counter.stop_gracefully(pid)
    :ok
    iex(3)>
    20:03:13.061 [info] terminating with :normal, counter is 10120
Everything works fine. But how do we stop all counters gracefully? As we see in the OTP docs, OurNewApp.Application.prep_stop/1 is called (if it exists) before the application stops.
Let's add the desired functionality:
    @impl true
    def prep_stop(st) do
      stop_tasks =
        for {_, pid, _, _} <- Supervisor.which_children(OurNewApp.CounterSup) do
          Task.async(fn ->
            :ok = OurNewApp.Counter.stop_gracefully(pid)
          end)
        end

      Task.await_many(stop_tasks)
      st
    end
We also set the :restart option to :transient in OurNewApp.CounterSup so that our counters do not restart after graceful shutdown:
    Supervisor.child_spec({OurNewApp.Counter, start_number},
      id: start_number,
      restart: :transient
    )
Try to stop the app:
    iex -S mix
    iex(1)> Application.stop(:our_new_app)
    20:24:02.958 [info] terminating with :normal, counter is 10260
    20:24:02.958 [info] terminating with :normal, counter is 20260
    20:24:02.962 [info] Application our_new_app exited: :stopped
    :ok
Now we only stop at numbers divisible by 10.
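The :transient restart value is what keeps a gracefully stopped counter down: a transient child is restarted only after an abnormal exit. Here is a small standalone sketch of that behavior, assuming an Agent as the child and a made-up id :transient_child.

```elixir
# One transient child; the id :transient_child is hypothetical.
children = [
  Supervisor.child_spec({Agent, fn -> 0 end}, id: :transient_child, restart: :transient)
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

# Returns the child's pid, or :undefined when it is not running.
pid_of = fn ->
  [{:transient_child, pid, _type, _mods}] = Supervisor.which_children(sup)
  pid
end

# Abnormal exit: the supervisor restarts the child.
first = pid_of.()
Process.exit(first, :kill)
Process.sleep(100)
second = pid_of.()

# Normal exit: the child stays down, but its spec is kept.
:ok = Agent.stop(second, :normal)
Process.sleep(100)
third = pid_of.()

IO.inspect({first != second, third})
```

With the default :permanent value, the second stop would have triggered a restart as well, and the counters would start counting again right after the graceful stop.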
Using a special supervisor for our counters makes it possible to "find" all the instances quickly and operate on them. This is extremely important when applications go through start/stop stages.
Wrap-up
I hope you've managed to wrap your head around supervisors in Elixir, and have found this article helpful, alongside the previous article on hot code reloading.
There is another, even more complicated stage in an application life cycle: application code upgrades. We'll leave that for the third and final part of this series.
Until then, enjoy coding!