A Guide to Hot Code Reloading in Elixir

Ilya Averyanov

When building software, Elixir (or Erlang) offers great benefits, including concurrency, scalability and reliability.

In this series, we will examine how to make the most of these benefits in your production code upgrades. This article will focus on hot code reloading and upgrades. But before we dive in, let’s quickly define OTP.

What Is OTP in Elixir?

Formally, Erlang/OTP is a specific implementation of the Erlang runtime system, i.e. a set of libraries, compilers and a VM.

Informally, OTP often denotes a set of principles to build robust apps in Erlang and the corresponding set of built-in libraries.

Hot Code Reload: Tackling the Uncertainties

There is a bit of uncertainty about this concept.

When we speak of hot code reloading or hot code upgrades, we usually mean the ability to change a running process’s behavior without any negative impact on that process. For example, we may change the behavior of a process that holds a TCP connection without terminating the connection.

Uncertainty comes in with scale: are we upgrading a single process, a module, an application, or a whole release?

OTP offers tools for upgrading at any of these scales. In this article, we will consider application-level upgrades.

How Are OTP and Hot Code Upgrades Related?

As we will see, hot code upgrades on a larger scale (application and release levels) work only for systems built according to OTP principles.

Hot Code Upgrades: The Basics

A good starting point for understanding hot code reloading is the article Hot Code Reloading in Elixir.

It explains the key points well, so here I’d like to highlight just one important concept in depth: code purge.

Should You Code Purge in Elixir?

What happens if we want to upgrade code two or more times?

Let’s create a small mix project:

mix new code_purge
cd code_purge

Then update lib/code_purge.ex to the following:

# lib/code_purge.ex
defmodule CodePurge do
  def pi do
    3.14
  end
end

Now launch an IEx shell with mix:

iex -S mix
iex(1)> CodePurge.pi
3.14

Then update lib/code_purge.ex to:

# lib/code_purge.ex
defmodule CodePurge do
  def pi do
    3.142
  end
end

And recompile the project in a separate shell:

mix compile

In our iex shell, we reload the module code:

iex(2)> :code.load_file(CodePurge)
{:module, CodePurge}
iex(3)> CodePurge.pi
3.142

Everything worked as expected. :code.load_file/1 found the updated Elixir.CodePurge.beam in the _build/dev/lib/code_purge/ebin folder (mix sets up the code paths for us) and loaded it.

But what happens if we try to reload this module once more, without actually changing it?

iex(4)> :code.load_file(CodePurge)
{:error, :not_purged}

What Went Wrong Here?

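Wow, that doesn’t work. The Erlang runtime keeps at most two versions of a module at once: the current one and the old one. Our first reload turned the original code into old code, and loading yet another version would need a third slot, so the loader refuses with :not_purged.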
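The same limit can be reproduced outside a Mix project. The sketch below is a minimal illustration under a few assumptions: it compiles a throwaway module (HotDemo.Twice is a hypothetical name) in memory with Code.compile_string/1, and it reloads the object code with :code.load_binary/3 instead of :code.load_file/1, so no .beam file is needed. :erlang.check_old_code/1 reports whether an old version is currently held.

```elixir
# HotDemo.Twice is a hypothetical module used only for this illustration.
[{mod, bin}] =
  Code.compile_string("""
  defmodule HotDemo.Twice do
    def pi, do: 3.14
  end
  """)

# First reload: the version loaded by the compiler becomes the "old" code.
{:module, ^mod} = :code.load_binary(mod, ~c"nofile", bin)
true = :erlang.check_old_code(mod)

# Second reload: the current and old slots are both taken, so loading is refused.
{:error, :not_purged} = :code.load_binary(mod, ~c"nofile", bin)
```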

To overcome this, the :code module offers two more functions: :code.purge/1 and :code.soft_purge/1.

A purge evicts the old code:

iex(5)> :code.purge(CodePurge)
false
iex(6)> :code.load_file(CodePurge)
{:module, CodePurge}

We can upgrade the code of the module again after the purge. But why do we even need to control that? Why not purge code automatically?

Well, there may still be processes running old code, and we should decide what to do with them during the upgrade. This is also why there are two functions:

- :code.purge/1 kills the processes running old code, then removes that code
- :code.soft_purge/1 removes old code only if no process is running it; otherwise it leaves everything in place and returns false
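The difference between the two functions shows up as soon as a process is parked inside a module’s code. In this minimal sketch (again a hypothetical in-memory module, HotDemo.Waiter, reloaded via :code.load_binary/3), :code.soft_purge/1 leaves the lingering process alone and returns false. :code.purge/1, by contrast, kills the process and returns true.

```elixir
# HotDemo.Waiter is a hypothetical module: loop/1 reports that it is running
# and then blocks in `receive`, i.e. the process sits inside the module's code.
[{mod, bin}] =
  Code.compile_string("""
  defmodule HotDemo.Waiter do
    def loop(parent) do
      send(parent, {:ready, self()})

      receive do
        :stop -> :ok
      end
    end
  end
  """)

# spawn/3 (not spawn_link/3), so killing the waiter does not kill the caller.
pid = spawn(mod, :loop, [self()])

receive do
  {:ready, ^pid} -> :ok
end

# Reload: the version the waiter is blocked in becomes the "old" code.
{:module, ^mod} = :code.load_binary(mod, ~c"nofile", bin)

# soft_purge/1 refuses while a process still runs the old code...
false = :code.soft_purge(mod)
true = Process.alive?(pid)

# ...whereas purge/1 kills that process, removes the old code and returns true.
ref = Process.monitor(pid)
true = :code.purge(mod)

receive do
  {:DOWN, ^ref, :process, ^pid, :killed} -> :ok
after
  1_000 -> raise "expected :code.purge/1 to kill the waiter"
end
```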

This has an important consequence: if we want to upgrade our code more than once, processes still running the old code will, by default, be killed during upgrades.

Let’s illustrate this.

How Not to Do a Code Upgrade

First, add the file lib/code_purge/pi.ex to your toy project with the following content:

# lib/code_purge/pi.ex
defmodule CodePurge.Pi do
  def start_link do
    spawn_link(&server/0)
  end

  def server do
    receive do
      {:get, from} ->
        send(from, {:ok, 3.14})
        CodePurge.Pi.server()
    end
  end

  def get(pid) do
    send(pid, {:get, self()})

    receive do
      {:ok, value} ->
        {:ok, value}
    after
      1000 ->
        :error
    end
  end
end

Then run an IEx shell, spawn a server and check that everything is fine:

iex(1)> pid = CodePurge.Pi.start_link()
#PID<0.140.0>
iex(2)> CodePurge.Pi.get(pid)
{:ok, 3.14}

Now, reload the module once (without any actual changes to functions) and try to purge it so that you can do the next ‘upgrade’:

iex(3)> :code.load_file(CodePurge.Pi)
{:module, CodePurge.Pi}
iex(4)> :code.purge(CodePurge.Pi)
** (EXIT from #PID<0.152.0>) shell process exited with reason: killed

What Happened Here?

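As expected, the server died, and the fully qualified call to CodePurge.Pi.server/0 couldn’t save it. An external call like that only switches a process to the newest version of a module when the call is actually executed. Our server never got that far: it received no messages after the first reload, so it stayed blocked in receive inside the old code. :code.purge/1 then killed it, and because it was started with spawn_link, the IEx shell process died with it.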

This isn’t robust. One of the obvious reasons for the failure is that we didn’t use OTP libraries (GenServer and related libraries) dedicated to creating this kind of server.
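The explanation can be checked with a variant of the experiment in which the server does receive messages between the two upgrades. This is a minimal sketch, not a recommendation to keep hand-rolled loops. HotDemo.PiLoop is a hypothetical copy of CodePurge.Pi compiled in memory. The server is started with spawn/3 rather than spawn_link, so a failure would not take the caller down, and :code.load_binary/3 stands in for :code.load_file/1.

```elixir
# HotDemo.PiLoop is a hypothetical copy of CodePurge.Pi, compiled in memory.
[{mod, bin}] =
  Code.compile_string("""
  defmodule HotDemo.PiLoop do
    def server do
      receive do
        {:get, from} ->
          send(from, {:ok, 3.14})
          # Fully qualified call: runs whatever version is current right now.
          HotDemo.PiLoop.server()
      end
    end
  end
  """)

# Started from an MFA rather than a fun, so no fun keeps a reference to the
# module's code; spawn/3 keeps a crash from reaching the caller.
pid = spawn(mod, :server, [])

get = fn ->
  send(pid, {:get, self()})

  receive do
    {:ok, value} -> value
  after
    1_000 -> :error
  end
end

# Sanity check before the upgrade; also makes sure the server is up.
3.14 = get.()

# First "upgrade": a server waiting in receive is now running old code.
{:module, ^mod} = :code.load_binary(mod, ~c"nofile", bin)

# The first request is served by the old code, which then makes the external
# call into the new version; the second request is served by the new version.
3.14 = get.()
3.14 = get.()

# No process is left in the old code, so purging kills nobody.
false = :code.purge(mod)
true = Process.alive?(pid)
```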

Avoid Spawn in Real-World Software Development

In many books and articles, we see code examples demonstrating the power of Elixir or Erlang: tons of processes easily spawned directly with spawn or spawn_link.

However, in real-world software development, we generally should avoid creating home-brewed servers or other long-running processes, and should instead use OTP libraries.

Even for ‘one-off’ asynchronous tasks, we shouldn’t use spawn or spawn_link directly.

Elixir has a great alternative to spawn, though: the Task module (covered in depth in the AppSignal article Demystifying processes in Elixir).
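Here is a minimal sketch of that alternative using the standard Task and Task.Supervisor APIs. The computations themselves are placeholders.

```elixir
# One-off computation run concurrently; the caller waits for the result.
task = Task.async(fn -> Enum.sum(1..10) end)
sum = Task.await(task)

# Fire-and-forget work under a dedicated supervisor: the child is not linked
# to the caller, so a crash in the task does not propagate to it.
{:ok, sup} = Task.Supervisor.start_link()
{:ok, _pid} = Task.Supervisor.start_child(sup, fn -> IO.puts("background job done") end)
```

Task.async/1 links the task to the caller, which is usually what you want when you are going to await the result anyway. For background work that should outlive or be isolated from the caller, a Task.Supervisor is the more robust choice.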

How To Do a Code Upgrade Using GenServer

Let’s create a better version of our server in lib/code_purge/pi_gs.ex:

# lib/code_purge/pi_gs.ex
defmodule CodePurge.PiGs do
  use GenServer

  def start_link(value \\ 3.14) do
    GenServer.start_link(__MODULE__, value)
  end

  def init(value) do
    {:ok, value}
  end

  def handle_call(:get, _from, value) do
    {:reply, value, value}
  end

  def get(pid) do
    GenServer.call(pid, :get)
  end
end

And now, try to upgrade/purge the code of a running process several times:

iex(1)> {:ok, pid} = CodePurge.PiGs.start_link()
{:ok, #PID<0.161.0>}
iex(2)> CodePurge.PiGs.get(pid)
3.14
iex(3)> :code.load_file(CodePurge.PiGs)
{:module, CodePurge.PiGs}
iex(4)> :code.purge(CodePurge.PiGs)
false
iex(5)> :code.load_file(CodePurge.PiGs)
{:module, CodePurge.PiGs}
iex(6)> :code.purge(CodePurge.PiGs)
false
iex(7)> CodePurge.PiGs.get(pid)
3.14

Nothing bad happens! The reason why is easy to understand.

Our pid process doesn’t loop inside CodePurge.PiGs code. It runs the generic GenServer loop (implemented by Erlang’s :gen_server module), and we don’t update that code at all.

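CodePurge.PiGs is only a callback module. Its name is kept in the GenServer’s internal state, and the GenServer loop makes fully qualified (external) calls to CodePurge.PiGs functions when serving requests. Each such call goes to the current version of the module, so between requests the process does not sit in CodePurge.PiGs code at all.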
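This can be observed directly. In the minimal sketch below, HotDemo.PiGs is a hypothetical copy of the first version of CodePurge.PiGs. :sys.get_status/1 reports :gen_server as the module driving the process. Process.info/2 shows where the process is currently parked; the exact function name and arity depend on the OTP version, so they are printed rather than asserted.

```elixir
# HotDemo.PiGs is a hypothetical copy of the first version of CodePurge.PiGs.
defmodule HotDemo.PiGs do
  use GenServer

  def start_link(value \\ 3.14), do: GenServer.start_link(__MODULE__, value)

  @impl true
  def init(value), do: {:ok, value}

  @impl true
  def handle_call(:get, _from, value), do: {:reply, value, value}

  def get(pid), do: GenServer.call(pid, :get)
end

{:ok, pid} = HotDemo.PiGs.start_link()
3.14 = HotDemo.PiGs.get(pid)

# The process is driven by :gen_server, not by HotDemo.PiGs.
{:status, ^pid, {:module, :gen_server}, _items} = :sys.get_status(pid)

# The function the process is currently executing: typically a :gen_server
# function, though the exact name and arity vary between OTP versions.
IO.inspect(Process.info(pid, :current_function), label: "current function")
```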

The remaining challenge is to transform the state of GenServer processes whenever the new code expects a different state format.

For a single GenServer, this can be done with the :sys module and the code_change/3 callback of GenServer. This is covered in depth in the previously mentioned hot code reloading article; here, we’ll only briefly demonstrate it.

Without closing the previous iex session, let’s update lib/code_purge/pi_gs.ex to the following and compile:

# lib/code_purge/pi_gs.ex
defmodule CodePurge.PiGs do
  use GenServer

  def start_link(value \\ 3.14) do
    GenServer.start_link(__MODULE__, value)
  end

  def init(value) do
    {:ok, [value]}
  end

  def handle_call(:get, _from, st) do
    [value] = st
    {:reply, value, st}
  end

  def get(pid) do
    GenServer.call(pid, :get)
  end

  def code_change(_old_vsn, value, _extra) do
    {:ok, [value]}
  end
end

In code_change/3 we transform the old state by wrapping it in a list. We also updated the handle_call/3 and init/1 callbacks to match the new state shape. Now, in the existing IEx session, run:

iex(8)> :code.purge(CodePurge.PiGs)
false
iex(9)> :sys.suspend(pid)
:ok
iex(10)> :code.load_file(CodePurge.PiGs)
{:module, CodePurge.PiGs}
iex(11)> :sys.change_code(pid, CodePurge.PiGs, nil, [])
:ok
iex(12)> :sys.resume(pid)
:ok
iex(13)> CodePurge.PiGs.get(pid)
3.14
iex(14)> :sys.get_state(pid)
[3.14]

Everything works fine, and the last call to :sys.get_state demonstrates that the state has actually changed.
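The session above can be wrapped into a small helper. This is a sketch that rests on several assumptions:

- HotDemo.Upgrade.upgrade/4 and HotDemo.Versioned are hypothetical names.
- Both versions of the callback module are compiled in memory.
- :code.load_binary/3 replaces :code.load_file/1, so the example runs without a Mix project.

The compiler option ignore_module_conflict: true only silences the “redefining module” warning that the setup triggers.

```elixir
# Only silences the "redefining module" warning triggered by the setup below.
Code.compiler_options(ignore_module_conflict: true)

defmodule HotDemo.Upgrade do
  # Hypothetical helper: the suspend -> load -> change_code -> resume sequence
  # from the IEx session above, applied to a single GenServer process.
  def upgrade(pid, mod, binary, extra \\ []) do
    :ok = :sys.suspend(pid)

    try do
      # Free the "old" slot; this kills any process still running old code.
      :code.purge(mod)
      {:module, ^mod} = :code.load_binary(mod, ~c"nofile", binary)
      # Makes the suspended process call mod.code_change(nil, state, extra).
      :ok = :sys.change_code(pid, mod, nil, extra)
    after
      # Resume even if loading or the state migration failed.
      :sys.resume(pid)
    end
  end
end

v1 = """
defmodule HotDemo.Versioned do
  use GenServer
  def init(value), do: {:ok, value}
  def handle_call(:get, _from, value), do: {:reply, value, value}
end
"""

v2 = """
defmodule HotDemo.Versioned do
  use GenServer
  def init(value), do: {:ok, [value]}
  def handle_call(:get, _from, [value] = st), do: {:reply, value, st}
  def code_change(_old_vsn, value, _extra), do: {:ok, [value]}
end
"""

# Get v2's object code, then load v1 over it and drop v2 again.
[{mod, v2_bin}] = Code.compile_string(v2)
[{^mod, _v1_bin}] = Code.compile_string(v1)
:code.purge(mod)

{:ok, pid} = GenServer.start_link(mod, 3.14)
:ok = HotDemo.Upgrade.upgrade(pid, mod, v2_bin)

3.14 = GenServer.call(pid, :get)
[3.14] = :sys.get_state(pid)
```

Resuming in the after clause keeps the process from staying suspended if something goes wrong. The sketch does not roll back, though. If :sys.change_code/4 fails, the new module version stays loaded while the process keeps its old state format.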

Wrap-up

In the first part of this series, we’ve seen that processes built on OTP behaviours such as GenServer, rather than hand-rolled spawn loops, are needed for effective hot code upgrades. We’ve also demonstrated how to upgrade a single GenServer instance consistently.

Upgrading an individual process, together with its callback module, can be used as a ‘tactical weapon’ to fix localized bugs or add some logging.

But updating a system at a greater scale, on a regular basis, requires more powerful tools. In the next part of the series, I’ll delve into the world of supervisors in Elixir.

I hope you found this run-through of hot code reloading useful. See you next time for supervisors!

P.S. If you’d like to read Elixir Alchemy posts as soon as they get off the press, subscribe to our Elixir Alchemy newsletter and never miss a single post!

Our guest author Ilya is an Elixir/Erlang/Python developer and a tech leader at FunBox. His main occupation is bootstrapping new projects from both human and technological perspectives. Reach out via his Twitter for interesting discussions or consultancy.
