A Guide to Hot Code Reloading in Elixir

Elixir (like Erlang) offers great benefits for building software, including concurrency, scalability, and reliability.

In this series, we will examine how to make the most of these benefits in your production code upgrades. This article will focus on hot code reloading and upgrades. But before we dive in, let's quickly define OTP.

What Is OTP in Elixir?

Formally, Erlang/OTP is a specific implementation of the Erlang Runtime System, i.e., a set of libraries, compilers, and a VM implementation.

Informally, OTP often denotes a set of principles to build robust apps in Erlang and the corresponding set of built-in libraries.

Hot Code Reload: Tackling the Uncertainties

The term is used loosely, so let's pin down what we mean.

When we speak of hot code reload or hot code upgrade, we usually mean the ability to change the behavior of a running process without any negative impact on that process. For example, we may change the behavior of a process that holds a TCP connection without terminating the connection.

The remaining uncertainty is about scale. The question is whether we can upgrade:

a module
a package (an application in terms of OTP)
a whole running instance (a release in terms of OTP)

OTP offers tools for upgrading at any scale. In this article, we will consider application-level upgrades.

As we will see, hot code upgrades on a larger scale (application and release levels) work only for systems built according to OTP principles.

Hot Code Upgrades: The Basics

A good starting point to understand hot code reload is Hot Code Reloading in Elixir.

It explains the following key points:

How to reload code for a single module
How two versions of code exist after loading a new version of the module: new code and old code
The importance of external calls, which make it possible to transition from old code to new code
How GenServer helps us make such a transition seamlessly
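To make the point about external calls concrete, here is a minimal sketch of a hand-rolled loop. The module name Reloadable.Loop and its messages are hypothetical and only serve as an illustration. A fully qualified call such as `__MODULE__.loop(count)` is an external call, so it always enters the current version of the module, whereas a plain local call `loop(count)` would stay in whichever version the process is already running.

```elixir
defmodule Reloadable.Loop do
  # Hypothetical example module; names are illustrative only.

  # Start the loop through its exported entry point.
  def start(count \\ 0), do: spawn(__MODULE__, :loop, [count])

  def loop(count) do
    receive do
      :bump ->
        # External (fully qualified) call: each time a message is handled,
        # the process re-enters the module by name and so switches to the
        # current version of the code.
        __MODULE__.loop(count + 1)

      {:get, from} ->
        send(from, {:count, count})
        __MODULE__.loop(count)
    end
  end
end

pid = Reloadable.Loop.start()
send(pid, :bump)
send(pid, {:get, self()})

count =
  receive do
    {:count, n} -> n
  after
    1_000 -> :timeout
  end
```

Note that the switch to new code only happens when a message is handled. A process that sits idle in `receive` keeps running the version it entered with.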

At this point, I'd like to highlight one important concept in-depth: code purge.

Should You Code Purge in Elixir?

What happens if we want to upgrade code two or more times?

Let's create a small mix project:

Shell

mix new code_purge
cd code_purge

Then update lib/code_purge.ex to the following:

Elixir

# lib/code_purge.ex
defmodule CodePurge do
  def pi do
    3.14
  end
end

Now launch an iex shell with mix:

Shell

iex -S mix
iex(1)> CodePurge.pi
3.14

Then update lib/code_purge.ex to:

Elixir

# lib/code_purge.ex
defmodule CodePurge do
  def pi do
    3.142
  end
end

And recompile the project in a separate shell:

Shell

mix compile

In our iex shell, we reload the module code:

Shell

iex(2)> :code.load_file(CodePurge)
{:module, CodePurge}
iex(3)> CodePurge.pi
3.142

Everything worked as expected. :code.load_file/1 found the updated Elixir.CodePurge.beam in the _build/dev/lib/code_purge/ebin folder (as mix sets up code paths for us) and reloaded it.

But what happens if we try to reload this module once more, without actually changing it?

Shell

iex(4)> :code.load_file(CodePurge)
{:error, :not_purged}

What Went Wrong Here?

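That doesn't work. The runtime keeps at most two versions of a module at a time: the current code and the old code. Loading a third version would require a second copy of old code, so :code.load_file/1 refuses with :not_purged until the existing old code is removed.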
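One way to see the two slots is to ask the runtime directly. The sketch below assumes a hypothetical helper module, CodeSlots, which is not part of the toy project. It relies on `:code.is_loaded/1` and `:erlang.check_old_code/1`, the latter reporting whether a module currently has old code loaded.

```elixir
defmodule CodeSlots do
  # Hypothetical helper, not part of the toy project above.
  # Reports which of the two code "slots" a module currently occupies.
  def status(mod) do
    %{
      # :code.is_loaded/1 returns {:file, path} or false.
      current: :code.is_loaded(mod) != false,
      # true when a previous version is still kept as old code.
      old: :erlang.check_old_code(mod)
    }
  end

  # True when old code exists, which is the situation in which
  # :code.load_file/1 answers {:error, :not_purged}.
  def needs_purge?(mod), do: :erlang.check_old_code(mod)
end

status = CodeSlots.status(CodeSlots)
```

Pasted into the iex session above, `CodeSlots.needs_purge?(CodePurge)` would be expected to return true after the first reload, since the original version is then held as old code.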

To get past the :not_purged error, the :code module provides two more functions: :code.purge/1 and :code.soft_purge/1.

A purge evicts the old code:

Shell

iex(5)> :code.purge(CodePurge)
false
iex(6)> :code.load_file(CodePurge)
{:module, CodePurge}

We can upgrade the code of the module again after the purge. But why do we even need to control that? Why not purge code automatically?

Well, there may still be processes running old code, and we should decide what to do with them during the upgrade. This is also why there are two functions:

:code.purge/1 — kills processes running old code
:code.soft_purge/1 — fails if there are any processes running old code
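The difference between the two functions suggests a cautious reload routine: try a soft purge first and only load new code if nothing would be killed. The following is a minimal sketch under that assumption. The SafeReload module and the `{:error, :old_code_in_use}` tuple are hypothetical names chosen for illustration.

```elixir
defmodule SafeReload do
  # Hypothetical helper; the module name and error tuple are illustrative.

  # Reloads `mod` from the code path only if no process runs its old code.
  def reload(mod) do
    if :code.soft_purge(mod) do
      # Old code is gone (or never existed), so loading is not blocked
      # by :not_purged.
      :code.load_file(mod)
    else
      # Some process still executes old code; leave everything untouched.
      {:error, :old_code_in_use}
    end
  end
end

result = SafeReload.reload(:no_such_module_on_disk)
```

The caller decides what to do with `{:error, :old_code_in_use}`: wait and retry, or fall back to :code.purge/1 and accept that the lingering processes are killed.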

The consequence is important: whenever we upgrade a module more than once with :code.purge/1, any process still running its old code is killed.

Let's illustrate this.

How Not to Do a Code Upgrade

First, add the file lib/code_purge/pi.ex to your toy project with the following content:

Elixir

# lib/code_purge/pi.ex
defmodule CodePurge.Pi do
  def start_link do
    spawn_link(&server/0)
  end
 
  def server do
    receive do
      {:get, from} ->
        send(from, {:ok, 3.14})
        CodePurge.Pi.server()
    end
  end
 
  def get(pid) do
    send(pid, {:get, self()})
 
    receive do
      {:ok, value} ->
        {:ok, value}
    after
      1000 ->
        :error
    end
  end
end

Then, run an iex shell, spawn a server, and check that everything works:

Shell

iex(1)> pid = CodePurge.Pi.start_link()
#PID<0.140.0>
iex(2)> CodePurge.Pi.get(pid)
{:ok, 3.14}

Now, reload the module once (without any actual changes to functions) and try to purge it so that you can do the next 'upgrade':

Shell

iex(3)> :code.load_file(CodePurge.Pi)
{:module, CodePurge.Pi}
iex(4)> :code.purge(CodePurge.Pi)
** (EXIT from #PID<0.152.0>) shell process exited with reason: killed

What Happened Here?

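As expected, the server died, and even the external call to CodePurge.Pi.server/0 couldn't save it. That call only runs after a message arrives. The server received no messages after the first upgrade, so it was still waiting in receive inside the old code when the purge happened. Because it was started with spawn_link, its death also took down the linked shell process, which is what the EXIT message reports.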

This isn't robust. One of the obvious reasons for the failure is that we didn't use OTP libraries (GenServer and related libraries) dedicated to creating this kind of server.
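Before purging, it can help to check which processes still execute old code for a module. The sketch below uses `:erlang.check_process_code/2`, which returns true when a process runs old code of the given module. The OldCode module is a hypothetical helper and scanning every process is only reasonable on a small node.

```elixir
defmodule OldCode do
  # Hypothetical helper for inspecting a node before calling :code.purge/1.

  # Returns the pids that currently execute old code of `mod`.
  def holders(mod) do
    Enum.filter(Process.list(), fn pid ->
      :erlang.check_process_code(pid, mod) == true
    end)
  end
end

holders = OldCode.holders(OldCode)
```

In the session above, calling `OldCode.holders(CodePurge.Pi)` after the reload and before the purge would be expected to include the server pid, because that process was still waiting inside the old version of server/0.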

Avoid Spawn in Real-World Software Development

In many books and articles, we see code examples demonstrating the power of Elixir or Erlang: tons of processes easily spawned directly with spawn or spawn_link.

However, in real-world software development, we generally should avoid creating home-brewed servers or other long-running processes, and should instead use OTP libraries.

Even for 'one-off' asynchronous tasks, we shouldn't directly use spawn or spawn_link.

Elixir has a great alternative to spawn, though: the Task module (covered in depth in the AppSignal article Demystifying processes in Elixir).
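As a minimal sketch of the Task alternative, a one-off computation can be started with `Task.async/1` and collected with `Task.await/2`. The computation itself is an arbitrary placeholder.

```elixir
# One-off asynchronous work with Task instead of a bare spawn.
task = Task.async(fn -> Enum.sum(1..100) end)

# Task.await/2 blocks until the result arrives or the timeout (in ms) expires.
sum = Task.await(task, 1_000)
# sum is 5050
```

Unlike a bare spawn, the task is linked to the caller and returns its result through a documented protocol, so failures and timeouts surface in the calling process.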

How To Do a Code Upgrade Using GenServer

Let's create a better version of our server in lib/code_purge/pi_gs.ex:

Elixir

# lib/code_purge/pi_gs.ex
defmodule CodePurge.PiGs do
  use GenServer
 
  def start_link(value \\ 3.14) do
    GenServer.start_link(__MODULE__, value)
  end
 
  def init(value) do
    {:ok, value}
  end
 
  def handle_call(:get, _from, value) do
    {:reply, value, value}
  end
 
  def get(pid) do
    GenServer.call(pid, :get)
  end
end

And now, try to upgrade/purge the code of a running process several times:

Shell

iex(1)> {:ok, pid} = CodePurge.PiGs.start_link()
{:ok, #PID<0.161.0>}
iex(2)> CodePurge.PiGs.get(pid)
3.14
iex(3)> :code.load_file(CodePurge.PiGs)
{:module, CodePurge.PiGs}
iex(4)> :code.purge(CodePurge.PiGs)
false
iex(5)> :code.load_file(CodePurge.PiGs)
{:module, CodePurge.PiGs}
iex(6)> :code.purge(CodePurge.PiGs)
false
iex(7)> CodePurge.PiGs.get(pid)
3.14

Nothing bad happens! The reason why is easy to understand.

Our pid process doesn't spin in CodePurge.PiGs code. It runs a GenServer loop, and we don't update the GenServer module code at all.

CodePurge.PiGs is a callback module, and its name is kept in the GenServer's internal state. GenServer makes external calls to CodePurge.PiGs functions when serving requests, so each request is handled by the current version of the module.

The main challenge is to keep the state of each GenServer process in the shape that the new code expects.

For a single GenServer, this can be done through the :sys module and the code_change callback of GenServer. This is covered in depth in the previously mentioned hot code reloading article; here, we'll only briefly demonstrate it.

Without closing the previous iex session, let's update lib/code_purge/pi_gs.ex to the following and compile:

Elixir

# lib/code_purge/pi_gs.ex
defmodule CodePurge.PiGs do
  use GenServer
 
  def start_link(value \\ 3.14) do
    GenServer.start_link(__MODULE__, value)
  end
 
  def init(value) do
    {:ok, [value]}
  end
 
  def handle_call(:get, _from, st) do
    [value] = st
    {:reply, value, st}
  end
 
  def get(pid) do
    GenServer.call(pid, :get)
  end
 
  def code_change(_old_vsn, value, _extra) do
    {:ok, [value]}
  end
end

In code_change, we update the state by wrapping it in a list. We also updated the handle_call and init callbacks to work with the new state shape. Now, in the existing iex session, run:

Shell

iex(8)> :code.purge(CodePurge.PiGs)
false
iex(9)> :sys.suspend(pid)
:ok
iex(10)> :code.load_file(CodePurge.PiGs)
{:module, CodePurge.PiGs}
iex(11)> :sys.change_code(pid, CodePurge.PiGs, nil, [])
:ok
iex(12)> :sys.resume(pid)
:ok
iex(13)> CodePurge.PiGs.get(pid)
3.14
iex(14)> :sys.get_state(pid)
[3.14]

Everything works fine, and the last call to :sys.get_state demonstrates that the state has actually changed.
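The suspend, change_code, and resume steps can be wrapped in a small helper so that the process is always resumed. This is a minimal sketch, not the article's code: the StateMigration and StateMigration.Demo modules are hypothetical, and loading the new module version is deliberately left out so that the example runs standalone. The demo's code_change also has an extra clause that leaves an already wrapped state alone, which is an assumption added here to make a repeated migration harmless.

```elixir
defmodule StateMigration do
  # Hypothetical helper wrapping the :sys calls from the session above.
  # Purging and loading the new code would happen between suspend and
  # change_code; that step is omitted here.
  def run(pid, mod, old_vsn \\ nil, extra \\ []) do
    :ok = :sys.suspend(pid)

    try do
      :sys.change_code(pid, mod, old_vsn, extra)
    after
      # Always resume, even if change_code/4 raised or exited.
      :sys.resume(pid)
    end
  end
end

defmodule StateMigration.Demo do
  use GenServer

  def init(value), do: {:ok, value}

  def handle_call(:get, _from, state), do: {:reply, state, state}

  # Already migrated: keep the state as is.
  def code_change(_old_vsn, [_] = state, _extra), do: {:ok, state}
  # Old shape: wrap the bare value in a list.
  def code_change(_old_vsn, value, _extra), do: {:ok, [value]}
end

{:ok, pid} = GenServer.start_link(StateMigration.Demo, 3.14)
first = StateMigration.run(pid, StateMigration.Demo)
second = StateMigration.run(pid, StateMigration.Demo)
state = :sys.get_state(pid)
```

Running the migration twice leaves the state wrapped once, whereas a code_change that wraps unconditionally would nest the list on every call.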

Wrap-up

In the first part of this series, we've seen that a GenServer implementation is needed for effective hot code upgrades. We've also demonstrated how to upgrade a single GenServer instance consistently.

Upgrading an individual process, together with its callback module, can be used as a 'tactical weapon' to fix localized bugs or add some logging.

But updating a system at a greater scale, on a regular basis, requires more powerful tools. In the next part of the series, I'll delve into the world of supervisors in Elixir.

I hope you found this run-through of hot code reloading useful. See you next time for supervisors!


This post is part of Production Code Upgrades In Elixir Series


Ilya Averyanov
