
A Guide to Hot Code Reloading in Elixir

Ilya Averyanov


Elixir (or Erlang) offers great benefits for building software, including concurrency, scalability and reliability.

In this series, we will examine how to make the most of these benefits in your production code upgrades. This article will focus on hot code reloading and upgrades. But before we dive in, let's quickly define OTP.

What Is OTP in Elixir?

Formally, Erlang/OTP is a specific implementation of the Erlang Runtime System, i.e. a set of libraries, compilers and a VM implementation.

Informally, OTP often denotes a set of principles to build robust apps in Erlang and the corresponding set of built-in libraries.

Hot Code Reload: Tackling the Uncertainties

The term is used rather loosely, so let's pin it down.

When we speak of hot code reload or hot code upgrade, we usually mean the ability to change the behavior of a running process without any negative impact on that process. For example, we may change the behavior of a process that holds a TCP connection without terminating this connection.

The uncertainty is about scale. The question is whether we can upgrade:

  • a module
  • a package (an application in terms of OTP)
  • a whole running instance (a release in terms of OTP)

OTP offers tools for upgrading at any scale. In this article, we will consider application level upgrades.

As we will see, hot code upgrades on a larger scale (application and release levels) work only for systems built according to OTP principles.

Hot Code Upgrades: The Basics

A good starting point for understanding hot code reload is the article Hot Code Reloading in Elixir.

It explains the following key points:

  • How to reload code for a single module
  • How two versions of code exist after loading a new version of the module: new code and old code
  • The importance of external calls, which make it possible to transition from old code to new code
  • How GenServer helps us make such a transition seamlessly

At this point, I'd like to look at one important concept in depth: code purge.

Should You Code Purge in Elixir?

What happens if we want to upgrade code two or more times?

Let's create a small mix project:

shell
mix new code_purge
cd code_purge

Then update lib/code_purge.ex to the following:

elixir
# lib/code_purge.ex
defmodule CodePurge do
  def pi do
    3.14
  end
end

Now launch an iex shell with mix:

shell
iex -S mix
iex(1)> CodePurge.pi
3.14

Then update lib/code_purge.ex to:

elixir
# lib/code_purge.ex
defmodule CodePurge do
  def pi do
    3.142
  end
end

And recompile the project in a separate shell:

shell
mix compile

In our iex shell, we reload the module code:

shell
iex(2)> :code.load_file(CodePurge)
{:module, CodePurge}
iex(3)> CodePurge.pi
3.142

Everything worked as expected. :code.load_file/1 found the updated Elixir.CodePurge.beam in the _build/dev/lib/code_purge/ebin folder (mix sets up the code paths for us) and reloaded it.
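If you are curious where a module's object code comes from, you can ask the code server directly. The following is a minimal sketch that uses the standard library's Enum module as a stand-in, because it is always available; inside `iex -S mix` you could pass CodePurge instead.

```elixir
# Ask the code server where a module's object code lives.
# Enum is only a stand-in; in the toy project you could pass CodePurge.
beam_path = :code.which(Enum)

# For modules that live on disk, :code.which/1 returns a charlist path.
IO.puts("Enum is loaded from: #{beam_path}")

# The code path is the ordered list of directories searched for .beam files.
# mix adds the project's _build/<env>/lib/<app>/ebin directories to it.
code_path = :code.get_path()
IO.puts("Directories on the code path: #{length(code_path)}")
```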

But what happens if we try to reload this module once more, without actually changing it?

shell
iex(4)> :code.load_file(CodePurge)
{:error, :not_purged}

What Went Wrong Here?

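Wow, that doesn't work. The VM keeps at most two versions of a module: the current code and the old code. Our first reload already turned the original version into old code, so another load would need a second old version, and Erlang can't have two versions of old code.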
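The same behavior can be reproduced outside of a mix project. The script below is only a sketch: HotDemo.Slots and the temporary directory are hypothetical names invented for the demonstration, and :erlang.check_old_code/1 is used to peek at whether the "old" slot is occupied.

```elixir
# Hypothetical module used only for this demonstration.
{:module, HotDemo.Slots, beam, _} =
  defmodule HotDemo.Slots do
    def pi, do: 3.14
  end

# Put the compiled module on disk and on the code path, so that
# :code.load_file/1 can find it (mix does this for real projects).
dir = Path.join(System.tmp_dir!(), "hot_demo_slots_#{System.unique_integer([:positive])}")
File.mkdir_p!(dir)
File.write!(Path.join(dir, "Elixir.HotDemo.Slots.beam"), beam)
true = :code.add_patha(String.to_charlist(dir))

# So far only the current version exists.
old_before = :erlang.check_old_code(HotDemo.Slots)

# First reload: the version defined above becomes old code.
first_reload = :code.load_file(HotDemo.Slots)
old_after = :erlang.check_old_code(HotDemo.Slots)

# Second reload: the old slot is still occupied, so loading is refused.
second_reload = :code.load_file(HotDemo.Slots)

IO.inspect({old_before, first_reload, old_after, second_reload})
```

Run with `elixir`, this should print `{false, {:module, HotDemo.Slots}, true, {:error, :not_purged}}`.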

To overcome this, the :code module offers two more functions: :code.purge/1 and :code.soft_purge/1.

A purge evicts the old code:

shell
iex(5)> :code.purge(CodePurge)
false
iex(6)> :code.load_file(CodePurge)
{:module, CodePurge}

We can upgrade the code of the module again after the purge. But why do we even need to control that? Why not purge code automatically?

Well, there may still be processes running old code, and we should decide what to do with them during the upgrade. This is also why there are two functions:

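  • :code.purge/1: kills any processes still running old code, then removes it (returns true if it had to kill something)
  • :code.soft_purge/1: removes old code only if no process is running it; otherwise it returns false and leaves everything in place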
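To see the difference between the two functions, we can park a process inside a module and then reload that module. This is a sketch under stated assumptions: HotDemo.Waiter and the temporary directory are hypothetical, and the process is started with spawn rather than spawn_link so that killing it does not take the calling process down with it.

```elixir
# Hypothetical module used only for this demonstration.
{:module, HotDemo.Waiter, beam, _} =
  defmodule HotDemo.Waiter do
    # Tell the parent we are running, then block inside this module's code.
    def wait(parent) do
      send(parent, :ready)

      receive do
        :stop -> :ok
      end
    end
  end

dir = Path.join(System.tmp_dir!(), "hot_demo_waiter_#{System.unique_integer([:positive])}")
File.mkdir_p!(dir)
File.write!(Path.join(dir, "Elixir.HotDemo.Waiter.beam"), beam)
true = :code.add_patha(String.to_charlist(dir))

# Deliberately not linked, so killing the process does not affect us.
pid = spawn(HotDemo.Waiter, :wait, [self()])

# Wait until the process is really executing wait/1 before reloading.
receive do
  :ready -> :ok
after
  1_000 -> raise "waiter did not start"
end

# Reload: the process above is now blocked in old code.
{:module, HotDemo.Waiter} = :code.load_file(HotDemo.Waiter)

# soft_purge refuses to remove code that is still in use and kills nothing.
soft_result = :code.soft_purge(HotDemo.Waiter)
alive_after_soft = Process.alive?(pid)

# purge kills the lingering process, then removes the old code.
hard_result = :code.purge(HotDemo.Waiter)
alive_after_hard = Process.alive?(pid)

IO.inspect({soft_result, alive_after_soft, hard_result, alive_after_hard})
```

The expected result is `{false, true, true, false}`: the soft purge backs off, while the hard purge reports that it killed a process.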

This has an important consequence: if we upgrade a module more than once, any process still running its old code will be killed when that old code is purged to make room for the next version.

Let's illustrate this.

How Not to Do a Code Upgrade

First, add file lib/code_purge/pi.ex to your toy project with the following content:

elixir
# lib/code_purge/pi.ex
defmodule CodePurge.Pi do
  def start_link do
    spawn_link(&server/0)
  end

  def server do
    receive do
      {:get, from} ->
        send(from, {:ok, 3.14})
        CodePurge.Pi.server()
    end
  end

  def get(pid) do
    send(pid, {:get, self()})

    receive do
      {:ok, value} -> {:ok, value}
    after
      1000 -> :error
    end
  end
end

Then, start an iex shell, spawn a server, and check that everything works:

shell
iex(1)> pid = CodePurge.Pi.start_link()
#PID<0.140.0>
iex(2)> CodePurge.Pi.get(pid)
{:ok, 3.14}

Now, reload the module once (without any actual changes to functions) and try to purge it so that you can do the next 'upgrade':

shell
iex(3)> :code.load_file(CodePurge.Pi)
{:module, CodePurge.Pi}
iex(4)> :code.purge(CodePurge.Pi)
** (EXIT from #PID<0.152.0>) shell process exited with reason: killed

What Happened Here?

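As expected, your server just died, and because it was linked to the shell, the shell went down with it. Even the external call to CodePurge.Pi.server/0 couldn't save you: the server didn't receive any messages between the reload and the purge, so it stayed blocked in receive inside the old code and never transitioned to the new code.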
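For contrast, here is a sketch of the same kind of server surviving a purge, simply because it happens to receive messages after the reload. HotDemo.Pi and the temporary directory are hypothetical names; the server is started with spawn instead of spawn_link to keep the script independent of the outcome.

```elixir
# Hypothetical copy of the toy server, used only for this demonstration.
{:module, HotDemo.Pi, beam, _} =
  defmodule HotDemo.Pi do
    def server do
      receive do
        {:get, from} ->
          send(from, {:ok, 3.14})
          # Fully qualified (external) call: always enters the current version.
          HotDemo.Pi.server()
      end
    end

    def get(pid) do
      send(pid, {:get, self()})

      receive do
        {:ok, value} -> {:ok, value}
      after
        1000 -> :error
      end
    end
  end

dir = Path.join(System.tmp_dir!(), "hot_demo_pi_#{System.unique_integer([:positive])}")
File.mkdir_p!(dir)
File.write!(Path.join(dir, "Elixir.HotDemo.Pi.beam"), beam)
true = :code.add_patha(String.to_charlist(dir))

pid = spawn(&HotDemo.Pi.server/0)
{:ok, 3.14} = HotDemo.Pi.get(pid)

# Give the server a moment to get back to its receive before reloading.
Process.sleep(50)
{:module, HotDemo.Pi} = :code.load_file(HotDemo.Pi)

# Two round trips: the first message makes the server take the external call
# into the new code, and the second reply can only come from that new code.
{:ok, 3.14} = HotDemo.Pi.get(pid)
{:ok, 3.14} = HotDemo.Pi.get(pid)

# Nobody is left in the old code, so the purge has nothing to kill.
killed_any = :code.purge(HotDemo.Pi)
still_alive = Process.alive?(pid)

IO.inspect({killed_any, still_alive})
```

The server survives here only because traffic arrived at the right moment, which is not something a production upgrade should depend on.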

This isn't robust. One of the obvious reasons for the failure is that we didn't use OTP libraries (GenServer and related libraries) dedicated to creating this kind of server.

Avoid Spawn in Real-World Software Development

In many books and articles, we see code examples demonstrating the power of Elixir or Erlang: tons of processes easily spawned directly with spawn or spawn_link.

However, in real-world software development, we generally should avoid creating home-brewed servers or other long-running processes, and should instead use OTP libraries.

Even for 'one-off' asynchronous tasks, we shouldn't directly use spawn or spawn_link.

Elixir has a great alternative to spawn, though: the Task module (covered in depth in the AppSignal article Demystifying processes in Elixir).
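As a minimal illustration of a one-off asynchronous computation with Task instead of a bare spawn (the summation is just a placeholder workload):

```elixir
# Run a one-off computation in a separate process.
task = Task.async(fn -> Enum.sum(1..100) end)

# Block until the result arrives (the default timeout is 5 seconds).
result = Task.await(task)

IO.inspect(result)
```

Unlike a hand-rolled spawn plus send and receive, the Task takes care of delivering the result and of propagating failures to the caller.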

How To Do a Code Upgrade Using GenServer

Let's create a better version of our server in lib/code_purge/pi_gs.ex:

elixir
# lib/code_purge/pi_gs.ex
defmodule CodePurge.PiGs do
  use GenServer

  def start_link(value \\ 3.14) do
    GenServer.start_link(__MODULE__, value)
  end

  def init(value) do
    {:ok, value}
  end

  def handle_call(:get, _from, value) do
    {:reply, value, value}
  end

  def get(pid) do
    GenServer.call(pid, :get)
  end
end

And now, try to upgrade/purge the code of a running process several times:

shell
iex(1)> {:ok, pid} = CodePurge.PiGs.start_link()
{:ok, #PID<0.161.0>}
iex(2)> CodePurge.PiGs.get(pid)
3.14
iex(3)> :code.load_file(CodePurge.PiGs)
{:module, CodePurge.PiGs}
iex(4)> :code.purge(CodePurge.PiGs)
false
iex(5)> :code.load_file(CodePurge.PiGs)
{:module, CodePurge.PiGs}
iex(6)> :code.purge(CodePurge.PiGs)
false
iex(7)> CodePurge.PiGs.get(pid)
3.14

Nothing bad happens! The reason why is easy to understand.

Our pid process doesn't spin in CodePurge.PiGs code. It runs a GenServer loop, and we don't update the GenServer module code at all.

CodePurge.PiGs is a callback module, and the name is kept in a GenServer internal state. GenServer makes external calls to CodePurge.PiGs functions when serving GenServer requests.
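The indirection can be illustrated with a toy generic loop. To be clear, this is not how :gen_server is actually implemented; HotDemo.GenericLoop and HotDemo.PiCallback are hypothetical modules that only mimic the idea of keeping a callback module name in the loop state and calling it externally.

```elixir
# A toy stand-in for a generic server loop. This is NOT the real :gen_server;
# it only illustrates the callback-module indirection.
defmodule HotDemo.GenericLoop do
  def start(callback_mod, state) do
    spawn(fn -> loop(callback_mod, state) end)
  end

  def call(pid, request) do
    ref = make_ref()
    send(pid, {:call, self(), ref, request})

    receive do
      {^ref, reply} -> reply
    after
      1000 -> exit(:timeout)
    end
  end

  defp loop(callback_mod, state) do
    receive do
      {:call, from, ref, request} ->
        # External call through the stored module name: always current code.
        {:reply, reply, new_state} = callback_mod.handle_call(request, from, state)
        send(from, {ref, reply})
        loop(callback_mod, new_state)
    end
  end
end

# Hypothetical callback module; this is the only code we reload.
{:module, HotDemo.PiCallback, beam, _} =
  defmodule HotDemo.PiCallback do
    def handle_call(:get, _from, value), do: {:reply, value, value}
  end

dir = Path.join(System.tmp_dir!(), "hot_demo_cb_#{System.unique_integer([:positive])}")
File.mkdir_p!(dir)
File.write!(Path.join(dir, "Elixir.HotDemo.PiCallback.beam"), beam)
true = :code.add_patha(String.to_charlist(dir))

pid = HotDemo.GenericLoop.start(HotDemo.PiCallback, 3.14)
before_reload = HotDemo.GenericLoop.call(pid, :get)

# Reload and purge the callback module twice. The loop process waits in
# HotDemo.GenericLoop, not in the callback module, so nothing gets killed.
purges =
  for _ <- 1..2 do
    {:module, HotDemo.PiCallback} = :code.load_file(HotDemo.PiCallback)
    :code.purge(HotDemo.PiCallback)
  end

after_reload = HotDemo.GenericLoop.call(pid, :get)
IO.inspect({before_reload, purges, after_reload, Process.alive?(pid)})
```

Between requests the process only ever sits in the generic loop, which is why purging the callback module leaves it untouched.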

The main challenge is to migrate the state of running GenServer processes, so that new code can work with it.

For a single GenServer, this can be done through the :sys module and the code_change callback of GenServer. This is covered in depth in the previously mentioned hot code reloading article, so here we'll only briefly demonstrate it.

Without closing the previous iex session, let's update lib/code_purge/pi_gs.ex to the following and compile:

elixir
# lib/code_purge/pi_gs.ex
defmodule CodePurge.PiGs do
  use GenServer

  def start_link(value \\ 3.14) do
    GenServer.start_link(__MODULE__, value)
  end

  def init(value) do
    {:ok, [value]}
  end

  def handle_call(:get, _from, st) do
    [value] = st
    {:reply, value, st}
  end

  def get(pid) do
    GenServer.call(pid, :get)
  end

  def code_change(_old_vsn, value, _extra) do
    {:ok, [value]}
  end
end

In code_change, we updated the state by simply wrapping it in a list. We also updated the handle_call and init callbacks to match the new state shape. Now, in the existing iex session, run:

shell
iex(8)> :code.purge(CodePurge.PiGs)
false
iex(9)> :sys.suspend(pid)
:ok
iex(10)> :code.load_file(CodePurge.PiGs)
{:module, CodePurge.PiGs}
iex(11)> :sys.change_code(pid, CodePurge.PiGs, nil, [])
:ok
iex(12)> :sys.resume(pid)
:ok
iex(13)> CodePurge.PiGs.get(pid)
3.14
iex(14)> :sys.get_state(pid)
[3.14]

Everything works fine, and the last call to :sys.get_state demonstrates that the state has actually changed.
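The same sequence of steps can be wrapped in a small helper. What follows is a sketch, not an established API: HotDemo.Upgrade.run/3 and HotDemo.Counter are hypothetical. To keep the script self-contained, the module is reloaded unchanged, and its code_change callback only bumps a counter so that we can see that it ran.

```elixir
defmodule HotDemo.Upgrade do
  # Hypothetical helper, not part of OTP or of the toy project.
  def run(pid, module, extra \\ []) do
    # Drop old code left over from a previous upgrade. Note that this kills
    # any process that is still running that old code.
    :code.purge(module)
    :ok = :sys.suspend(pid)

    try do
      {:module, ^module} = :code.load_file(module)
      :sys.change_code(pid, module, nil, extra)
    after
      # Resume the process even if loading or code_change failed.
      :sys.resume(pid)
    end
  end
end

# Hypothetical GenServer used only for this demonstration.
{:module, HotDemo.Counter, beam, _} =
  defmodule HotDemo.Counter do
    use GenServer

    def init(count), do: {:ok, %{count: count, upgrades: 0}}

    def handle_call(:get, _from, state), do: {:reply, state.count, state}

    # Bumps a counter so that we can observe that the callback ran.
    def code_change(_old_vsn, state, _extra) do
      {:ok, %{state | upgrades: state.upgrades + 1}}
    end
  end

dir = Path.join(System.tmp_dir!(), "hot_demo_counter_#{System.unique_integer([:positive])}")
File.mkdir_p!(dir)
File.write!(Path.join(dir, "Elixir.HotDemo.Counter.beam"), beam)
true = :code.add_patha(String.to_charlist(dir))

{:ok, pid} = GenServer.start(HotDemo.Counter, 41)

first = HotDemo.Upgrade.run(pid, HotDemo.Counter)
second = HotDemo.Upgrade.run(pid, HotDemo.Counter)

state = :sys.get_state(pid)
count = GenServer.call(pid, :get)
IO.inspect({first, second, state, count})
```

Putting the resume in an after clause is a design choice: a failed load or a crashing code_change should not leave the server suspended forever.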

Wrap-up

In the first part of this series, we've seen that a GenServer implementation is needed for effective hot code upgrades. We've also demonstrated how to upgrade a single GenServer instance consistently.

Upgrading an individual process, together with its callback module, can be used as a 'tactical weapon' to fix localized bugs or add some logging.

But updating a system at a greater scale, on a regular basis, requires more powerful tools. In the next part of the series, I'll delve into the world of supervisors in Elixir.

I hope you found this run-through of hot code reloading useful. See you next time for supervisors!


Ilya Averyanov

Our guest author Ilya is an Elixir/Erlang/Python developer and a tech leader at [FunBox](https://funbox.ru/). His main occupation is bootstrapping new projects from both human and technological perspectives. Feel free to reach out to him for interesting discussions or consultancy.
