Demystifying processes in Elixir

Welcome to the first edition of Elixir Alchemy, an e-mail series about Elixir by your friends at AppSignal. In this edition, we'll learn how processes work in Elixir by deconstructing the Task module. Along the way, we'll learn what processes are, how they communicate, and how crashes are handled. Let's dive right in!

The Task module

One of the abstractions Elixir provides around processes is the Task module. It's used to spawn a process that executes a single action, with little or no communication with other processes beyond handing back its result. It's frequently used to turn a sequential program into a concurrent one by running multiple functions asynchronously.

Elixir

defmodule API do
  def random do
    time = :rand.uniform(1000)
    :timer.sleep(time)
    time
  end
end
 
IO.inspect [API.random, API.random, API.random]

In the example above, we have an API that returns random numbers between 1 and 1000. Its only deficiency is that the number that's returned is also the number of milliseconds the call will take. For example, if the random number this API returns is 517, the function will sleep for about half a second before returning the result. Unlucky as we are, we need three random numbers, so we'll have to call the API three times. With each call potentially taking a second to return, this piece of code can take up to three seconds to run.

We can make this code asynchronous by starting three Tasks, which will start three processes that will run their own API call concurrently. After they've all been started, we await their results and print them out all at once.

Elixir

[
  Task.async(&API.random/0),
  Task.async(&API.random/0),
  Task.async(&API.random/0)
]
|> Enum.map(&Task.await/1)
|> IO.inspect

This way, the code will only take as long as the slowest call, instead of all of them combined.
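
To make the difference measurable, here's a minimal sketch that times both approaches with :timer.tc/1. The SlowAPI module and its fixed delays are assumptions made for this illustration (a predictable stand-in for API.random/0), not part of the original example.

```elixir
defmodule SlowAPI do
  # Hypothetical stand-in for API.random/0 with a fixed delay,
  # so the timings are predictable.
  def fetch(ms) do
    :timer.sleep(ms)
    ms
  end
end

delays = [300, 200, 100]

# Sequential: each call waits for the previous one to finish.
{seq_us, seq_result} =
  :timer.tc(fn -> Enum.map(delays, &SlowAPI.fetch/1) end)

# Concurrent: start every Task first, then await them all.
{conc_us, conc_result} =
  :timer.tc(fn ->
    delays
    |> Enum.map(fn ms -> Task.async(fn -> SlowAPI.fetch(ms) end) end)
    |> Enum.map(&Task.await/1)
  end)

IO.inspect({seq_result, conc_result})
IO.puts("sequential: #{div(seq_us, 1000)} ms, concurrent: #{div(conc_us, 1000)} ms")
```

The sequential run should take at least 600 milliseconds (the sum of the delays), while the concurrent run should land close to 300 milliseconds (the slowest delay). Both return the results in the same order.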

Although this is quite a simple example, there's a lot going on that's abstracted away by the Task module. What's going on under the hood? By knowing what happens after we call Task.async/1, we can get a better understanding of processes in Elixir, and it'll make code that uses the Task module easier to debug when something goes wrong. To do that, we'll have to go a little deeper to learn more about how processes work in Elixir, and therefore in Erlang.

Processes in Erlang

Erlang's processes are managed in the VM instead of by the operating system, allowing them to be lightweight and really fast to spawn. Due to the lack of process pools and the low memory footprint, Erlang programs can spawn as many processes as needed. To handle all of these processes, Erlang starts one scheduler per core which schedules time for each of the processes in its run queue. Erlang will even take care of load balancing between your schedulers.
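
As a rough illustration of how cheap processes are, the sketch below spawns ten thousand of them and waits for each to report back. The count and the {:done, i} message shape are arbitrary choices for this example.

```elixir
parent = self()
count = 10_000

# Spawn one short-lived process per number; each reports back and exits.
for i <- 1..count do
  spawn(fn -> send(parent, {:done, i}) end)
end

# Collect one message per spawned process.
received =
  for _ <- 1..count do
    receive do
      {:done, i} -> i
    end
  end

IO.puts("Got #{length(received)} replies")
IO.puts("Schedulers online: #{System.schedulers_online()}")
```

System.schedulers_online/0 reports how many schedulers the VM is currently using, which usually matches the number of available cores.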

To prevent transient bugs and the need for locks, Erlang's processes don't share memory. They can only communicate with each other through message passing.

To spawn a new process from the current one, Elixir provides the spawn/1 function, which takes a function as its only argument. The function we pass will run in a newly spawned process, and that process exits as soon as the function returns.

Elixir

spawn(&API.random/0) # => #PID<0.82.0>

Because the function will run asynchronously in another process, we don't immediately get the result back. Instead, we receive a process identifier, or "PID". We can use that PID as an address to send messages to our new process.
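
A small sketch of that lifecycle, using Process.alive?/1 to check on the spawned process. The sleep durations are arbitrary values picked for this example.

```elixir
# The spawned function sleeps briefly so we can observe the process while it runs.
pid = spawn(fn -> :timer.sleep(100) end)

alive_while_running = Process.alive?(pid)

# Give the function time to return; the process exits with it.
:timer.sleep(300)
alive_after_return = Process.alive?(pid)

IO.inspect({is_pid(pid), alive_while_running, alive_after_return})
# => {true, true, false}
```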

Messaging

Elixir's processes communicate with each other by sending messages to each other's mailboxes. A process can send a message to any other process, as long as it has the PID of that process as an address to send the message to.

To get the result of a spawned process back into the parent process, the spawned process needs to send it as a message. To send a message to a process, Elixir provides the send/2 function, which takes the receiver's PID and the message.

Elixir

send(self(), :hello)
 
IO.puts receive do
  :hello -> "Hello to you too!"
  unknown -> ~s(Received unknown message: "#{unknown}")
end

In the example above, we're sending the message :hello to the current process by using self/0 to get the current process's PID. For a process to retrieve a message from its mailbox, Elixir provides receive/1, which matches messages from the mailbox against the given patterns. If one matches, the message is removed from the mailbox and the pattern's block is executed. If none match, the message stays in the mailbox to be reviewed at a later time, and receive/1 keeps waiting for the next message. In this example that can't happen, because the unknown pattern matches any message.
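
To see what happens to a message that doesn't match, here's a short sketch without a catch-all pattern. It uses Process.info/2 to peek into the mailbox afterwards; the :goodbye message is made up for this example.

```elixir
# Two messages arrive, but only :hello has a matching pattern below.
send(self(), :goodbye)
send(self(), :hello)

reply =
  receive do
    :hello -> "Hello to you too!"
  end

# Unmatched messages are still waiting in the mailbox.
{:messages, leftover} = Process.info(self(), :messages)

IO.puts(reply)
IO.inspect(leftover)
```

Run as a standalone script, leftover should contain :goodbye, even though it was sent before :hello.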

To get the resulting value of a function run in another process, we need to send a message from the spawned process to the process that spawned it. A process doesn't know which process started it, because processes are completely isolated from each other. If we want to send a message back to the "parent" process, we'll have to pass its PID when spawning a process.

Using the "parent" PID, the function that's spawned in the new process can send its results back to the parent process using the send/2 function.

Elixir

parent = self()
 
spawn(fn() ->
  send(parent, API.random())
end)
 
receive do
  random -> IO.puts "Received #{random}"
end

The parent process is then tasked with receiving the message and acting on it. It uses the receive/1 function to retrieve a message from its mailbox. This blocks the parent process, as it's waiting for the result of the function we spawned.

If a message comes in, the receive/1 function picks it up, and we're able to pattern match on the message's value. In this example, we'll simply print it out after removing it from the mailbox.
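
A pattern like random matches any message, so a stray message could be mistaken for the result. One common way around that, sketched here as an assumption rather than something the example above requires, is to tag the reply with a unique reference from make_ref/0 and match only on that.

```elixir
parent = self()
ref = make_ref()

spawn(fn ->
  # 42 stands in for the result of a slow call such as API.random/0.
  send(parent, {ref, 42})
end)

# A stray message that is not the reply we are waiting for.
send(self(), :unrelated)

result =
  receive do
    # The pin operator only accepts a reply tagged with our reference.
    {^ref, value} -> value
  end

IO.puts("Received #{result}")
# => Received 42
```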

Crashes and timeouts

When spawning a new process, one thing to keep in mind is that it might crash before it has had the chance to send its message back to the parent process. In such cases, the example above would block in the receive/1 block indefinitely, waiting for a message that never arrives.

A solution is to add a timeout to your receive blocks using the after keyword. That way, the after block is run after a specified amount of time has passed.

Elixir

receive do
  random -> IO.puts "Received #{random}"
after
  500 -> IO.puts "No response received"
end

Although the after block here will do its job, it's most likely not the best fallback to use for processes dying.

Most importantly, the spawned process might just be slow instead of dead. Depending on your application, it could be better to just wait until we get a result instead of giving up after half a second. For our example application, a 500-millisecond timeout won't work, because we know each call can take up to a second to complete.
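
The sketch below shows that failure mode: the spawned process is healthy but needs 800 milliseconds, so a 500-millisecond timeout gives up too early. The delay and the {:result, value} message shape are assumptions chosen for this example.

```elixir
parent = self()

spawn(fn ->
  # Slow, but not dead: the reply arrives after the timeout below.
  :timer.sleep(800)
  send(parent, {:result, 800})
end)

outcome =
  receive do
    {:result, value} -> {:ok, value}
  after
    500 -> :timeout
  end

IO.inspect(outcome)
# => :timeout
```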

Also, if the spawned process dies, it will take half a second for the parent process to notice. Instead, it would be nice for the parent process to know immediately when one of its spawned processes dies.

Error handling with linked processes

A link is a relationship between two processes. When a process crashes, it takes down all processes linked to it. That might sound like a bad thing, but it actually makes a lot of sense. By linking two processes together, we state that they're dependent on each other.

Process supervision and letting it crash

A supervisor is a process that keeps its child processes alive by restarting them when needed. Its children can be "regular" processes, or other supervisors. By nesting supervisors, a supervision tree can be built that allows parts of an application to crash and be restarted without affecting the rest of the system.
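
As a minimal sketch of a restart, the example below supervises a single Agent registered under the hypothetical name :counter, kills it, and checks that the supervisor started a replacement. The child spec, the name, and the short sleep that gives the supervisor time to react are all choices made for this illustration.

```elixir
children = [
  # A child spec telling the supervisor how to (re)start the child.
  %{
    id: :counter,
    start: {Agent, :start_link, [fn -> 0 end, [name: :counter]]}
  }
]

{:ok, _supervisor} = Supervisor.start_link(children, strategy: :one_for_one)

first = Process.whereis(:counter)

# Simulate a crash by killing the child process.
Process.exit(first, :kill)

# Give the supervisor a moment to restart the child.
:timer.sleep(100)
second = Process.whereis(:counter)

IO.inspect(first != second and Process.alive?(second))
```

The supervisor may also log a report about the terminated child, which is expected here.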

"Regular" processes are usually implemented to crash when a situation occurs they can't recover from, because their supervisors are tasked with restarting them when needed. That way, only a small part of the application restarts while the rest continues as normal. With supervised processes, a lot of defensive programming can be avoided by only implementing the working path through our applications and leaving the rest to crash.

Without stating this dependency through a link, each of our processes would need to be able to function without the other processes being available. Instead of trying to cover that with a lot of defensive programming, we can use linking to have the processes crash as soon as possible when one of their dependencies dies.

Instead of using the spawn/1 function, which creates a "normal" process, we'll use spawn_link/1 to create one linked to the parent process. Now, when the spawned process crashes, it'll take the parent process down as well instead of hanging, waiting for a message to come in.

Elixir

spawn_link(fn() ->
  raise("The API is down!")
end)
 
receive do
  random -> IO.puts "Received #{random}"
end

In the example above, the main process won't hang in the receive/1 block. Because it's linked to the process spawned with spawn_link/1, it crashes as soon as that process does:

Shell

09:16:35.682 [error] Process #PID<0.82.0> raised an exception
** (RuntimeError) The API is down!
    :erlang.apply/2

The majority of processes in Elixir are spawned as linked processes. The exception to that is spawning a process where the "parent" process isn't interested in its result, and whether or not it succeeded.
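
To watch a link take a process down without crashing the process we're observing from, this sketch puts an intermediate process in between. The :api_down exit reason and the :never_sent message are made up for this example.

```elixir
middle =
  spawn(fn ->
    # The linked child exits abnormally right away.
    spawn_link(fn -> exit(:api_down) end)

    # Without the link, this receive would wait forever.
    receive do
      :never_sent -> :ok
    end
  end)

# Give the exit signal time to propagate over the link.
:timer.sleep(100)

IO.inspect(Process.alive?(middle))
# => false
```

The outer process used plain spawn/1, so it isn't linked to the intermediate process and survives to report the result.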

How Tasks run functions asynchronously

When programming Elixir, you'll rarely need to spawn new processes using the spawn/1 and spawn_link/1 functions. Like we briefly touched on before, Elixir provides the Task module that abstracts away the messaging and error handling you'd have to do to run a function asynchronously.

Elixir

[
  Task.async(&API.random/0),
  Task.async(&API.random/0),
  Task.async(&API.random/0)
]
|> Enum.map(&Task.await/1)
|> IO.inspect

Let's look at the API example again. We call Task.async/1 three times, which starts three processes to run API.random/0, and returns a Task struct for each of them. Each of these structs includes the PID for the started process. That's a lot like what we saw before when calling spawn_link/1 ourselves.

Each call to API.random/0 can take up to a second, but since they've been started in three separate processes, the time it takes to fetch three random numbers is as long as it takes to get the slowest one.

In the new process, API.random/0 gets executed and its result is sent to the parent process using a call to send/2. In the parent process, Task.await/1 is called for each of the Tasks to get their results. Under the hood, that's done with a receive/1 block to wait for the results of each of the spawned processes.

Finally, if Task.await/1's receive/1 block receives a message, the result it carries is returned. If no message arrives in time (five seconds by default), the call times out and the parent process exits.
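
The timeout can be observed by passing a short one as the second argument to Task.await/2 and catching the exit. The durations below are arbitrary, and catching the exit is done here only to show the behaviour; normally you'd let the process crash.

```elixir
task =
  Task.async(fn ->
    # Slower than the timeout passed to Task.await/2 below.
    :timer.sleep(500)
    :done
  end)

outcome =
  try do
    Task.await(task, 100)
  catch
    # On timeout, the calling process exits with a {:timeout, _} reason.
    :exit, {:timeout, _details} -> :timed_out
  end

IO.inspect(outcome)
# => :timed_out
```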

If the spawned process crashes, it'll send an exit signal to the parent process, because both processes are linked. Instead of trying to handle processes not being available, we'll let it crash and have the supervisor restart the parent process.
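
Putting the pieces together, here's a simplified sketch of what a Task-like abstraction could look like when built from spawn_link/1, send/2, and receive/1. MiniTask is a hypothetical module written for this article's purposes, not the actual Task implementation, which does more (for example, it also monitors the spawned process).

```elixir
defmodule MiniTask do
  # Hypothetical, simplified stand-in for Task.async/1 and Task.await/2.

  def async(fun) do
    parent = self()
    ref = make_ref()

    # Linked, so a crash in the spawned process takes the caller down too.
    pid = spawn_link(fn -> send(parent, {ref, fun.()}) end)

    %{pid: pid, ref: ref}
  end

  def await(%{ref: ref}, timeout \\ 5000) do
    receive do
      # Only accept the reply tagged with this task's reference.
      {^ref, result} -> result
    after
      timeout -> exit(:timeout)
    end
  end
end

results =
  [100, 200, 300]
  |> Enum.map(fn ms ->
    MiniTask.async(fn ->
      :timer.sleep(ms)
      ms * 2
    end)
  end)
  |> Enum.map(&MiniTask.await/1)

IO.inspect(results)
# => [200, 400, 600]
```

Tagging each reply with its own reference lets await/2 pick out the right result, regardless of the order in which the spawned processes finish.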

The basics of concurrency in Elixir

By deconstructing the Task module, we've gained some insight into processes. We've learned how to spawn a process, how to send messages to it, and how to handle received messages. With this, we understand the basics of concurrency in Elixir (and in Erlang, for that matter).

Using these primitives, we can understand what happens when a Task is started, even though its messaging is abstracted away for our convenience.

That concludes our dive into processes in Elixir. We'd love to know how you liked this article, if you have any questions about it, and what you'd like to read about next, so be sure to let us know at @AppSignal.