In this post, we'll cover how Elixir applications can be traced using OpenTelemetry and how macros can make this process super easy and streamlined.
First, we'll talk about tracing and OpenTelemetry in Elixir. Then we'll improve our custom tracing layer step-by-step until we get an easy and seamless tool to trace our application.
Let's get started!
On Tracing in Elixir
Note: All the research done to write this article resulted in the creation of the abstracing library. It's far from complete, but it encapsulates all the ideas written here.
Think of instances when your app crashes in production. A bunch of artifacts are generated: stacktraces, logs, reports, etc.
With proper tracing, developers can link all these artifacts into a sequence of events, from the starting point of a request down to its response, including any operations with side effects.
Here is a simple example of a trace for a POST /users request:
From left to right, we can see that the request was received, decoded, validated, saved, and then, finally, a response was encoded and sent. Each little block in this trace is called a span. Spans are the building blocks of tracing, as they represent events inside an application.
Each span requires some basic information: its start, its end, and its status (success or error). We can enrich each span with more data, and we can even correlate spans directly with artifacts such as logs and errors if we add the appropriate metadata to them.
Meet OpenTelemetry for Elixir
The OpenTelemetry project homepage states that it is a:
Collection of APIs, SDKs, and tools
That can be used to:
Instrument, generate, collect, and export telemetry data (metrics, logs, and traces).
For Elixir, the OpenTelemetry library has everything we need to perform distributed tracing.
Here is a very simple example of how we can use it:
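A minimal sketch of such usage, assuming the opentelemetry_api package is available (fetched here with Mix.install so the snippet runs as a standalone script). The MyApp.Users module, the span name, and the attribute key are hypothetical and only serve as illustration.

```elixir
# Assumption: opentelemetry_api is fetched when the script starts.
Mix.install([{:opentelemetry_api, "~> 1.2"}])

defmodule MyApp.Users do
  # Required before the macros in OpenTelemetry.Tracer can be used.
  require OpenTelemetry.Tracer

  # Hypothetical function: "creates" a user inside a span.
  def create(params) do
    # Everything inside the block runs within a span named "users.create".
    OpenTelemetry.Tracer.with_span "users.create" do
      # Attach metadata to the current span.
      OpenTelemetry.Tracer.set_attribute(:email, params.email)

      {:ok, params}
    end
  end
end
```

With only the API package installed, the tracer is a no-op and the function simply returns the result of the block. Exporting real spans additionally requires the OpenTelemetry SDK and an exporter.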
What's Happening Here?
As you can see, it's pretty straightforward. To start using all macros inside OpenTelemetry.Tracer, we first require it at the top of our module.
When we want to start a span, we just need to call OpenTelemetry.Tracer.with_span/2 and write our code.
Under the hood, :otel_tracer.with_span/4 is used to actually start the span. Even though the OpenTelemetry API does provide Elixir modules to interact with, all the heavy lifting is actually written in Erlang.
One really cool thing about spans is that we can add metadata to the reported data. This gives us more contextual information to investigate issues. We can do this by calling OpenTelemetry.Tracer.set_attribute/2. It only accepts a small set of value types (such as numbers, atoms, booleans, binaries, and flat lists or tuples of those), so we need to be mindful when using it.
Now for a brief overview of how I ended up building an abstraction layer for OpenTelemetry.
Why I Built an Abstraction Layer for OpenTelemetry
When I first started using OpenTelemetry, I noticed that I was constantly creating small private functions to translate data and help me with the setup. As I did more and more of that, I eventually extracted all this boilerplate to its own feature. I created an abstraction layer for OpenTelemetry.
A few pain points during my use of OpenTelemetry that were solved by this abstraction layer:
- If your code throws an unexpected exception, the span will not be collected.
- Adding complex data requires transforming it first.
- Long namespaces (not really a problem, but I just don't like them 😁).
Luckily, the OpenTelemetry library also has a low-level API that can be used to customize our tracing tooling, and that's exactly what we'll be doing now!
Breaking Down the Pain Points of OpenTelemetry
Now that we know some of the direct pain points of using OpenTelemetry, let's break them down into separate categories and solve each one. We can group the places where we write boilerplate code into the following features:
- Setup: how we prepare a module to be traced
- Start/stop spans: the steps required to actually create spans
- Modifying spans: adding more attributes to spans
- Exception handling: collecting errors and changing the span status
In the end, we want to cover all these features with as little boilerplate code as possible.
A Setup for a Setup in Elixir
The very first thing we need to do is prepare our module to use the tracing macros and libraries. We'll use Elixir's special macro __using__/1 to automate some of this stuff for us:
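A minimal sketch of what such a setup macro could look like, assuming the opentelemetry_api package is available. The alias line and the MyApp.Greeter module are assumptions made for illustration, not necessarily what the abstracing library does.

```elixir
# Assumption: opentelemetry_api is fetched when the script starts.
Mix.install([{:opentelemetry_api, "~> 1.2"}])

defmodule Tracing do
  # Expanded at compile time in every module that calls `use Tracing`.
  defmacro __using__(_opts) do
    quote do
      # Make the OpenTelemetry macros available in the calling module.
      require OpenTelemetry.Tracer
      # Assumption: a shorter name for the long namespace.
      alias OpenTelemetry.Tracer
    end
  end
end

# Hypothetical module showing the setup in use.
defmodule MyApp.Greeter do
  use Tracing

  def hello(name) do
    Tracer.with_span "greeter.hello" do
      "Hello, #{name}!"
    end
  end
end
```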
Now, whenever we need to trace our code, we just need to call use Tracing at the top of our module.
So far, so good — but nothing too exciting. However, by using this simple setup macro we don't have to modify the modules using it if we ever make changes to the setup process, as all changes will be automatically replicated. That's a good start!
In the next section, we'll start to remove some more meaningful boilerplate.
Translating Elixir Application Data to Span Attributes
OpenTelemetry allows applications to include attributes in spans. An attribute consists of a key and a value. This helps us to include useful information that can later be used to either create monitoring triggers or investigate a crash.
But here is the catch: OpenTelemetry only accepts numbers, strings, atoms, booleans, and lists (as long as their elements are of one of the supported basic types). Applications work with a richer set of data types: not only numbers and strings, but also complex lists, maps, structs, and tuples.
Of course, we can use inspect/1 on the variable values and have everything in there. However, this makes searching for spans a much harder task, as we would need to use complex regexes to search for them.
Convert Complex Types to Basic Types
It's possible, however, to convert the complex types to more basic (and supported) types. Let's define a few rules:
- Lists will use their indexes to name the values
- Tuples will be converted to lists
- Maps will use their keys to name values
- Structs will be converted to maps
So, a simple list like [1, 2, 3] would be transformed into a list of pairs:
For maps, we can use their keys to generate the pairs:
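Both rules can be sketched in plain Elixir. The Pairs module below is hypothetical, and the use of string keys is an assumption made for illustration.

```elixir
defmodule Pairs do
  # Lists: each element is named after its index.
  def to_pairs(list) when is_list(list) do
    list
    |> Enum.with_index()
    |> Enum.map(fn {value, index} -> {Integer.to_string(index), value} end)
  end

  # Maps: each value is named after its key.
  def to_pairs(map) when is_map(map) do
    Enum.map(map, fn {key, value} -> {to_string(key), value} end)
  end
end

Pairs.to_pairs([1, 2, 3])
# => [{"0", 1}, {"1", 2}, {"2", 3}]

Pairs.to_pairs(%{name: "Ana"})
# => [{"name", "Ana"}]
```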
Using Elixir's defguard
Since we're handling maps, lists, and tuples as a set of values, we can use Elixir's defguard to create a custom function guard:
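A minimal sketch of such a guard. The name is_set and both modules are assumptions made for illustration.

```elixir
defmodule Tracing.Guards do
  # A "set" is any collection we want to expand into several attributes.
  defguard is_set(value) when is_map(value) or is_list(value) or is_tuple(value)
end

# Hypothetical module showing the guard in a function head.
defmodule GuardDemo do
  import Tracing.Guards

  def kind(value) when is_set(value), do: :set
  def kind(_value), do: :basic
end
```

Since structs are maps, they also match the guard, while binaries, numbers, and atoms fall through to the second clause.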
Now we can start building our custom set_attribute:
It's simple: if the value is a set, we call set_attributes (in plural!); otherwise, we just delegate to OpenTelemetry.Tracer.set_attribute/2.
The next piece is where all the transformation happens:
Here, we receive the set of values, convert them, and then call OpenTelemetry.Tracer.set_attributes/1 to do the actual work of adding the attributes to the span.
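Put together, the two functions could look roughly like the sketch below. It assumes the opentelemetry_api package is available. The module name, the two-argument set_attributes, the dotted key format, and the shallow enumerable_to_attrs/2 placeholder (one level deep, no recursion) are assumptions made for illustration.

```elixir
# Assumption: opentelemetry_api is fetched when the script starts.
Mix.install([{:opentelemetry_api, "~> 1.2"}])

defmodule Tracing do
  require OpenTelemetry.Tracer

  # Maps (structs included), lists, and tuples are treated as sets of values.
  defguardp is_set(value) when is_map(value) or is_list(value) or is_tuple(value)

  # A set is expanded into several attributes.
  def set_attribute(key, value) when is_set(value), do: set_attributes(key, value)

  # A basic value is handed straight to OpenTelemetry.
  def set_attribute(key, value), do: OpenTelemetry.Tracer.set_attribute(key, value)

  # Convert the set into key/value pairs, then add them all at once.
  def set_attributes(key, values) do
    values
    |> enumerable_to_attrs(key)
    |> OpenTelemetry.Tracer.set_attributes()
  end

  # Shallow placeholder: names each value "<prefix>.<key or index>".
  def enumerable_to_attrs(values, prefix) do
    for {key, value} <- to_pairs(values), do: {"#{prefix}.#{key}", value}
  end

  defp to_pairs(%_{} = struct), do: struct |> Map.from_struct() |> Map.to_list()
  defp to_pairs(map) when is_map(map), do: Map.to_list(map)
  defp to_pairs(tuple) when is_tuple(tuple), do: tuple |> Tuple.to_list() |> to_pairs()

  defp to_pairs(list) when is_list(list) do
    list |> Enum.with_index() |> Enum.map(fn {value, index} -> {index, value} end)
  end
end
```

Called inside a span, Tracing.set_attribute(:tags, ["a", "b"]) would add the attributes tags.0 and tags.1, while Tracing.set_attribute(:user_id, 42) is passed through unchanged.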
The data conversion happens in the enumerable_to_attrs/2 function. It works by recursively going into each element of the collection and converting it to the appropriate basic type supported by OpenTelemetry. Adding its code here is beyond the scope of this post, but feel free to check it out on GitHub!
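A rough sketch of how such a recursive conversion could work, written in plain Elixir. This is not the library's code: the AttrsSketch module, the to_attrs/2 helper, and the dotted key format are all assumptions made for illustration.

```elixir
defmodule AttrsSketch do
  # Collections that get expanded into several attributes.
  defguardp is_set(value) when is_map(value) or is_list(value) or is_tuple(value)

  # Structs become plain maps.
  def enumerable_to_attrs(%_{} = struct, prefix),
    do: struct |> Map.from_struct() |> enumerable_to_attrs(prefix)

  # Tuples become lists.
  def enumerable_to_attrs(tuple, prefix) when is_tuple(tuple),
    do: tuple |> Tuple.to_list() |> enumerable_to_attrs(prefix)

  # Lists are keyed by index.
  def enumerable_to_attrs(list, prefix) when is_list(list) do
    list
    |> Enum.with_index()
    |> Enum.flat_map(fn {value, index} -> to_attrs(value, "#{prefix}.#{index}") end)
  end

  # Maps are keyed by their own keys.
  def enumerable_to_attrs(map, prefix) when is_map(map) do
    Enum.flat_map(map, fn {key, value} -> to_attrs(value, "#{prefix}.#{key}") end)
  end

  # Nested collections recurse; basic values end the recursion.
  defp to_attrs(value, key) when is_set(value), do: enumerable_to_attrs(value, key)
  defp to_attrs(value, key), do: [{key, value}]
end

AttrsSketch.enumerable_to_attrs(%{user: {"Ana", [10, 20]}}, "params")
# => [{"params.user.0", "Ana"}, {"params.user.1.0", 10}, {"params.user.1.1", 20}]
```

Every pair in the result holds a basic value under a flat, searchable key, which is the shape that OpenTelemetry.Tracer.set_attributes/1 can accept.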
Wrapping Up
In this post, we discussed the basics of tracing and began to explore how we can utilize OpenTelemetry in Elixir. We laid the foundation for an abstraction layer that will simplify the creation and manipulation of spans, making the process seamless and straightforward.
Happy coding!