This post was updated on 9 August 2023. The 'Types of Metaprogramming' and 'Why Do Metaprogramming?' sections were switched, so that 'Why Do Metaprogramming?' is now covered before 'Types of Metaprogramming'.
In this world, there are many mysteries — but few are as elusive as metaprogramming in Elixir.
In this four-part series, we'll start by looking at core concepts and then explore how metaprogramming operates in Elixir specifically.
Let's develop an understanding of metaprogramming and uncover some Elixir metaprogramming secrets!
Introducing Metaprogramming in Elixir
According to Harald Sondergaard, metaprogramming is:
a programming technique in which computer programs have the ability to treat other programs as their data; meaning that a program can be designed to read, generate, analyze, or transform other programs, and even modify itself while running.
In essence, metaprogramming — much like metadata — revolves around "a set of programs that describe and give information about other programs" (adapted from the Oxford dictionary definition of metadata).
Before we get into how to do metaprogramming in Elixir, let's understand why we are doing it in the first place.
Why Do Metaprogramming?
Here are some general benefits of metaprogramming that apply well beyond Elixir:
- Adapt code to run-time factors - One of the benefits of metaprogramming is generating code that can adapt to different run-time factors. This is great if you need your software to be dynamic and respond to varying factors at run-time.
- Performance optimization - It's possible to use metaprogramming to customize your code in such a way that it leads to better performance compared to trying to achieve the same with a static codebase.
- Reduce errors - Using metaprogramming techniques, you can reduce the errors that are bound to creep in when you copy-paste code a lot. Macros that generate code at compile-time remove much of the need for manual copy-paste operations, and with it the mistakes those operations tend to introduce.
- DSLs - With metaprogramming, a developer can easily create a domain-specific language to better express how to handle a particular scenario in their project.
This is just a small snapshot of the possibilities available to a developer through using metaprogramming (we'll also highlight a few uses later in the article). That said, one thing you'll observe here is the frequent use of the terms "run-time" and "compile-time". Let's see what they mean next.
Defining Run-time and Compile-time
We can broadly classify metaprogramming into two categories: compile-time and run-time metaprogramming.
But what exactly are run-time and compile-time? They both refer to stages of a program's life cycle.
Compile-time is the stage at which source code converts to binary code or intermediate binary code for a machine or virtual machine to execute. Run-time refers to when code executes.
The program life cycle includes the following steps:
Source: https://en.wikipedia.org/wiki/Program_lifecycle_phase
Note that this is not a complete representation of the entire program life cycle, just a simplified one.
Compilation "sets the program in stone" by converting it into binary code. Metaprogramming exposes this process to allow developers to "move computation from run-time to compile-time" or "generate code using compile-time computations". This essentially allows the modification of the source code before/during compile-time, meaning that the generated binary code is slightly different.
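As a rough illustration of moving computation from run-time to compile-time in Elixir, here is a minimal sketch. The CompileTimeDemo module is hypothetical and only exists for this example. The module attribute is computed once while the module is compiled, and the function simply returns the stored result.

```elixir
defmodule CompileTimeDemo do
  # This expression is evaluated once, while the module is being compiled.
  @squares Enum.map(1..5, fn n -> n * n end)

  # At run-time, the function just returns the precomputed list.
  def squares, do: @squares
end

IO.inspect(CompileTimeDemo.squares())
```

The list is baked into the compiled module, so calling squares/0 does not repeat the computation.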
Self-modifying code is rather unique. In essence, it performs reflection during run-time. The life cycle of a self-modifying program looks a little different:
Note: You can substitute the "binary" for any intermediate language generated by the compiler, such as JVM bytecode or, in Elixir's case, BEAM VM bytecode.
Types of Metaprogramming
There are two types of metaprogramming that prescribe varying degrees of control over a given program:
Introspection
Introspection refers to a program revealing metadata about other programs or itself.
This definition broadly covers the first part of the definition of metaprogramming: "a program can be designed to read...[and] analyze...other programs". The program has access to information about itself or other programs.
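In Elixir, one simple form of introspection is asking a compiled module what it defines. The sketch below assumes a made-up Greeter module and reads its list of public functions at run-time.

```elixir
defmodule Greeter do
  def hello(name), do: "Hello, #{name}!"
  def goodbye(name), do: "Goodbye, #{name}!"
end

# Ask the module which public functions (and arities) it defines.
IO.inspect(Greeter.__info__(:functions))

# Check whether one specific function/arity pair is exported.
IO.inspect(function_exported?(Greeter, :hello, 1))
```

Nothing is modified here. The program only reads metadata about another piece of code, which is what separates introspection from reflection.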
Reflection
Reflection refers to a program modifying other programs or itself.
If a program can modify other programs or itself, it — by definition — has access to the metadata of the program, revealing information like names of functions.
By looking at the two types of metaprogramming, we can conclude that reflection encompasses introspection, or introspection is a subset of reflection:
Interesting Uses of Metaprogramming
The general definition of metaprogramming also encompasses the tools used in a program's life cycle.
For instance, a language compiler is a metaprogramming application designed to receive another program as input and generate binary as output.
We can narrow down broad metaprogramming use cases to the following, more specific, applications in a programming language context:
Code generation
By generating code dynamically during compile-time, it's available during run-time. When the nature of the code that's generated is not fixed, this can prove especially useful. For instance, you can use code generation to design domain-specific languages (DSLs) or generate functions based on input such as files or APIs.
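Here is a small sketch of generating functions from data at compile-time. The HttpStatus module and its status list are assumptions made for this example; in practice, the data could come from a file read during compilation. It relies on unquote, which is covered later in this post.

```elixir
defmodule HttpStatus do
  # Hypothetical input data; one function clause is generated per entry.
  @statuses [ok: 200, not_found: 404, server_error: 500]

  for {name, code} <- @statuses do
    # Each iteration defines a clause such as: def code(:ok), do: 200
    def code(unquote(name)), do: unquote(code)
  end
end

IO.inspect(HttpStatus.code(:not_found))
```

Adding a new status only requires a new entry in the list. The matching function clause is generated the next time the module is compiled.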
Code instrumentation
Code instrumentation refers to measuring a program's performance, diagnosing errors, and logging trace information.
Metaprogramming enables this through dynamic program analysis — software analysis performed by running software through a real or virtual processor.
Code instrumentation enables features like code coverage, memory error detection, fault localization, and concurrency error detection.
Behavioral changes
This refers to changing the behavior of a program through metaprogramming. Behavioral changes can include feature toggling, where a given feature is toggled on/off through a flag that is read during compile-time/run-time.
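A compile-time feature toggle could look something like the sketch below. The FeatureToggle module and the hard-coded flag are assumptions for illustration; a real project would more likely read the flag from its configuration.

```elixir
defmodule FeatureToggle do
  # Hypothetical flag, hard-coded here to keep the example self-contained.
  @new_greeting_enabled true

  # The branch is chosen while the module compiles,
  # so only one version of greet/1 ends up in the compiled module.
  if @new_greeting_enabled do
    def greet(name), do: "Hey, #{name}!"
  else
    def greet(name), do: "Hello, #{name}."
  end
end

IO.puts(FeatureToggle.greet("Ada"))
```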
This article series is about metaprogramming within Elixir, so our key focus will be on code generation.
Metaprogramming in Elixir: The Basics
Elixir applies a style of metaprogramming known as macro system metaprogramming (also used in other languages like Rust and Lisp).
In Elixir, metaprogramming allows developers to leverage existing features to build new features that suit their individual business requirements.
The foundation of metaprogramming in Elixir is macros.
Defining Macros
According to the official macros documentation:
Macros are compile-time constructs that are invoked with Elixir's AST as input and a superset of Elixir's AST as output.
There are two critical components to this definition. Let's break them down:
- Compile-time constructs - evaluated and available during compile-time
- Elixir's AST - Abstract Syntax Trees (ASTs) are tree representations of the abstract syntax structure of the source code
We use the representations of the source code as building blocks for compile-time constructs. Since the compiler reasons with the source code through ASTs, we effectively "speak" the compiler's language to build constructs that it can directly reason with.
In Elixir, ASTs are tuples, so we reason with the compiler in a manner that is familiar to us. We do not need to deviate from Elixir's syntax to begin writing macros, which lowers our barrier to entry of learning macros. On top of that, we do not even need to write ASTs ourselves. There are constructs in Elixir to handle all of that heavy lifting for us.
The above definition also mentions how a macro receives an AST as input and returns a superset of AST as output. So, you can think of a macro as a regular function with inputs, behavior, and an output. The overall goal is to use a given AST to generate a new AST for the compiler to use.
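To get a feel for what "ASTs are tuples" means before writing any macro, the sketch below uses quote (explained later in this post) to look at a few expressions. The variable names are only for illustration.

```elixir
# Atoms, numbers, strings, lists and two-element tuples are their own AST.
atom_ast = quote do: :ok
list_ast = quote do: [1, 2]
pair_ast = quote do: {1, 2}

# A call becomes a three-element tuple: {name, metadata, arguments}.
call_ast = quote do: sum(1, 2, 3)

IO.inspect(atom_ast)
IO.inspect(list_ast)
IO.inspect(pair_ast)
IO.inspect(call_ast)
```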
There is more to come on the compilation process of Elixir programs in part two of this series.
Starting Small with Macros
Now that we understand macros, let's dip our toes into the water and implement a basic macro.
We'll start with a very basic comparison of a macro to a regular function.
The Elixir documentation inspires this code example:
    defmodule Foo do
      defmacro macro_inspect(value) do
        IO.inspect(value)
        value
      end

      def func_inspect(value) do
        IO.inspect(value)
        value
      end
    end
To define a macro, we use defmacro and declare the parameters just as we would a regular function.
Running the macro in IEx yields the following results:
    iex(1)> import Foo
    iex(2)> macro_inspect(1 + 2)
    {:+, [context: Elixir, import: Kernel], [1, 2]}
    3
    iex(3)> func_inspect(1 + 2)
    3
    3
Observe that rather than printing the result of 1 + 2, the macro prints a tuple instead (the AST as input that we defined earlier).
When a macro is called, its arguments are automatically passed in as ASTs, so you don't need to parse the arguments manually. The arguments are not evaluated beforehand.
However, the macro call as a whole still yields the result of 1 + 2. The macro returns an AST as output (here, the same AST it received). That AST is injected at the callsite in place of the macro call and is compiled along with the rest of the code. Only then is the expression 1 + 2 evaluated, which gives us 3.
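Both halves of this behavior can be seen in one small sketch. The Show and ShowCaller modules are made up for this example. The macro receives the AST of its argument, turns it into a string at compile-time, and returns a two-element tuple (which is itself a valid AST) containing that string and the original, still unevaluated, expression.

```elixir
defmodule Show do
  # expr arrives as an AST, not as a value.
  defmacro show(expr) do
    # A two-element tuple is valid AST, so it can be returned directly.
    {Macro.to_string(expr), expr}
  end
end

defmodule ShowCaller do
  require Show

  # After expansion, this body is roughly: {"1 + 2", 1 + 2}
  def run, do: Show.show(1 + 2)
end

IO.inspect(ShowCaller.run())
# prints {"1 + 2", 3}
```

The macro is called from inside a second module so that Show is fully compiled before the call is expanded.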
Once we understand the basic syntax and declaration of a macro, we can explore the structure of the AST.
AST Structure
As mentioned earlier, the AST is the representation of the source code as a syntax tree. In the example above, we inspect the AST of the expression 1 + 2.
We can break down the AST structure into three components:
- Atom — representing the name of the operation
- Metadata of the expression
- Arguments of the operation
    {
      :+,                                 # operation name
      [context: Elixir, import: Kernel],  # metadata
      [1, 2]                              # operation arguments
    }
While it helps to understand what an AST is made of, we rarely need to read or write raw ASTs.
Elixir makes it ridiculously easy to interface with macros, so we hardly even need to think about the structure of the AST that we are working on — everything is handled for us.
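Just to confirm that an AST really is plain data, the sketch below writes the tuple for 1 + 2 by hand (with empty metadata, which is an assumption that works for this simple case), prints it back as source code, and evaluates it.

```elixir
# {operation name, metadata, arguments}, written by hand.
ast = {:+, [], [1, 2]}

# Turn the AST back into source code.
IO.puts(Macro.to_string(ast))

# Evaluate the AST; eval_quoted returns {result, bindings}.
{result, _bindings} = Code.eval_quoted(ast)
IO.inspect(result)
```

This is only meant for exploration. In real macros, quote builds these tuples for us.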
Interacting with ASTs
As mentioned earlier, ASTs represent the source code and are the input and output of macros. They are the cornerstone of macros. We need to interact with the AST representations of expressions freely, without getting bogged down by reading and writing the ASTs ourselves.
This is where quote and unquote come into the picture.
To generate the AST representation of an expression or body, we use quote:

    quote do
      1 + 2 * 3
    end
    {:+, [context: Elixir, import: Kernel], [1, {:*, [context: Elixir, import: Kernel], [2, 3]}]}
When we use quote, we build an AST. While the example above is relatively simple, we will soon discover that quote can be used to build much more complex ASTs.
What if we have a value we want to use in our quote, such as the arguments? We can bring a variable defined outside of quote into the AST by using unquote.
unquote evaluates its argument, which is an expression, and injects the result (as an AST) into the AST being built. As everything in Elixir is an expression, we evaluate expressions to inject the results.
For instance, if unquote receives a variable, it evaluates that variable and injects the value the variable holds.
If unquote receives a full expression like 1 + 2 * 3, it evaluates that to 7 and injects that. unquote expects the result of the expression to be a valid AST.
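The difference is easy to see outside of a macro. In the sketch below (variable names are arbitrary), the first quote treats number as a variable reference, while the other two inject evaluated results.

```elixir
number = 13

# Without unquote, number stays a variable reference in the AST.
without_unquote = quote do: 11 + number

# With unquote, the value of number is injected into the AST.
with_unquote = quote do: 11 + unquote(number)

# A full expression is evaluated first (to 7), then injected.
computed = quote do: 1 + unquote(1 + 2 * 3)

IO.inspect(without_unquote)
IO.inspect(with_unquote)
IO.inspect(computed)
```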
In part two of this series, we'll discuss the consequences of having an invalid AST and delve into macros more deeply.
Do you recall that macros automatically convert arguments into their AST forms? We will leverage that behavior:
    defmodule Foo do
      defmacro foo(exp) do
        quote do
          doubled = unquote(exp) * 2
          doubled
        end
      end
    end

    iex(1)> require Foo
    iex(2)> Foo.foo(1 + 2 * 3)
    14
As you can see, we have built a macro called foo which receives an expression as an argument. Then, we begin to build an AST for the macro in quote. We use unquote(exp) to inject the AST held in the exp argument into the AST we are building.
You might ask yourself: How do I know that the expression is injected and not evaluated right away?
Well, we can use a handy tool to inspect the AST of the macro and understand how it works under the hood:
    iex(1)> require Foo
    iex(2)> ast = quote do: Foo.foo(1 + 2 * 3)
    iex(3)> ast |> Macro.expand(__ENV__)
    {:__block__, [],
     [
       {:=, [],
        [
          {:doubled, [counter: -576460752303423358], Foo},
          {:*, [context: Foo, import: Kernel],
           [
             {:+, [context: Elixir, import: Kernel],
              [1, {:*, [context: Elixir, import: Kernel], [2, 3]}]},
             2
           ]}
        ]},
       {:doubled, [counter: -576460752303423358], Foo}
     ]}
First, we generate the AST of the macro call and assign it to a variable.
Then, with our ast variable, we will use Macro.expand to expand the AST to its fullest form.
We'll look at macro expansion next time. For now, think of it as peeling back the layers of an AST to its most fundamental components.
As you can see, the expanded form of the Foo.foo call contains the AST of 1 + 2 * 3. This proves that unquote only injected the AST of the expression into the quote AST, but didn't evaluate it. The evaluation is performed later on (we will get into this in part two as well).
Note: Macro.expand will only attempt to perform expansion on the root node of the AST.
You can find more information about Macro.expand in the docs.
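The root-node behavior can be checked with a short sketch. Doubler and ExpandDemo are hypothetical modules for this example. The same macro call is expanded once as the root of the AST and once nested inside a list.

```elixir
defmodule Doubler do
  defmacro double(exp) do
    quote do
      unquote(exp) * 2
    end
  end
end

defmodule ExpandDemo do
  require Doubler

  # The macro call is the root node, so it is expanded.
  def root do
    Macro.expand(quote(do: Doubler.double(3)), __ENV__)
  end

  # The macro call sits inside a list, so the AST comes back unexpanded.
  def nested do
    Macro.expand(quote(do: [Doubler.double(3)]), __ENV__)
  end
end

IO.inspect(ExpandDemo.root())
IO.inspect(ExpandDemo.nested())
```

The first result is the AST of 3 * 2, while the second still contains the original call to Doubler.double.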
quote Options in Elixir
Now that we understand the fundamentals of macros, we can start to look at our quote options.
While there are several options with quote, we will focus on the three most frequently used and introduce the concepts behind each option.
unquote
Toggles the unquoting behavior in quote. By disabling it, any unquote call is converted to an AST of the macro call (as with any other macro/function call).
This defers the evaluation of unquote to a later point. I'll explain why you'd want to do so in the next part of this series.
For now, let's look at the following example:
    iex(1)> a = [foo: 1, bar: 1]
    iex(2)> ast = quote do: unquote(a)
    [foo: 1, bar: 1]
    iex(3)> ast = quote unquote: false, do: unquote(a)
    {:unquote, [], [{:a, [], Elixir}]}
When we leave the unquoting behavior enabled (iex(2)), unquote(a) will evaluate a as an expression. This returns the keyword list, which is then injected into the quote AST, and the result is as expected.
However, when we disable the unquoting behavior (iex(3)), unquote(a) is converted into another AST expression, which is injected into the quote AST as-is.
bind_quoted
Disables unquoting behavior in the quote and binds given variables in the body of quote.
Binding moves the variable initialization into the body of quote.
We can observe this behavior using Macro.to_string:
    iex(1)> a = [foo: 1, bar: 2]
    iex(2)> ast = quote bind_quoted: [a: a], do: IO.inspect(a)
    iex(3)> ast |> Macro.to_string |> IO.puts
    (
      a = [foo: 1, bar: 2]
      IO.inspect(a)
    )
    :ok
As you can see, bind_quoted adds a "copy" of a into the body of quote by assigning it there.
In a macro, this is equivalent to binding the variable to the caller context, as the variable is initialized during the evaluation of the callsite.
Note: Contexts will be discussed in greater detail next time.
    defmodule Foo do
      defmacro foo(x) do
        quote bind_quoted: [x: x] do
          IO.inspect(x)
        end
      end
    end
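One practical consequence of binding is that the argument expression is evaluated only once, even if the quoted body uses it several times. The sketch below compares the two styles. BindDemo, BindCaller, and the tick/0 helper are made up for this example, and the process dictionary is used only to count how often the argument gets evaluated.

```elixir
defmodule BindDemo do
  # The argument AST is injected twice, so it is evaluated twice.
  defmacro twice_unquoted(x) do
    quote do
      unquote(x) + unquote(x)
    end
  end

  # The argument is bound to x once, then the variable is reused.
  defmacro twice_bound(x) do
    quote bind_quoted: [x: x] do
      x + x
    end
  end
end

defmodule BindCaller do
  require BindDemo

  # Returns 1 and records that it was called.
  def tick do
    Process.put(:ticks, Process.get(:ticks, 0) + 1)
    1
  end

  def run_unquoted do
    Process.put(:ticks, 0)
    result = BindDemo.twice_unquoted(tick())
    {result, Process.get(:ticks)}
  end

  def run_bound do
    Process.put(:ticks, 0)
    result = BindDemo.twice_bound(tick())
    {result, Process.get(:ticks)}
  end
end

IO.inspect(BindCaller.run_unquoted())
# prints {2, 2}
IO.inspect(BindCaller.run_bound())
# prints {2, 1}
```

Both versions return 2, but the unquoted version calls tick/0 twice, which matters as soon as the argument has side effects.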
location
This option controls whether run-time errors from a macro are reported from the caller or inside the quote.
By setting this option to :keep, error messages report specific lines in the macro that cause the error, rather than the line of the callsite.
You can see a code example in the docs.
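For reference, a condensed sketch along the lines of the example in the quote documentation looks like this. The module names are illustrative.

```elixir
defmodule Adder do
  defmacro defadd do
    # With location: :keep, errors raised inside add/2 are reported
    # against the def line below rather than the line that calls defadd.
    quote location: :keep do
      def add(a, b), do: a + b
    end
  end
end

defmodule LocationSample do
  import Adder

  # Defines LocationSample.add/2 via the macro.
  defadd()
end

IO.inspect(LocationSample.add(1, 2))
```

If add/2 were called with invalid arguments (for example, atoms), the stacktrace would point inside the quote in Adder. Without location: :keep, it would point to where defadd is invoked.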
Build a Simple Macro in Elixir
We should now be able to build a simple macro that mimics the behavior of an if statement.
Recall that an if statement consists of the following components:
    if (condition) do
      # body
    else
      # body
    end
We can replicate this structure using our own macro:
    defmodule NewIf do
      defmacro if?(condition, do: block, else: other) do
        quote do
          cond do
            unquote(condition) == true -> unquote(block)
            unquote(condition) == false -> unquote(other)
          end
        end
      end
    end

    iex(1)> require NewIf
    iex(2)> NewIf.if? 4 == 5, do: :yes, else: :no
    :no
This macro can receive three arguments:
- condition - predicate to evaluate the if? statement against
- do - block to execute when condition is true
- else - block to execute when condition is false
In Elixir, such blocks can be declared as arguments if they use the following syntax: <formal name>: <variable name>. The formal name is the name used when you call the macro. The variable name is the name used in the macro when you're attempting to reference the block.
After receiving these three arguments, we start by building an AST using quote.
Using a cond statement, we determine which body if? should execute. We use unquote to inject the ASTs of condition, block, and other into the AST we are building.
In doing so, when the expanded code runs, the condition is evaluated to be true/false, and, based on that result, we will either execute block or other.
We wrap up this behavior into an AST returned by quote (which is the return value of the macro).
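One thing to be aware of is that the macro above injects condition twice and only handles the exact values true and false. As a variation (not part of the original macro, and with hypothetical module names), the sketch below evaluates the condition once with case and treats false and nil as falsy, which is closer to how Elixir's built-in if behaves.

```elixir
defmodule TruthyIf do
  defmacro if?(condition, do: block, else: other) do
    quote do
      # The condition AST is injected (and therefore evaluated) only once.
      case unquote(condition) do
        value when value in [false, nil] -> unquote(other)
        _ -> unquote(block)
      end
    end
  end
end

defmodule TruthyIfCaller do
  require TruthyIf

  def check(value) do
    TruthyIf.if? value, do: :yes, else: :no
  end
end

IO.inspect(TruthyIfCaller.check(4 == 5))
IO.inspect(TruthyIfCaller.check("any truthy value"))
IO.inspect(TruthyIfCaller.check(nil))
```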
Next Up: Macros in Detail
Now we have a good grasp on the foundations of metaprogramming in general and specifically in Elixir.
Join me for the next part of this series, where we'll look into the intricacies behind macros and how everything works.
Until next time!