Parsing Numbers in Elixir

Like any modern programming language, Elixir has built-in tools for doing basic tasks, such as parsing numbers from strings. Although they're built-in and ready to use, it's useful to understand the underlying algorithms.

In this post we'll first explain how to convert strings to numbers in Elixir using the built-in functions. This will be quick and useful. After that we'll go straight down the rabbit hole and explain the underlying algorithms. This is the Alchemy we love. It may help you implement a parser for something similar, and it should satisfy your curiosity about the mathematical ideas behind parsing numbers in Elixir.

The Quick (and Boring) Built-in

In Elixir, you can convert strings to floating point numbers using Float.parse/1:

Elixir

iex> Float.parse("1.2")
{1.2, ""}

This returns a tuple, where the first element is the parsed number, and the second is whatever was left of your input string once a non-numeric character was found. This is useful if you’re unsure whether your input contains additional data:

Elixir

iex> Float.parse("3 stroopwafels")
{3.0, " stroopwafels"}
 
iex> Float.parse("stroopwafels? 3, please")
:error # This fails because the number needs to be at the beginning of the string

If you’re sure your input is a well-formatted floating point number with no additional characters, you can use a more direct approach:

Elixir

iex> String.to_float("1.2")
1.2
 
iex> String.to_float("3 stroopwafels")
** (ArgumentError) argument error
    :erlang.binary_to_float("3 stroopwafels")
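
If you want the leniency of Float.parse/1 but still need to reject trailing characters without raising, one option is to pattern match on the returned tuple. The following is a minimal sketch; the StrictFloat module and its error atoms are hypothetical names chosen for illustration, not part of Elixir's standard library:

```elixir
defmodule StrictFloat do
  # Hypothetical helper: accepts the input only when Float.parse/1
  # consumed the entire string.
  def parse(str) when is_binary(str) do
    case Float.parse(str) do
      # Nothing left over: the whole string was a number
      {value, ""} -> {:ok, value}
      # A number was found, but extra characters follow it
      {_value, _rest} -> {:error, :trailing_characters}
      # The string does not start with a number at all
      :error -> {:error, :not_a_number}
    end
  end
end
```

With this helper, StrictFloat.parse("1.2") returns {:ok, 1.2}, while "3 stroopwafels" is reported as having trailing characters instead of being silently truncated.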

To satisfy our technical curiosity, let's dive in and see how this works internally. We won't implement everything that's required for reliably parsing floats and integers, but we'll learn enough to understand the fundamentals.

Down the Rabbit Hole

One way of thinking about number parsing is by decomposing a number into multiple components:

Elixir

1234 = 1000 + 200 + 30 + 4

Using this knowledge, we can use a divide-and-conquer strategy to parse the number, by parsing each of its digits individually. Elixir’s pattern matching and recursive capabilities also fit in nicely here.
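
To make the decomposition concrete, here is a small sketch that goes in the opposite direction: it starts from the integer 1234 and splits it into its positional components. It assumes Elixir 1.12 or later, where Integer.pow/2 is available; the variable names are only illustrative.

```elixir
# Split the number into its digits, most significant first
digits = Integer.digits(1234)

components =
  digits
  # Reverse so that index 0 lines up with the least significant digit
  |> Enum.reverse()
  |> Enum.with_index()
  # Scale each digit by 10 raised to its position
  |> Enum.map(fn {digit, index} -> digit * Integer.pow(10, index) end)
  # Restore the original, most-significant-first order
  |> Enum.reverse()

# Adding the components back together yields the original number
total = Enum.sum(components)
```

Here components ends up as [1000, 200, 30, 4] and total as 1234. Parsing a string is the same idea, except that the digits come from characters.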

Parsing a Single Digit

Let’s start with a single-digit integer for demonstration purposes.

Elixir

defmodule Parser do
  def ascii_to_digit(ascii) when ascii >= 48 and ascii < 58 do
    ascii - 48
  end
  def ascii_to_digit(_), do: raise ArgumentError
end

The ascii_to_digit/1 function expects the ASCII code of a single digit and returns the corresponding integer. This should only work for actual numeric characters, which are in the 48 to 57 range of the ASCII table. Any other value will cause an exception to be raised.

Because the digits 0 through 9 occupy consecutive positions in the ASCII table, starting at 48, we can simply subtract 48 from the character code to get the actual numeric value. This function will be a useful helper in the next section.
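
As a side note, Elixir's ?x syntax returns the code point of a character, so ?0 is 48 and ?9 is 57. The sketch below is an equivalent way of writing the same helper without the magic numbers. The DigitDemo module name is hypothetical and only used to keep this example separate from the Parser module above.

```elixir
defmodule DigitDemo do
  # Same logic as ascii_to_digit/1 above, written with character literals:
  # ?0 is 48 and ?9 is 57.
  def ascii_to_digit(ascii) when ascii in ?0..?9 do
    ascii - ?0
  end

  # Anything outside the digit range is rejected
  def ascii_to_digit(_), do: raise(ArgumentError)
end
```

Calling DigitDemo.ascii_to_digit(?7) returns 7, and passing the raw code 48 returns 0.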

Parsing an Entire Integer

Now let’s add a function to handle an entire string containing an integer:

Elixir

defmodule Parser do
  def parse_int(str) do
    str
    |> String.reverse()
    |> do_parse_int(0, [])
  end
 
  def do_parse_int(<<char :: utf8>> <> rest, index, cache) do
    new_part = ascii_to_digit(char) * round(:math.pow(10, index))
 
    do_parse_int(
      rest,
      index + 1,
      [new_part | cache]
    )
  end
  def do_parse_int("", _, cache) do
    cache
    |> Enum.reduce(0, &Kernel.+/2)
  end
 
  # ...
end

Here, the do_parse_int/3 function traverses the string using two auxiliary arguments: a counter that increments with every new digit, giving us our current index in the traversal, and a list where we keep intermediary values.

Also, notice that we’re first reversing the string. This is because Elixir’s pattern matching only allows us to match the beginning of a string, not the end. We want to start from the least significant digit, which is at the right end of a number. So we first reverse the string, then make the traversal from left-to-right.
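
The following sketch illustrates the point about matching at the front of a string. It uses a plain byte match, which is enough for ASCII digits; the variable names are only for illustration.

```elixir
# A binary pattern can peel the first byte off a string...
<<first, rest::binary>> = "1234"
# first is now the character code of "1", and rest is "234"

# ...so to reach the least significant digit first, we reverse beforehand
reversed = String.reverse("1234")
<<least_significant, _rest::binary>> = reversed
# least_significant is now the character code of "4"
```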

For each digit, we’re multiplying it by 10^index. This means that for the string "1234" we end up with the following list:

Elixir

[1000, 200, 30, 4]

All that's left is to sum up all the elements, which is done once we match an empty string.

Note: An optimized version of this could run the Enum.reduce call on the original characters of a string, summing the digits right away instead of keeping a temporary list. We didn't do this here so that we could split the responsibilities a bit better and leave the code more readable.
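
For comparison, here is one possible shape for such a version. Note that this is a slightly different technique from the one used above: instead of reversing the string and scaling each digit by a power of 10, it walks the string from left to right and multiplies a running total by 10 before adding each digit (often called Horner's method). The HornerParser module is a hypothetical name for this sketch.

```elixir
defmodule HornerParser do
  # Hypothetical alternative to parse_int/1: no reversal and no temporary list.
  def parse_int(str) when is_binary(str), do: do_parse_int(str, 0)

  # Shift the running total one decimal place to the left, then add the digit
  defp do_parse_int(<<char, rest::binary>>, acc) when char in ?0..?9 do
    do_parse_int(rest, acc * 10 + (char - ?0))
  end

  # End of input: the accumulator holds the parsed value.
  # Like the version above, an empty string yields 0.
  defp do_parse_int("", acc), do: acc

  # Any non-digit character is rejected
  defp do_parse_int(_other, _acc), do: raise(ArgumentError)
end
```

For "1234" the accumulator goes through 1, 12, 123 and finally 1234. This variant also stays in integer arithmetic throughout, which sidesteps the float rounding that round(:math.pow(10, index)) can run into for very long inputs.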

Parsing Floats

To implement a parse_float/1 function, all that's left is to handle decimals. Fortunately, we can reuse our existing parse_int/1 function, along with a couple of fancy tricks to make everything work:

Elixir

defmodule Parser do
  @float_regex ~r/^(?<int>\d+)(\.(?<dec>\d+))?$/
 
  def parse_float(str) do
    %{"int" => int_str, "dec" => decimal_str} = Regex.named_captures(@float_regex, str)
 
    decimal_length = String.length(decimal_str)
 
    parse_int(int_str) + parse_int(decimal_str) * :math.pow(10, -decimal_length)
  end
end

We define a @float_regex module attribute, which holds a regular expression capable of capturing both the left and right side of a floating point number. The decimal separator and its subsequent digits are optional, so this regex will match "123" just as well as it matches "123.456".

Explaining the details of this regex is out of the scope of this article, but feel free to play around with it in your Elixir console.

When we run the regex against our input, say "123.456", we end up with the following map:

Elixir

%{
  "int" => "123",
  "dec" => "456"
}

We can now see where parse_int/1 comes in handy. It can be used on both parts to get 123 and 456, respectively. But how can we combine them to have the desired 123.456 as a result?

Again, math comes to our rescue. Multiplying the decimal part by 10^-3, where 3 is the number of decimal digits, gives us 0.456, which we can then add to the integer part to get the final result.
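
To see what parse_float/1 has to work with, the sketch below runs the same regex against three inputs. The variable names are illustrative. As far as we can tell, an optional named group that did not take part in the match comes back as an empty string, which is why parse_int/1 returning 0 for "" matters here.

```elixir
float_regex = ~r/^(?<int>\d+)(\.(?<dec>\d+))?$/

# Both groups are captured
with_decimals = Regex.named_captures(float_regex, "123.456")

# The optional decimal group did not participate in the match
without_decimals = Regex.named_captures(float_regex, "123")

# No match at all: named_captures/2 returns nil, so the map pattern
# inside parse_float/1 would fail with a MatchError for this input
no_match = Regex.named_captures(float_regex, "abc")
```

The last case is worth keeping in mind: input that does not look like a number will crash parse_float/1 rather than return an error value.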

Our end result can parse integers and floats from strings.

Elixir

defmodule Parser do
  def parse_int(str) do
    str
    |> String.reverse()
    |> do_parse_int(0, [])
  end
 
  def do_parse_int(<<char::utf8>> <> rest, index, cache) do
    new_part = ascii_to_digit(char) * round(:math.pow(10, index))
 
    do_parse_int(
      rest,
      index + 1,
      [new_part | cache]
    )
  end
  def do_parse_int("", _, cache) do
    cache
    |> Enum.reduce(0, &Kernel.+/2)
  end
 
  @float_regex ~r/^(?<int>\d+)(\.(?<dec>\d+))?$/
 
  def parse_float(str) do
    %{"int" => int_str, "dec" => decimal_str} = Regex.named_captures(@float_regex, str)
 
    decimal_length = String.length(decimal_str)
 
    parse_int(int_str) + parse_int(decimal_str) * :math.pow(10, -decimal_length)
  end
 
  def ascii_to_digit(ascii) when ascii >= 48 and ascii < 58 do
    ascii - 48
  end
  def ascii_to_digit(_), do: raise(ArgumentError)
end

Unhandled Cases

This was a somewhat summarized demonstration of how a low-level number parser could work in Elixir. However, it does not cover every possible scenario one might want. Some things that weren't covered are:

Support for negative numbers
Support for scientific notation (e.g. 1.23e7)
More graceful error handling. If you’re building a parser for your own use-case, then error handling should also depend on what the use case is, as well as its conditions. Hence, it was not covered here.
Handling more numerical systems. Did you notice we’re using :math.pow(10,x) in a few places? Making that 10 configurable should allow us to support binary, octal or hexadecimal strings.
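
As a starting point for two of the items above, here is a rough sketch of an integer parser that accepts an optional leading minus sign and a configurable base. The BaseParser module, its parse_int/2 function and the supported range of bases (2 to 16) are all assumptions made for this example; it is not how Elixir's built-in functions are implemented.

```elixir
defmodule BaseParser do
  # Hypothetical helper: parse_int/2 with an optional "-" prefix
  # and a base between 2 and 16 (defaults to 10).
  def parse_int(str, base \\ 10)

  # A leading minus sign negates whatever the remaining digits parse to
  def parse_int("-" <> rest, base) when rest != "" and base in 2..16 do
    -parse_digits(rest, base, 0)
  end

  def parse_int(str, base) when is_binary(str) and str != "" and base in 2..16 do
    parse_digits(str, base, 0)
  end

  # Empty strings and unsupported bases are rejected
  def parse_int(_str, _base), do: raise(ArgumentError)

  # Multiply the running total by the base, then add the next digit
  defp parse_digits(<<char, rest::binary>>, base, acc) do
    digit = digit_value(char)
    # A digit must be smaller than the base (e.g. no "2" in binary)
    if digit >= base, do: raise(ArgumentError)
    parse_digits(rest, base, acc * base + digit)
  end

  defp parse_digits("", _base, acc), do: acc

  # Map a character code to its numeric value, including hex letters
  defp digit_value(c) when c in ?0..?9, do: c - ?0
  defp digit_value(c) when c in ?a..?f, do: c - ?a + 10
  defp digit_value(c) when c in ?A..?F, do: c - ?A + 10
  defp digit_value(_), do: raise(ArgumentError)
end
```

With this sketch, BaseParser.parse_int("-42") returns -42, BaseParser.parse_int("ff", 16) returns 255, and BaseParser.parse_int("101", 2) returns 5. Scientific notation and more graceful error values are still left out.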

We'd love to know what you thought of this article, or if you have any questions. We’re always on the lookout for topics to investigate and explain, so if there’s anything in Elixir you’d like to read about, don't hesitate to let us know at @AppSignal!