Like any modern programming language, Elixir has built-in tools for doing basic tasks, such as parsing numbers from strings. Although they're built-in and ready to use, it's useful to understand the underlying algorithms.
In this post we'll first explain how to convert strings to numbers in Elixir. This will be quick and useful. After that we'll go straight down the rabbit hole and explain the underlying algorithms. This is the Alchemy we love. It may help you implement a parser for something similar, but it will definitely satisfy your curiosity about the mathematical ideas behind parsing numbers in Elixir.
The Quick (and Boring) Built-in
In Elixir, you can convert strings to floating point numbers using Float.parse/1:
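For instance, with an illustrative input string:

```elixir
# Float.parse/1 returns {number, rest} on success.
{number, rest} = Float.parse("123.456")

IO.inspect(number) # 123.456
IO.inspect(rest)   # ""
```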
This returns a tuple, where the first element is the parsed number, and the second is whatever was left of your input string once a non-numeric character was found. This is useful if you’re unsure whether your input contains additional data:
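A small illustration, using made-up inputs:

```elixir
# Parsing stops at the first character that cannot be part of the number.
IO.inspect(Float.parse("123.456abc")) # {123.456, "abc"}

# Input that doesn't start with a number yields :error instead of a tuple.
IO.inspect(Float.parse("abc"))        # :error
```

Because the failure case is the bare atom :error, it's worth handling both shapes (for example with a case expression) when the input comes from outside your program.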
If you’re sure your input is a well-formatted floating point number with no additional characters, you can use a more direct approach:
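One function that fits this description is String.to_float/1 (we're assuming that's the kind of direct approach meant here):

```elixir
# String.to_float/1 returns the float itself, without a wrapping tuple.
IO.inspect(String.to_float("123.456")) # 123.456
```

Keep in mind that String.to_float/1 raises an ArgumentError when the string is not a valid float representation, and that includes a string such as "123" with no decimal point.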
To satisfy our technical curiosity, let's dive in and see how this works internally. We won't implement everything that's required for reliably parsing floats and integers, but we'll learn enough to understand the fundamentals.
Down the Rabbit Hole
One way of thinking about number parsing is by decomposing a number into multiple components:
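As a concrete illustration (1234 is just an example value), each digit contributes its own value multiplied by a power of ten that depends on its position:

```elixir
# 1234 split into one component per digit.
components = [1 * 1000, 2 * 100, 3 * 10, 4 * 1]

IO.inspect(components)           # [1000, 200, 30, 4]
IO.inspect(Enum.sum(components)) # 1234
```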
Using this knowledge, we can use a divide-and-conquer strategy to parse the number, by parsing each of its digits individually. Elixir’s pattern matching and recursive capabilities also fit in nicely here.
Parsing a Single Digit
Let’s start with a single digit integer for demonstration purposes.
The ascii_to_digit/1 function expects the ASCII code of a single digit and returns the corresponding integer. This should only work for actual numeric characters, which are in the 48 to 57 range of the ASCII table. Any other value will cause an exception to be raised.
Because the digits 0 to 9 occupy consecutive codes in the ASCII table, starting at 48 for "0", we can simply subtract 48 to get the actual numeric value. This function will be a useful helper in the next section.
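The helper described above could be sketched roughly like this. The module name DigitSketch and the error message are placeholders of ours, not something taken from a library:

```elixir
defmodule DigitSketch do
  # ASCII codes 48..57 correspond to the characters "0".."9".
  def ascii_to_digit(ascii) when ascii in 48..57 do
    ascii - 48
  end

  # Anything else is not a digit, so we raise.
  def ascii_to_digit(ascii) do
    raise ArgumentError, "not an ASCII digit: #{inspect(ascii)}"
  end
end

# ?7 is Elixir's syntax for the code point of the character "7" (55).
IO.inspect(DigitSketch.ascii_to_digit(?7)) # 7
```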
Parsing an Entire Integer
Now let’s add a function to handle an entire string containing an integer:
Here, the do_parse_int/3 function traverses the string, using two auxiliary arguments: a counter that increments with every new digit, giving us our current index in the traversal, and a list where we keep intermediate values.
Also, notice that we’re first reversing the string. This is because Elixir’s pattern matching only allows us to match the beginning of a string, not the end. We want to start from the least significant digit, which is at the right end of a number. So we first reverse the string, then make the traversal from left-to-right.
For each digit, we multiply it by 10^index. This means that for the string "1234" we end up with the following list:
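The exact order depends on how the values are collected, and :math.pow/2 returns floats unless the result is rounded. Assuming the values are listed in traversal order and rounded to integers, they could be reproduced like this:

```elixir
# Digits of "1234" in reversed order, paired with their index.
values =
  [4, 3, 2, 1]
  |> Enum.with_index()
  |> Enum.map(fn {digit, index} -> digit * round(:math.pow(10, index)) end)

IO.inspect(values) # [4, 30, 200, 1000]
```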
All that's left is to sum up all the elements, which is done once we match an empty string.
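Putting the pieces together, a minimal sketch of the traversal described above might look as follows. The module name IntSketch is a placeholder, the digit helper is repeated so the snippet runs on its own, and prepending to the list is our choice:

```elixir
defmodule IntSketch do
  # Reverse the string so the least significant digit comes first,
  # then start the traversal at index 0 with an empty list.
  def parse_int(str) do
    str
    |> String.reverse()
    |> do_parse_int(0, [])
  end

  # Empty string: every digit has been visited, so sum the values.
  defp do_parse_int("", _index, acc) do
    Enum.reduce(acc, 0, fn value, sum -> value + sum end)
  end

  # Take one byte off the front and scale its digit by 10^index.
  defp do_parse_int(<<char, rest::binary>>, index, acc) do
    value = ascii_to_digit(char) * round(:math.pow(10, index))
    do_parse_int(rest, index + 1, [value | acc])
  end

  defp ascii_to_digit(ascii) when ascii in 48..57, do: ascii - 48
  defp ascii_to_digit(_ascii), do: raise(ArgumentError, "not an ASCII digit")
end

IO.inspect(IntSketch.parse_int("1234")) # 1234
```

Since :math.pow/2 works with floats, this sketch is only suitable for reasonably short inputs; very long digit strings would lose precision in the power calculation.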
Note: An optimized version of this could run the Enum.reduce call on the original characters of a string, summing the digits right away instead of keeping a temporary list. We didn't do this here so that we could split the responsibilities a bit better and leave the code more readable.
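One possible reading of that optimization is sketched below. Note that it swaps the 10^index technique for a running total: each step multiplies the total by 10 and adds the next digit, so no reversal or temporary list is needed. The module name ReduceSketch is hypothetical:

```elixir
defmodule ReduceSketch do
  # Walk the characters left to right; each step shifts the running
  # total one decimal place and adds the new digit.
  def parse_int(str) do
    str
    |> String.to_charlist()
    |> Enum.reduce(0, fn char, total -> total * 10 + ascii_to_digit(char) end)
  end

  defp ascii_to_digit(ascii) when ascii in 48..57, do: ascii - 48
  defp ascii_to_digit(_ascii), do: raise(ArgumentError, "not an ASCII digit")
end

IO.inspect(ReduceSketch.parse_int("1234")) # 1234
```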
Parsing Floats
To implement a parse_float/1 function, all that's left is to handle decimals. Fortunately, we can reuse our existing parse_int/1 function, along with a couple of fancy tricks to make everything work:
We define a @float_regex module attribute, which holds a regular expression capable of capturing both the left and right side of a floating point number. The decimal separator and its subsequent digits are optional, so this regex will match "123" just as well as it matches "123.456".
Explaining the details of this regex is out of the scope of this article, but feel free to play around with it in your Elixir console.
When we run the regex against our input, say "123.456", we end up with the following map:
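The pattern below is a hypothetical stand-in for such a regex, and the capture names "int" and "dec" are our own choice. With it, Regex.named_captures/2 produces a map along these lines:

```elixir
# Hypothetical pattern: an integer part, optionally followed by a
# decimal point and more digits.
float_regex = ~r/^(?<int>\d+)(\.(?<dec>\d+))?$/

IO.inspect(Regex.named_captures(float_regex, "123.456"))
# %{"dec" => "456", "int" => "123"}
```

With this pattern, an input without a decimal part such as "123" still matches, and the "dec" capture comes back as an empty string.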
We can now see where parse_int/1 comes in handy. It can be used on both parts to get 123 and 456, respectively. But how can we combine them to have the desired 123.456 as a result?
Again, math comes to our rescue. Multiplying the decimal part by 10^-3, where 3 is the number of decimal digits, gives us 0.456, which we can then add to the integer part to get the final result.
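Here is a minimal sketch of how these steps could fit together. It inlines a hypothetical regex (with made-up capture names "int" and "dec") instead of using a module attribute, and it carries a compact stand-in for the integer parser so that the snippet runs on its own:

```elixir
defmodule FloatSketch do
  def parse_float(str) do
    # Hypothetical pattern; the decimal part is optional.
    regex = ~r/^(?<int>\d+)(\.(?<dec>\d+))?$/
    %{"int" => int_str, "dec" => dec_str} = Regex.named_captures(regex, str)

    # Scale the decimal digits by 10^-length, e.g. 456 * 10^-3.
    decimal_length = String.length(dec_str)
    parse_int(int_str) + parse_int(dec_str) * :math.pow(10, -decimal_length)
  end

  # Compact stand-in for the integer parser described earlier.
  # An empty string yields 0, which covers inputs without decimals.
  defp parse_int(str) do
    str
    |> String.to_charlist()
    |> Enum.reduce(0, fn char, total -> total * 10 + (char - ?0) end)
  end
end

IO.inspect(FloatSketch.parse_float("123.456")) # approximately 123.456
IO.inspect(FloatSketch.parse_float("123"))     # 123.0
```

Because the result is built with floating point arithmetic, it may differ from the literal 123.456 in the last few bits. Input that doesn't match the pattern makes the map match fail with a MatchError, which ties in with the error handling caveat discussed in the next section.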
Our end result can parse integers and floats from strings.
Unhandled Cases
This was a somewhat summarized demonstration of how a low-level number parser could work in Elixir. However, it does not cover every possible scenario one might want. Some things that weren't covered are:
- Support for negative numbers
- Support for scientific notation (e.g. 1.23e7)
- More graceful error handling. If you’re building a parser for your own use-case, then error handling should also depend on what the use case is, as well as its conditions. Hence, it was not covered here.
- Handling more numerical systems. Did you notice we’re using :math.pow(10,x) in a few places? Making that 10 configurable should allow us to support binary, octal or hexadecimal strings.
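To give a feel for the last point, here is a rough sketch of an integer parser with a configurable base. Everything in it is hypothetical: the module name BaseSketch, the base argument, and the choice to support bases 2 to 16. It also uses a running total (multiplying by the base at each step) in place of explicit :math.pow(base, x) calls:

```elixir
defmodule BaseSketch do
  # parse_int/2 with a hypothetical `base` argument (2..16).
  def parse_int(str, base \\ 10) when base in 2..16 do
    str
    |> String.downcase()
    |> String.to_charlist()
    |> Enum.reduce(0, fn char, total -> total * base + digit(char, base) end)
  end

  # Map a character to its numeric value, then check it fits the base.
  defp digit(char, base) do
    value =
      cond do
        char in ?0..?9 -> char - ?0
        char in ?a..?f -> char - ?a + 10
        true -> raise ArgumentError, "not a digit: #{inspect(char)}"
      end

    if value < base do
      value
    else
      raise ArgumentError, "digit out of range for base #{base}"
    end
  end
end

IO.inspect(BaseSketch.parse_int("1010", 2)) # 10
IO.inspect(BaseSketch.parse_int("777", 8))  # 511
IO.inspect(BaseSketch.parse_int("ff", 16))  # 255
```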
We'd love to know what you thought of this article, or if you have any questions. We’re always on the lookout for topics to investigate and explain, so if there’s anything in Elixir you’d like to read about, don't hesitate to let us know at @AppSignal!