Ruby has various ways of performing iteration: loops, blocks and enumerators. Most Ruby programmers are at least familiar with loops and blocks, but Enumerator and Fiber often stay in the dark. In this edition of Ruby Magic, guest author Julik shines a light on Enumerator and Fiber to explain flow-controlling enumerables and turning blocks inside out.
Suspending Blocks and Chained Iteration
We've discussed Enumerator in a previous edition of Ruby Magic, where we described how to return an Enumerator from your own #each method and what it can be used for. An even broader use case for Enumerator and Fiber is that they can "suspend a block" mid-flight. Not just the block given to #each or the entire call to #each, but any block!
This is a very powerful construct. It lets us build shims around block-based methods, so that they can serve callers that expect sequential calls instead of taking a block. For example, imagine we want to open a database handle and read each item that we have retrieved:
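As a rough illustration of the block-style API described above, here is a minimal sketch. The FakeDatabase class and its in-memory rows are hypothetical stand-ins invented for this example; a real database driver would look different.

```ruby
# Hypothetical in-memory stand-in for a real database connection.
class FakeDatabase
  def initialize(rows)
    @rows = rows
    @handle_open = false
  end

  def handle_open?
    @handle_open
  end

  # Block-style API: opens a "handle", yields every row, then cleans up.
  def with_each_row_of_result(_sql_stmt)
    @handle_open = true
    @rows.each { |row| yield row }
  ensure
    @handle_open = false # cleanup runs even if the block raises
  end
end

db = FakeDatabase.new([{ id: 1, name: "Ann" }, { id: 2, name: "Bob" }])
names = []
db.with_each_row_of_result("SELECT * FROM users") { |row| names << row[:name] }
```

The callee stays in charge here: it decides when the block runs and when the handle is released.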
The block API is great since it will potentially perform all kinds of cleanup for us when the block is terminated. However, some consumers might want to work with the database in this way:
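The "pull" style such a consumer has in mind might look roughly like this. RowCursor and next_row are hypothetical names used only to show the call shape; they are not part of any real driver.

```ruby
# Hypothetical cursor, invented here to show the call shape only.
class RowCursor
  def initialize(rows)
    @rows = rows.dup
  end

  # Returns the next row, or nil once all rows have been consumed.
  def next_row
    @rows.shift
  end
end

cursor = RowCursor.new([{ id: 1 }, { id: 2 }])

# The caller decides when each row is fetched, for instance once per event.
first_row = cursor.next_row
# ...other work can happen here...
second_row = cursor.next_row
```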
In practice, it means we want to "suspend" the execution of the block "just for now" and carry on later within the block. Thus, the caller takes over the flow control instead of it being in the hands of the callee (the method performing the block).
Chaining Iterators
One of the most common uses of this pattern is chaining multiple iterators together. When called without a block, the methods we are used to for iteration (like #each) return an Enumerator object instead, which we can use to "grab" the values that the method sends us using the yield statement:
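A small sketch of that behaviour, assuming a hypothetical Greeter class written for this example:

```ruby
# Hypothetical class whose iterator returns an Enumerator
# when it is called without a block.
class Greeter
  def each_greeting
    return to_enum(:each_greeting) unless block_given?

    yield "hello"
    yield "world"
  end
end

enum = Greeter.new.each_greeting
first = enum.next   # runs the method up to the first yield
second = enum.next  # resumes it until the second yield

# Built-in iterators behave the same way when no block is given.
numbers = [1, 2, 3].each
first_number = numbers.next
```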
The enumerators can then be chained, which allows us to perform operations like "any iteration but with the index". In this example, we're calling #map on a range without a block to get an Enumerator object. We then chain #with_index to map over the range with an index:
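The chain described above can be sketched like this; the variable names are only illustrative.

```ruby
# map without a block returns an Enumerator, and with_index adds the index.
pairs = (1..3).map.with_index { |element, index| [element, index] }
# pairs is [[1, 0], [2, 1], [3, 2]]

# with_index also accepts a starting offset.
labels = (1..3).map.with_index(1) { |element, index| "#{index}. #{element}" }
# labels is ["1. 1", "2. 2", "3. 3"]
```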
This can be very useful, especially if your system uses events. Ruby provides a built-in method, Object#to_enum (also available as enum_for), for wrapping any method with an Enumerator generator, which allows us to accomplish exactly this. Imagine we want to "pull" rows one by one from our with_each_row_of_result method, instead of the method yielding them to us.
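A minimal sketch of pulling rows with the built-in to_enum. FakeDatabase is again a hypothetical stand-in with hard-coded rows; only to_enum and next are real Ruby API here.

```ruby
# Hypothetical stand-in for a block-style database API.
module FakeDatabase
  ROWS = ["row 1", "row 2", "row 3"].freeze

  def self.with_each_row_of_result(_sql_stmt)
    ROWS.each { |row| yield row }
  end
end

# to_enum wraps the block-taking method in an Enumerator.
rows = FakeDatabase.to_enum(:with_each_row_of_result, "SELECT * FROM users")

first = rows.next   # the method runs until it yields its first row
second = rows.next  # and is resumed only when we ask for more
```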
If we were to implement this ourselves, this is how it would likely come about:
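One possible way to write such a wrapper by hand, sketched with Enumerator.new. The helper name wrap_in_enumerator and the FakeDatabase module are assumptions made for this example, and Ruby's own to_enum is implemented differently internally.

```ruby
# Hypothetical stand-in for a block-style database API.
module FakeDatabase
  def self.with_each_row_of_result(_sql_stmt)
    ["row 1", "row 2"].each { |row| yield row }
  end
end

# Hypothetical helper mimicking what to_enum does for us.
def wrap_in_enumerator(receiver, method_name, *args)
  Enumerator.new do |yielder|
    # Forward everything the method yields into the Enumerator's yielder.
    receiver.public_send(method_name, *args) do |*block_args|
      yielder.yield(*block_args)
    end
  end
end

rows = wrap_in_enumerator(FakeDatabase, :with_each_row_of_result, "SELECT 1")
first = rows.next
second = rows.next
```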
Turning Blocks Inside Out
Rails allows us to assign the response body to also be an Enumerator. The Enumerator we assign as the response body is iterated by calling each on it, and each yielded value is expected to be a string, which will be written out into the Rack response. For example, we can return a call to the #each method of a Range as a Rails response body:
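A sketch of the idea without loading Rails. The loop that consumes the body is an assumed stand-in for the server writing out the response, not actual Rails or Rack code.

```ruby
# Calling each on a Range without a block returns an Enumerator.
# In a controller, this object would be assigned as the response body.
response_body = ("a".."e").each

# Assumed stand-in for the server: it iterates the body and writes
# every yielded string to the client.
written = +""
response_body.each { |chunk| written << chunk }
```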
This is what I call turning a block inside out. In essence, it is a control flow helper that allows us to "freeze time" in a block (or a loop, which is also a block in Ruby) mid-flight.
However, Enumerators have a limiting property that makes them slightly less useful. Imagine we want to do something like this:
Let's wrap it with an enumerator and write into it:
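A rough sketch of both steps, assuming a hypothetical with_open_file method that opens a file and keeps yielding it for writing. The module name and the temporary path are invented for this example.

```ruby
require "tmpdir"

# Hypothetical block-style method: opens a file and keeps yielding it.
module Output
  def self.with_open_file(path)
    File.open(path, "wb") do |file|
      loop { yield file }
    end
  end
end

contents = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "output.tmp")
  writer = Output.to_enum(:with_open_file, path)

  file = writer.next # enters the block and suspends at the yield
  file << "Some data"
  file = writer.next # resumes the loop, which yields the same file again
  file << "Some more data"

  # Nothing tells the suspended block to finish, so this sketch
  # closes the file by hand.
  file.close
  contents = File.binread(path)
end
```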
Everything works great. However, there is a hitch: how do we tell the enumerator that we are done writing, so that it can "finish" the block, close the file and exit? Finishing the block performs a number of important steps, for example resource cleanup (the file will be closed) and ensuring all the buffered writes are flushed to disk. We do have access to the File object, and we can close it ourselves, but we would like the enumerator to manage the closing for us; for that, we have to let the enumerator proceed past the block.
Another hurdle is that sometimes we want to pass arguments into the suspended block. Imagine we have a block-accepting method with the following semantics:
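A block-accepting method of this kind could look roughly as follows. The names write_through_encryptor and Rot13Writer are hypothetical, and ROT13 merely stands in for real encryption.

```ruby
require "stringio"

# Hypothetical "encrypting" sink: ROT13 stands in for real encryption.
class Rot13Writer
  def initialize(io)
    @io = io
  end

  def <<(data)
    @io << data.tr("A-Za-z", "N-ZA-Mn-za-m")
    self
  end
end

# Hypothetical block-accepting method: yields a writable, then cleans up.
def write_through_encryptor(io)
  yield Rot13Writer.new(io)
ensure
  io.close
end

output = StringIO.new
write_through_encryptor(output) do |writable|
  writable << "Some data"
  writable << "Some more data"
end
```

All writes have to happen inside the block, and the cleanup runs as soon as the block returns.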
but in our calling code we want to use it like this:
Ideally, we would wrap our method call into some structure that would permit us the following trick:
What if we were to wrap our writes like this?
In this case, we use :terminate as a magic value that tells our method that it can finish the block and return. This is where Enumerator won't really help us, because we can't pass any arguments to Enumerator#next. If we could, we would be able to do:
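A quick check of that limitation: Enumerator#next takes no arguments, so the wished-for call fails right away. The sample data is only illustrative.

```ruby
enum = ["a", "b"].each

# The call we would like to make is rejected.
error = begin
  enum.next("Some data")
rescue ArgumentError => e
  e
end

# Calling next without arguments works as usual.
first = enum.next
```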
Enter Ruby's Fibers
This is exactly what Fibers permit. A Fiber allows you to accept arguments on each reentry, so we can implement our wrapper like so:
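A minimal sketch of such a wrapper. The write_with_block method is a hypothetical stand-in for the block-accepting method we want to wrap, and this first version has no way to terminate yet.

```ruby
require "stringio"

# Hypothetical block-style writer, standing in for a method we cannot change.
def write_with_block(io)
  yield io
ensure
  io.close
end

output = StringIO.new

deferred_writable = Fiber.new do |data_to_write|
  write_with_block(output) do |writable|
    loop do
      writable << data_to_write
      data_to_write = Fiber.yield # pause here until the next resume
    end
  end
end

deferred_writable.resume("Some data")
deferred_writable.resume("Some more data")
```

After these calls the block is still suspended inside the loop, so the cleanup in write_with_block has not run yet.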
This is how it works: when you first call .resume on your deferred_writable, it enters the fiber and runs all the way to the first Fiber.yield statement or to the end of the outermost Fiber block, whichever comes first. When the fiber calls Fiber.yield, control goes back to you, the caller. Remember the Enumerator? The block is suspended in the same way, and the next time you call .resume, the argument to resume becomes the new data_to_write.
So, within the Fiber, the code flow is started on the first call to Fiber#resume, suspended at the first call to Fiber.yield, and then continued on subsequent calls to Fiber#resume, with the return value of Fiber.yield being the arguments to resume. The code continues running from the point where Fiber.yield was last called.
One quirk of Fibers is that the arguments to the first resume are passed to you as the block arguments, not via the return value of Fiber.yield.
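The flow described above can be traced with a small sketch. The log array and the symbols are only there to make the order of events visible.

```ruby
log = []

fiber = Fiber.new do |first|
  log << [:block_argument, first]  # value from the first resume
  second = Fiber.yield(:paused_once)
  log << [:yield_return, second]   # value from the second resume
  third = Fiber.yield(:paused_twice)
  log << [:yield_return, third]    # value from the third resume
  :done
end

result_one = fiber.resume(1)   # runs up to the first Fiber.yield
result_two = fiber.resume(2)   # continues up to the second Fiber.yield
result_three = fiber.resume(3) # runs to the end of the block
```

Each resume returns whatever was passed to Fiber.yield, and the last one returns the value of the block itself.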
With that in mind, we know that by passing a special argument to resume, we can decide within the Fiber whether we should stop or not. Let's try that:
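One way this could look, using :terminate as the magic value. As before, write_with_block is a hypothetical stand-in for the block-accepting method being wrapped.

```ruby
require "stringio"

# Hypothetical block-style writer that cleans up once its block returns.
def write_with_block(io)
  yield io
ensure
  io.close
end

output = StringIO.new

deferred_writable = Fiber.new do |data_to_write|
  write_with_block(output) do |writable|
    loop do
      # The magic value lets the block finish normally.
      break if data_to_write == :terminate

      writable << data_to_write
      data_to_write = Fiber.yield
    end
  end
end

deferred_writable.resume("Some data")
deferred_writable.resume("Some more data")
deferred_writable.resume(:terminate) # the block completes and cleanup runs
```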
There are a number of situations where these facilities can be very useful. Since a Fiber contains a suspended block of code that can be manually resumed, Fibers can be used for implementing event reactors and for dealing with concurrent operations within a single thread. They are lightweight, so you can implement a server using Fibers by assigning a single client to a single Fiber and switching between these Fiber objects as necessary.
Ruby has an additional standard library called fiber which allows you to explicitly transfer control from one fiber to another, which can be a bonus facility for these uses.
Controlling Data Emission Rates
Another great use for fibers and enumerators can arise when you want to be able to control the rate at which a Ruby block emits data. For example, in zip_tricks we support the following block use as the primary way of using the library:
This gives "push" control to the code that creates the ZIP archive, and the caller cannot control how much data it outputs or how often. If we want to write our ZIP in chunks of, say, 5 MB (the minimum part size for an S3 multipart upload on AWS), we would have to create a custom output_io object which would somehow "refuse" to accept << method calls when the segment needs to be split off into an S3 multipart part. We can, however, invert the control and make it "pull". We will still use the same block for writing our big CSV file, but we will be resuming and halting it based on the output it provides. We therefore make the following use possible:
This allows us to control at which rate our ZIP file generator emits data.
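The general technique can be sketched without zip_tricks. Everything here (generate_rows, ChunkedSink, the 16-byte chunk size) is hypothetical and is not the zip_tricks API; it only shows a push-style writer being paused after every chunk.

```ruby
# Hypothetical "push"-style generator: it writes rows whenever it likes.
def generate_rows(output, progress)
  5.times do |i|
    progress << i
    output << "row #{i}\n"
  end
end

# Hypothetical sink that hands out fixed-size chunks through an
# Enumerator yielder, which pauses the generator after each chunk.
class ChunkedSink
  def initialize(yielder, chunk_size)
    @yielder = yielder
    @chunk_size = chunk_size
    @buffer = +""
  end

  def <<(bytes)
    @buffer << bytes
    while @buffer.bytesize >= @chunk_size
      @yielder << @buffer.byteslice(0, @chunk_size)
      @buffer = @buffer.byteslice(@chunk_size, @buffer.bytesize)
    end
    self
  end

  # Emit whatever is left once the generator is done.
  def finish
    @yielder << @buffer unless @buffer.empty?
  end
end

progress = []
chunks = Enumerator.new do |yielder|
  sink = ChunkedSink.new(yielder, 16)
  generate_rows(sink, progress)
  sink.finish
end

first_chunk = chunks.next # the generator runs until 16 bytes are buffered
rows_started_after_first = progress.dup
second_chunk = chunks.next # the generator resumes and writes the rest
```

Between the two calls to next the generator is frozen in the middle of a write, so the caller could upload the first chunk before asking for more.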
Enumerator and Fiber are, therefore, a control flow mechanism for turning "push" blocks into "pull" objects that accept method calls.
There is one pitfall with Fibers and Enumerators: if you have something like ensure in your block, or something that needs to be done after the block completes, it is now up to the caller to resume you enough times for the block to run to completion. In a way, it is comparable to the constraints you have when using Promises in JavaScript.
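A small sketch of this pitfall. The cleaned_up flag is only there to show when the ensure clause actually runs.

```ruby
cleaned_up = false

fiber = Fiber.new do
  begin
    Fiber.yield :working
  ensure
    cleaned_up = true # only runs once the block is allowed to finish
  end
end

fiber.resume
cleaned_up_while_suspended = cleaned_up # still false at this point

fiber.resume # the caller resumes once more, so the block completes
cleaned_up_after_completion = cleaned_up
```

If the caller never makes the second call to resume, the cleanup in this sketch never runs.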
Conclusion
This concludes our look into flow-controlled enumerables in Ruby. Along the way, Julik shone light on the similarities and differences between the Enumerator and Fiber classes, and dove into examples where the caller determines the flow of data. We've also learned about Fiber's additional magic that allows passing arguments on each block reentry. Happy flow-controlling!