A memory leak is an unintentional, uncontrolled, and unending increase in memory usage. No matter how small, eventually, a leak will cause your process to run out of memory and crash. Even if you periodically restart your app to avoid this crash (no judgment, I've done that!), you still suffer the performance implications of a memory leak.
In this post, the first of a two-part series on memory leaks, we'll start by looking at how Ruby manages memory, how Garbage Collection (GC) works, and how to find a leak.
In the second part, we'll take a deeper dive into tracking down leaks.
Let's get started!
Ruby Memory Management
Ruby objects are stored on the heap, and each object fills one slot on the heap.
Prior to Ruby 3.1, all slots on the heap were the same size: 40 bytes, to be exact (on 64-bit systems). Objects too large to fit in a slot were stored outside the heap, and the slot held a reference to where the object's data was stored.
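To get a feel for the difference between an object that fits in its slot and one that doesn't, the objspace standard library can report an object's approximate memory footprint. This is only a rough sketch: the exact numbers depend on your Ruby version and platform, and the variable names are purely illustrative.

```ruby
require "objspace"

# A short string fits inside its heap slot.
short_string = "a"

# A long string's contents are too big for a slot, so its reported
# size includes memory allocated outside the slot.
long_string = "a" * 10_000

short_size = ObjectSpace.memsize_of(short_string)
long_size  = ObjectSpace.memsize_of(long_string)

puts "short string: #{short_size} bytes"
puts "long string:  #{long_size} bytes"
```

The long string should report a noticeably larger size than the short one, because most of its data lives outside the slot.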
In Ruby 3.1, variable width allocation for String objects was merged. Soon, variable width allocation will be the norm for all object types.
Variable width allocation aims to improve performance by improving cache locality — all the information of an object will be stored in one place rather than across two memory locations.
It should also simplify some parts of memory management. At the moment, there are two 'heaps':
- The Ruby heap (or GC heap) that stores smaller Ruby objects.
- The C heap (or malloc/transient heap) that stores larger objects.
Once variable width allocation is the norm, there should be no need for the latter heap.
The heap starts at a given size (10,000 slots by default) and objects are assigned to free slots as they are created. When Ruby tries to create an object and there are no free slots available, Garbage Collection (GC) occurs to make some free slots available.
If there are too few free slots after GC, the heap will be expanded (more on this a little later).
Here are the factors you can control, alongside their environment variables:
- Initial size of the heap - RUBY_GC_HEAP_INIT_SLOTS
- Number of free slots that should be available after GC occurs - RUBY_GC_HEAP_FREE_SLOTS
- Amount the heap is expanded by - RUBY_GC_HEAP_GROWTH_FACTOR
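To see how the heap is doing in a running process, you can read the slot counts from GC.stat. The sketch below also prints whichever of the variables above are set in the environment. The helper name heap_slot_summary and the constant GC_TUNING_VARIABLES are made up for this example, and newer Ruby versions may treat some of these variables differently.

```ruby
# Names of the tuning variables listed above. Ruby reads them once at
# boot, so they must be set before the process starts, for example:
#   RUBY_GC_HEAP_GROWTH_FACTOR=1.5 ruby app.rb   (app.rb is a placeholder)
GC_TUNING_VARIABLES = %w[
  RUBY_GC_HEAP_INIT_SLOTS
  RUBY_GC_HEAP_FREE_SLOTS
  RUBY_GC_HEAP_GROWTH_FACTOR
].freeze

# Collect the current heap slot counts from GC.stat.
def heap_slot_summary
  stats = GC.stat
  {
    available: stats[:heap_available_slots], # total slots in the heap
    live:      stats[:heap_live_slots],      # slots holding live objects
    free:      stats[:heap_free_slots]       # slots ready for new objects
  }
end

GC_TUNING_VARIABLES.each do |name|
  puts "#{name}=#{ENV.fetch(name, '(not set, using Ruby default)')}"
end

puts heap_slot_summary
```

Running this with and without the variables set is a quick way to check whether a setting had any visible effect on the heap.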
Garbage Collection in Ruby
Garbage Collection in Ruby 'stops the world': no other Ruby code in the process runs while GC is running. Garbage Collection in Ruby (since 2.1) is also generational, meaning that the garbage collector has two modes:
- Minor GC - inspects 'young' objects (objects created recently)
- Major GC - inspects 'old' objects as well as 'young' objects (all the objects)
Note: An 'old' object has survived 3 GC runs, major or minor.
When the heap is full, minor GC is invoked first. If it can't free up enough slots to be below the limit, major GC will be invoked. Only then, if there are still not enough free slots, will the heap be expanded.
Major GC is more expensive than minor GC because it looks at more objects.
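You can watch the two modes at work through the minor and major GC counters in GC.stat. This is a minimal sketch: gc_runs_during is a hypothetical helper, and since Ruby decides which kind of GC to run during normal allocation, the counts for the first measurement will vary between runs and versions.

```ruby
# Run a block and report how many minor and major GC runs happened during it.
def gc_runs_during
  before = GC.stat
  yield
  after = GC.stat
  {
    minor: after[:minor_gc_count] - before[:minor_gc_count],
    major: after[:major_gc_count] - before[:major_gc_count]
  }
end

# Allocate lots of short-lived strings. Ruby chooses which kind of GC
# to run here, so these counts are not predictable.
churn = gc_runs_during { 200_000.times { "temporary " + "string" } }
puts "allocation churn: #{churn}"

# GC.start runs a full (major) collection by default.
forced = gc_runs_during { GC.start }
puts "forced GC.start:  #{forced}"
```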
The theory behind why generational GC is more performant is that objects usually fall into two categories:
- Objects that are allocated and then quickly go out of scope. In a Rails app, models fetched from the DB to render a page will go out of scope when the request ends.
- Objects that are allocated and kept around for a long time. Classes and caches are likely to still be in use throughout the lifetime of an app.
Major GC will also run after minor GC if the number of old objects is above a certain threshold, even if there are sufficient free slots. This limit increases as the size of the heap grows and can be controlled by the RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR environment variable.
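Both the number of old objects and the current threshold are exposed through GC.stat. The following sketch, which assumes a made-up RETAINED constant and old_object_stats helper, keeps a batch of strings alive through several GC runs so that they are promoted to the old generation.

```ruby
# Objects we deliberately keep referenced so that GC cannot free them.
RETAINED = []

# Read the old-object count and the threshold that triggers a major GC.
def old_object_stats
  stats = GC.stat
  { old_objects: stats[:old_objects], limit: stats[:old_objects_limit] }
end

before = old_object_stats

# Keep 10,000 strings alive, then run GC several times so they survive
# enough collections to be promoted to the old generation.
10_000.times { |i| RETAINED << "retained string #{i}" }
4.times { GC.start }

after = old_object_stats

puts "before: #{before}"
puts "after:  #{after}"
```

The old_objects figure should be higher in the second line, since the retained strings have now survived enough GC runs to count as old.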
When you have a leak, you create objects that can't be cleaned up, so more and more old objects accumulate. This means that major (expensive) GC will run much more often than it should. Since nothing else runs when GC is running, this is time that you waste.
I've left some links at the end of this article for further reading on memory layout and the garbage collector in Ruby.
What Does A Memory Leak Look Like in Ruby?
You can see a memory leak using simple tools available on any Unix system. Take the following code as an example.
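A minimal sketch of such a program, assuming it is saved as leaky.rb (the file name used in the pgrep command further down). The global $leaked, the leak_strings method, and the LEAK_ROUNDS environment variable are all invented for this illustration. The number of rounds is bounded so that the sketch finishes on its own; set LEAK_ROUNDS to a large number, or replace the bounded loop with loop do, if you want to observe it over time.

```ruby
# leaky.rb (sketch)
# Every string is pushed onto a global array that is never emptied, so
# the garbage collector can never reclaim any of them.
$leaked = []

# Allocate `count` new strings and keep a reference to each one.
def leak_strings(count)
  count.times { |i| $leaked << "leaked string number #{i}" }
end

# Bounded by default so the sketch terminates; LEAK_ROUNDS is a
# hypothetical knob for running it longer.
ROUNDS = Integer(ENV.fetch("LEAK_ROUNDS", "5"))

ROUNDS.times do
  leak_strings(100_000)
  sleep 0.1
end

puts "strings retained: #{$leaked.size}"
```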
To say this code 'leaks' is a little unfair — all it does is leak! — but it serves our purposes.
We can observe the leak quite simply from the command line by running this program in one terminal and watching the memory increase over time with watch and ps.
The pgrep -f "ruby ./leaky.rb" command finds the process ID for us, so that we can restrict the ps output to only the process we're interested in. As you may be able to guess, it's like grep for processes.
The watch tool allows us to poll the output of a given command and update it in place, giving us a live dashboard within our terminal.
You'll get output like this, which updates every couple of seconds.
You should see the %MEM and RSS increasing. They are:
- %MEM - The amount of memory the process uses as a percentage of memory on the host machine.
- RSS (resident set size) - The amount of RAM the process uses, which ps reports in kilobytes.
This basic OS-only information is enough to spot if you have a leak — if the memory keeps going up, it means you do!
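The same RSS figure can be read from inside a Ruby process by shelling out to ps. This is a hedged sketch: rss_kilobytes is a hypothetical helper, it assumes a ps that understands the -o rss= and -p options, and it returns nil when ps is missing or prints something unexpected.

```ruby
# Ask ps for the resident set size (in kilobytes) of a process.
# Returns nil if ps is unavailable or its output is not a plain number.
def rss_kilobytes(pid = Process.pid)
  output = IO.popen(["ps", "-o", "rss=", "-p", pid.to_s], err: File::NULL, &:read)
  value = output.to_s.strip
  value.match?(/\A\d+\z/) ? value.to_i : nil
rescue SystemCallError
  nil
end

hoard = []
3.times do |round|
  # Retain roughly 10 MB more on each round so the growth is visible.
  hoard << ("x" * 10_000_000)
  puts "round #{round}: RSS = #{rss_kilobytes.inspect} KB"
end
```

Logging this value periodically gives you a crude version of the terminal dashboard without leaving Ruby.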
Find Ruby Leaks with the Garbage Collector Module
We can also detect leaks within Ruby code itself with the GC module.
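Here is a minimal sketch of what such a check might look like. The names $retained, retain_strings, live_slots_after_gc, and LOOPS are invented for the example, and the loop is bounded so that the sketch terminates (a real leak would keep going).

```ruby
# Strings pushed here are never released, which is our deliberate leak.
$retained = []

# Allocate `count` strings and keep them referenced forever.
def retain_strings(count)
  count.times { |i| $retained << "leaked string #{i}" }
end

# Force a major GC, then return how many heap slots are still in use.
def live_slots_after_gc
  GC.start
  GC.stat(:heap_live_slots)
end

LOOPS = 5 # bounded so the sketch terminates; a real leak never stops

LOOPS.times do |round|
  retain_strings(50_000)
  puts "round #{round}: #{live_slots_after_gc} live slots"
end
```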
The GC.stat method will return a hash with a lot of useful information. Here, we're interested in :heap_live_slots, which is the number of slots on the heap that are in use. That's the opposite of :heap_free_slots.
At the end of the loop, we force a major GC and print out the number of used
slots, i.e., the number of objects that remain after GC.
When we run our little program, we see this increase ad infinitum.
We have a leak! We could also have used GC.stat(:old_objects) to the same effect.
While the GC module can be used to see if we have a leak and (if you're smart with your puts statements) where the leak might be occurring, we can see the type of objects that might be leaking with the ObjectSpace module.
The ObjectSpace.count_objects method returns a hash with the counts of live objects. T_STRING, for instance, is the number of strings live in memory. For our rather leaky program, this value increases with each loop, even after GC. We can see that we are leaking string objects.
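As a rough sketch of that comparison, the snippet below takes a count before and after retaining a batch of strings and prints the object types that grew. The names $kept and live_counts are hypothetical, and the 1,000-object cutoff is an arbitrary choice to filter out noise.

```ruby
# Strings pushed here stay referenced, so GC cannot free them.
$kept = []

# Force a GC, then count live objects by internal type.
def live_counts
  GC.start
  ObjectSpace.count_objects
end

before = live_counts

# Retain 20,000 new strings so they survive garbage collection.
20_000.times { |i| $kept << "kept string #{i}" }

after = live_counts

# Show only the object types whose live count grew noticeably.
after.each do |type, count|
  next if %i[TOTAL FREE].include?(type) # heap totals, not object types
  growth = count - before.fetch(type, 0)
  puts "#{type}: +#{growth}" if growth > 1_000
end
```

In a real app, you would take the two snapshots around the code you suspect (a request, a background job) rather than around a loop like this one.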
Application Performance Monitoring in Production with AppSignal
While playing with ps and GC can be a sensible route for toy projects (they're also fun and informative to use!), I would not recommend them as your memory leak detection solution in production apps.
This is where you would use an Application Performance Monitoring (APM) tool. If you're a very large company, you can build these yourself. For smaller outfits, though, picking an APM off-the-shelf is the way to go. You do need to pay a monthly subscription, but the information they provide more than makes up for it.
For detecting memory leaks, you want to find server or process memory use (sometimes called RSS) graphs over time. Here's an example screenshot from AppSignal's 'process memory usage' dashboard of a healthy app shortly after being deployed:
And here's an unhealthy app after deployment:
AppSignal will even surface Ruby VM stats like GC and heap slots, which can give you an even clearer signal for a memory leak. If the number of live slots keeps growing, you have a leak!
Read more about AppSignal for Ruby.
Wrap Up and Further Reading
In this post, we took a quick tour of Ruby's memory management and garbage collector. We then looked at how to detect a memory leak using Unix tools and Ruby's GC and ObjectSpace modules.
Next time, we'll see how to use memory_profiler and derailed_benchmarks to find and fix leaks.
In the meantime, you can read more about the tools we used:
- GC module documentation
- ObjectSpace module documentation
Additional further reading:
- Garbage Collection Deep Dive
- Variable Width Allocation
Happy coding, and see you next time!