Whenever you run your code, you use memory. When you write in a language like Ruby, it seems like the memory available to you is infinite. You can just keep going without thinking about the limited amount of memory on the system running your code. In this Ruby Magic episode we'll explain how this works!
A bit of history
Back in the day, scripting languages such as Ruby did not exist yet. A lot of code was written in languages such as C, a low-level programming language. One of the things that makes these languages low-level is that you have to clean up after yourself. For example, whenever you allocate memory to store a string, you also have to decide when to clean it up.
Manual cleanup
This looks a little something like the following mock Ruby code. It declares a variable and uses the method `free` (which does not actually exist in Ruby) to clean up the memory we've used once we're done with the variable.
```ruby
1_000_000.times do |i|
  variable = "Variable #{i}"
  puts variable
  free(variable)
end
```
A tedious way of programming
You might have already realized there's a risk here: what if you forget to `free` the variable? In that case, the content of that variable will stick around in memory until the process exits. If you do this often enough, you will run out of memory and your process will crash.
The next example demonstrates another common issue:
```ruby
1_000_000.times do |i|
  variable = "Variable #{i}"
  free(variable)
  puts variable
end
```
We declare the variable and `free` it. But then we try to use it again, which is impossible because it doesn't exist anymore. If this were C, your program might now crash with a segfault, or worse, quietly read garbage. Oops!
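Both mistakes can be simulated in plain Ruby to make them concrete. The sketch below is a toy model, not how C or Ruby actually manage memory: `ToyHeap`, `allocate`, `read`, and `free` are hypothetical names, and unlike real C, reading a freed address here raises a clear error instead of invoking undefined behavior.

```ruby
# A toy model of manual memory management. ToyHeap and its methods are
# hypothetical; real C code would use malloc and free instead.
class ToyHeap
  UseAfterFree = Class.new(StandardError)

  def initialize
    @cells = {}        # address => stored value, for every live allocation
    @next_address = 0
  end

  # Store a value and hand back an "address" the caller must free later.
  def allocate(value)
    address = @next_address
    @next_address += 1
    @cells[address] = value
    address
  end

  # Reading a freed address raises here. In real C there is no such check:
  # the read is undefined behavior and may segfault or return garbage.
  def read(address)
    @cells.fetch(address) { raise UseAfterFree, "address #{address} was freed" }
  end

  def free(address)
    @cells.delete(address)
  end

  # Number of allocations that were never freed.
  def live_count
    @cells.size
  end
end

heap = ToyHeap.new

# Mistake 1: forgetting to free. Every iteration leaks one cell.
10.times { |i| heap.allocate("Variable #{i}") }
puts heap.live_count # prints 10

# Mistake 2: using memory after freeing it.
address = heap.allocate("Variable 42")
heap.free(address)
begin
  heap.read(address)
rescue ToyHeap::UseAfterFree => e
  puts "Oops: #{e.message}" # prints "Oops: address 10 was freed"
end
```

The leaked cells only go away if someone remembers to call `free`, which is exactly the bookkeeping a garbage collector takes off your hands.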
Humans are mistake machines
Humans are notoriously bad at avoiding these kinds of mistakes consistently. Hence the need for a way to clean up memory automatically. The most popular way to do this, and the one Ruby uses, is Garbage Collection (GC).
How Garbage Collection (GC) works
In a language that uses GC, you can create objects without manually cleaning them up. Whenever you create an object, it's registered with the Garbage Collector, which keeps track of whether your program still references that object. When it determines you're no longer using the object, the object becomes eligible for cleanup. Every once in a while, the Garbage Collector pauses your program and frees the memory of all objects that are no longer in use.
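Ruby's collector is based on a technique called mark-and-sweep. Note that in this technique "marking" flags the objects that are still in use: starting from a set of roots (such as local and global variables), the collector marks every object it can reach by following references, then sweeps away every object left unmarked. The toy model below sketches that idea. `ToyObject` and `ToyCollector` are hypothetical, and the sketch leaves out everything that makes Ruby's real collector fast, such as generational and incremental marking.

```ruby
# A toy mark-and-sweep collector. ToyObject and ToyCollector are
# hypothetical; Ruby's real GC is implemented in C inside the VM.
class ToyObject
  attr_reader :name, :references
  attr_accessor :marked

  def initialize(name)
    @name = name
    @references = [] # other ToyObjects this object points to
    @marked = false
  end
end

class ToyCollector
  attr_reader :heap, :roots

  def initialize
    @heap = []  # every allocated object that has not been swept yet
    @roots = [] # objects the program can reach directly (e.g. variables)
  end

  # Creating an object registers it with the collector.
  def allocate(name)
    object = ToyObject.new(name)
    @heap << object
    object
  end

  # Run one collection and return the names of the freed objects.
  def collect
    mark
    sweep
  end

  private

  # Mark phase: flag everything reachable from the roots.
  def mark
    stack = @roots.dup
    until stack.empty?
      object = stack.pop
      next if object.marked
      object.marked = true
      stack.concat(object.references)
    end
  end

  # Sweep phase: drop every unmarked object, then reset marks for next time.
  def sweep
    freed, @heap = @heap.partition { |object| !object.marked }
    @heap.each { |object| object.marked = false }
    freed.map(&:name)
  end
end

gc = ToyCollector.new
a = gc.allocate("a")
b = gc.allocate("b")
gc.allocate("c")      # never referenced by anything
a.references << b     # a -> b
gc.roots << a         # the program still holds a

p gc.collect          # prints ["c"]
p gc.heap.map(&:name) # prints ["a", "b"]
```

Object `b` survives even though no root points at it directly, because it is reachable through `a`. Only `c`, which nothing references, is freed.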
Looking at some examples
In the simple loop we used earlier, the GC's job is fairly easy. After each iteration of the loop, the variable isn't used anywhere anymore, so the string it held immediately becomes eligible for cleanup.
```ruby
1_000_000.times do |i|
  variable = "Variable #{i}"
  puts variable
end
```
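You can watch the collector step in during a loop like this one. Ruby's built-in `GC.count` returns how many times the collector has run so far, and `GC.stat(:total_freed_objects)` reports how many objects it has freed, so comparing both before and after the loop shows the short-lived strings being reclaimed. The `puts` is left out so the terminal isn't flooded; the exact numbers depend on your Ruby version and heap settings.

```ruby
runs_before = GC.count
freed_before = GC.stat(:total_freed_objects)

1_000_000.times do |i|
  variable = "Variable #{i}" # garbage as soon as the next iteration starts
end

runs = GC.count - runs_before
freed = GC.stat(:total_freed_objects) - freed_before
puts "GC ran #{runs} times and freed #{freed} objects"
```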
In the next example, we pass the variable into the `puts_later` method, which waits for 30 seconds and then `puts` the variable.
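```ruby
def puts_later(variable)
  Thread.new do
    sleep 30
    puts variable
  end
end

# 1,000 threads rather than 1,000,000: each Thread is backed by an OS
# thread, and a million of them would likely exceed system limits.
threads = 1_000.times.map do |i|
  variable = "Variable #{i}"
  puts_later variable
end

# Wait for the threads; otherwise the program exits (killing them)
# before any of them gets to print.
threads.each(&:join)
```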
The Garbage Collector's job is already pretty complicated in this relatively simple example. It has to understand that we reference the variable in the `puts_later` method. Because the method starts a thread whose block still refers to the variable, the Garbage Collector has to keep track of that thread and wait for it to finish. Only then can the variable be cleaned up.
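The collector has to wait because the block given to `Thread.new` is a closure: it keeps a reference to the local variables around it, including `variable`. The sketch below makes that reference visible with a plain lambda (using `Proc#binding`), then shows a thread handing the very same String back once it finishes. `make_printer` is a hypothetical helper, and the sleep is shortened to keep the example quick.

```ruby
# make_printer is a hypothetical helper: it returns a lambda that closes
# over its `variable` argument, just like the block in puts_later does.
def make_printer(variable)
  -> { puts variable }
end

printer = make_printer("Variable 1")

# The lambda's binding still holds the local variable, so the String is
# reachable (and cannot be collected) for as long as `printer` is.
captured = printer.binding.local_variable_get(:variable)
puts captured # prints Variable 1

# The same holds for a thread: its block keeps `variable` alive until the
# thread has finished. Thread#value waits for it and returns the result.
variable = "Variable 2"
thread = Thread.new do
  sleep 0.1
  variable
end
puts thread.value.equal?(variable) # prints true
```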
When it gets complicated
Without getting into complex examples, trust me when I say the Garbage Collector's job is really hard. This also explains why GC can cause overhead, such as extra CPU usage and pauses, in your production environment. It needs a very detailed understanding of what's happening in your program to free memory correctly, which takes quite a few CPU cycles to get right. But hey, it beats cleaning up after yourself!
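To get a rough feel for that cost, you can time a full collection yourself. The sketch below keeps a large number of strings alive, so the mark phase has real work to do, and times `GC.start` with a monotonic clock. The object count is arbitrary, and the measured time will vary a lot between machines and Ruby versions.

```ruby
# Keep 500,000 strings reachable so a full collection has plenty to mark.
live_objects = Array.new(500_000) { |i| "Variable #{i}" }

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
GC.start # full collection; this call blocks until marking and sweeping finish
elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000

puts format("Full GC with %d live strings took %.2f ms", live_objects.size, elapsed_ms)
```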
There's more to Garbage Collection
This was only our introduction to Garbage Collection. In a future article we'll look at how exactly this works in Ruby, and how you can measure and tune GC to improve the performance of your application.
Update: The next episode is available here.