In this article, we’re going to dive into object marshalling. We'll explain what it is, look at the Marshal module, and then go through an example. We'll then go a step deeper and look at the _dump and self._load methods. Let's go!
What’s Object Marshalling?
When you are writing code, you might want to save an object and transmit it to another program or reuse it in your next program execution. Sidekiq, for example, relies on this idea: when a Sidekiq job is enqueued in a Ruby on Rails application, a JSON serialization of this job (which is nothing more than an object) is inserted into Redis. The Sidekiq process can then deserialize this JSON and reconstitute the original job from it.
In computer programming, this process of serialization and deserialization of an object is what we commonly call object marshalling. Now, let’s look at what Ruby natively provides to handle Object Marshalling.
The Marshal Module
As Ruby is a fully object-oriented programming language, it provides a way to serialize and store objects using the Marshal module in its standard library. It allows you to serialize an object to a byte stream that can be stored and deserialized in another Ruby process.
So, let’s serialize a string and take a closer look at the serialized object.
hello_world = 'hello world!'

serialized_string = Marshal.dump(hello_world) # => "\x04\bI\"\x11hello world!\x06:\x06ET"
serialized_string.class # => String

deserialized_hello_world = Marshal.load(serialized_string) # => "hello world!"

hello_world.object_id # => 70204420126020
deserialized_hello_world.object_id # => 70204419825700
We call the Marshal.dump module method to serialize our string. We store the return value, which contains our serialized string, in the serialized_string variable. This string can be stored in a file, and the file can be reused to reconstitute the original object in another process. We then call the Marshal.load method to reconstitute the original object from the byte stream.
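As a minimal sketch of the "store it in a file" idea, the snippet below writes a dump to disk and reads it back. The settings hash and the settings.dump file name are made up for illustration; a temporary directory stands in for wherever the file would really live.

```ruby
require 'tmpdir'

# Hypothetical data; any marshallable object works here.
settings = { 'theme' => 'dark', 'retries' => 3 }
restored = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'settings.dump') # hypothetical file name

  # Marshal output is binary, so write and read it in binary mode.
  File.binwrite(path, Marshal.dump(settings))

  # Later (possibly in another Ruby process), rebuild the object from the file.
  restored = Marshal.load(File.binread(path))
end

puts restored == settings # true
```

Only load dumps you produced yourself: the Ruby documentation warns that calling Marshal.load on untrusted data can lead to arbitrary code execution.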
We can see that this freshly reconstituted string has a different object_id than the hello_world string, which means it's a different object, but it contains the same data.
Pretty cool! But how is the Marshal module able to reconstruct the string? And what if I want to have control over which attributes to serialize and deserialize?
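Part of the answer to the first question is that the byte stream describes itself: it starts with two version bytes, and each value is introduced by a one-byte type tag. The sketch below only peeks at those leading bytes; it is not a full description of the format.

```ruby
dump = Marshal.dump('hello world!')

# The first two bytes are the format's major and minor version.
major, minor = dump.bytes.first(2)
puts major == Marshal::MAJOR_VERSION # true
puts minor == Marshal::MINOR_VERSION # true

# The third byte is a type tag telling Marshal.load what comes next.
integer_tag = Marshal.dump(42).bytes[2].chr
symbol_tag  = Marshal.dump(:age).bytes[2].chr
array_tag   = Marshal.dump([]).bytes[2].chr

puts integer_tag # i
puts symbol_tag  # :
puts array_tag   # [
```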
A Concrete Example of Object Marshalling
To answer these questions, let’s implement a marshalling strategy on a custom struct named User.
User = Struct.new(:fullname, :age, :roles)

user = User.new('Mehdi Farsi', 42, [:admin, :operator])
The User struct defines 3 attributes: fullname, age, and roles. For this example, we have a business rule where an attribute is only serialized when it matches the following criteria:
- The fullname contains at most 64 characters
- The roles array does not contain the :admin role
To do so, we can define a User#marshal_dump method to implement our custom serialization strategy. This method will be called when we invoke the Marshal.dump method with an instance of the User struct as a parameter. Let’s define this method:
User = Struct.new(:age, :fullname, :roles) do
  def marshal_dump
    puts 'in User#marshal_dump' # trace so we can see when the hook runs
    {}.tap do |result|
      result[:age] = age
      result[:fullname] = fullname if fullname.size <= 64
      result[:roles] = roles unless roles.include?(:admin)
    end
  end
end

user = User.new(42, 'Mehdi Farsi', [:admin, :operator])

user_dump = Marshal.dump(user) # prints: in User#marshal_dump
user_dump # => "\x04\bU:\tUser{\a:\bagei/:\rfullnameI\"\x10Mehdi Farsi\x06:\x06ET"
In the above example, we can see that our User#marshal_dump method is called when we invoke Marshal.dump(user). The user_dump variable contains the string which is the serialization of our User instance.
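To make the business rule more visible, here is a small sketch that calls marshal_dump directly and inspects which keys end up in the payload. The second user (Jane Doe) is a made-up example, added only to show the non-admin case.

```ruby
User = Struct.new(:age, :fullname, :roles) do
  def marshal_dump
    {}.tap do |result|
      result[:age] = age
      result[:fullname] = fullname if fullname.size <= 64
      result[:roles] = roles unless roles.include?(:admin)
    end
  end
end

admin    = User.new(42, 'Mehdi Farsi', [:admin, :operator])
operator = User.new(30, 'Jane Doe', [:operator]) # hypothetical second user

# marshal_dump is an ordinary method, so we can look at what it returns.
admin_payload    = admin.marshal_dump
operator_payload = operator.marshal_dump

p admin_payload.keys    # [:age, :fullname]
p operator_payload.keys # [:age, :fullname, :roles]
```

The roles key is missing from the admin payload, so that information never reaches the byte stream.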
Now that we have our dump, let’s deserialize it to reconstitute our user. To do so, we define a User#marshal_load method, which is in charge of implementing the deserialization strategy of a User dump.
So let’s define this method.
User = Struct.new(:age, :fullname, :roles) do
  def marshal_dump
    puts 'in User#marshal_dump' # trace so we can see when the hook runs
    {}.tap do |result|
      result[:age] = age
      result[:fullname] = fullname if fullname.size <= 64
      result[:roles] = roles unless roles.include?(:admin)
    end
  end

  def marshal_load(serialized_user)
    puts 'in User#marshal_load' # trace so we can see when the hook runs
    self.age = serialized_user[:age]
    self.fullname = serialized_user[:fullname]
    self.roles = serialized_user[:roles] || []
  end
end

user = User.new(42, 'Mehdi Farsi', [:admin, :operator])

user_dump = Marshal.dump(user) # prints: in User#marshal_dump
user_dump # => "\x04\bU:\tUser{\a:\bagei/:\rfullnameI\"\x10Mehdi Farsi\x06:\x06ET"

original_user = Marshal.load(user_dump) # prints: in User#marshal_load
original_user # => #<struct User age=42, fullname="Mehdi Farsi", roles=[]>
In the above example, we can see that our User#marshal_load method is called when we invoke Marshal.load(user_dump). The original_user variable contains a struct which is a reconstitution of our user instance.
Note that original_user.roles differs from the user.roles array: during the serialization, user.roles included the :admin role, so it wasn’t serialized into the user_dump variable, and marshal_load fell back to an empty array.
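For contrast, the following sketch runs the same hooks on two made-up users: one without the :admin role, whose roles survive the round trip, and one whose fullname exceeds 64 characters, which comes back as nil.

```ruby
User = Struct.new(:age, :fullname, :roles) do
  def marshal_dump
    {}.tap do |result|
      result[:age] = age
      result[:fullname] = fullname if fullname.size <= 64
      result[:roles] = roles unless roles.include?(:admin)
    end
  end

  def marshal_load(serialized_user)
    self.age = serialized_user[:age]
    self.fullname = serialized_user[:fullname]
    self.roles = serialized_user[:roles] || []
  end
end

# Hypothetical non-admin user: every attribute passes the business rule.
operator = User.new(30, 'Jane Doe', [:operator])
restored_operator = Marshal.load(Marshal.dump(operator))
p restored_operator.roles # [:operator]

# Hypothetical user with a 65-character fullname: the name is dropped.
long_name = User.new(30, 'x' * 65, [:operator])
restored_long = Marshal.load(Marshal.dump(long_name))
p restored_long.fullname # nil
```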
The _dump and self._load Methods
When Marshal.dump and Marshal.load are invoked, they call the marshal_dump and marshal_load hooks, respectively, if the object's class defines them.
But what if I tell you that Marshal.dump and Marshal.load also try to call two other methods: _dump on the object being serialized, and self._load on its class?
The _dump Method
The differences between the marshal_dump and the _dump methods are:
- You need to handle the serialization strategy at a lower level when using the _dump method: you need to return a string that represents the data to serialize.
- The marshal_dump method takes precedence over _dump if both are defined.
Let’s have a look at the following example:
User = Struct.new(:age, :fullname, :roles) do
  def _dump(level)
    [age, fullname].join(':')
  end
end

user = User.new(42, 'Mehdi Farsi', [:admin, :operator])

Marshal.dump(user) # => "\x04\bIu:\tUser\x1342:Mehdi Farsi\x06:\x06EF"
In the User#_dump method, we have to build and return the serialized representation ourselves: the string that represents the object.
In the following example, we define both the User#marshal_dump and User#_dump methods, each returning a distinct string, to see which method is called:
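User = Struct.new(:age, :fullname, :roles) do
  def marshal_dump
    'in User#marshal_dump'
  end

  def _dump(level)
    'in User#_dump'
  end
end

user = User.new(42, 'Mehdi Farsi', [:admin, :operator])

user_dump = Marshal.dump(user)
# The dump embeds the string returned by the hook that was actually used.
user_dump.include?('in User#marshal_dump') # => true
user_dump.include?('in User#_dump') # => false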
We can see that only User#marshal_dump is called, even though both methods are defined.
The self._load Method
Now, let's look at the marshal_load and _load methods.
The differences between the marshal_load and the _load methods are:
- You need to handle the deserialization strategy at a lower level when using the _load method: you are in charge of instantiating the original object.
- The marshal_load method takes a deserialized object as an argument, whereas the self._load method takes the serialized string as an argument.
- The marshal_load method is an instance method, whereas self._load is a class method.
Let’s take a look at the following example:
User = Struct.new(:age, :fullname, :roles) do
  def _dump(level)
    [age, fullname].join(':')
  end

  def self._load(serialized_user)
    # split returns strings, so convert the age back to an Integer
    age, fullname = serialized_user.split(':', 2)
    new(age.to_i, fullname, [])
  end
end

user = User.new(42, 'Mehdi Farsi', [:admin, :operator])

user_dump = Marshal.dump(user)
user_dump # => "\x04\bIu:\tUser\x1342:Mehdi Farsi\x06:\x06EF"

original_user = Marshal.load(user_dump)
original_user # => #<struct User age=42, fullname="Mehdi Farsi", roles=[]>
In the User._load method:
- we deserialize the string returned by the User#_dump method
- we instantiate a new User by passing the deserialized information
We can see that we are in charge of allocating and instantiating the object used to reconstitute our original user.
So Marshal.load coupled with marshal_load takes care of instantiating the reconstituted original object. It then calls the marshal_load method on the freshly instantiated object, passing the deserialized object as an argument.
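The two steps described above can be imitated by hand. This is only a rough sketch of the idea, not Ruby's actual implementation: it allocates a blank User, feeds it the payload, and checks that the result matches what Marshal.load produces. The simplified hooks here are assumptions made for the sake of a short example.

```ruby
User = Struct.new(:age, :fullname, :roles) do
  def marshal_dump
    { age: age, fullname: fullname }
  end

  def marshal_load(data)
    self.age = data[:age]
    self.fullname = data[:fullname]
    self.roles = []
  end
end

user = User.new(42, 'Mehdi Farsi', [:admin])

# Roughly what Marshal.load does on our behalf:
blank = User.allocate                 # 1. create an instance without calling initialize
blank.marshal_load(user.marshal_dump) # 2. pass it the deserialized payload

via_marshal = Marshal.load(Marshal.dump(user))

puts blank == via_marshal # true
```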
On the contrary, a call to Marshal.load coupled with _load lets the self._load class method be in charge of:
- deserializing the data returned by the _dump method
- instantiating the reconstituted original object
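Because _dump and self._load only exchange a string, the string format is entirely up to us. Joining fields with ':' loses type information and becomes ambiguous if a name contains a colon. One possible alternative, sketched below as an assumption rather than the article's approach, is to build the string with a nested Marshal.dump call. The fullname used here is made up to include a colon.

```ruby
User = Struct.new(:age, :fullname, :roles) do
  # _dump must return a String; a nested Marshal.dump gives us one
  # while keeping the original types of age and fullname.
  def _dump(_level)
    Marshal.dump([age, fullname])
  end

  # _load receives that String and is responsible for building the object.
  def self._load(serialized_user)
    age, fullname = Marshal.load(serialized_user)
    new(age, fullname, [])
  end
end

user = User.new(42, 'Farsi: Mehdi', [:admin, :operator]) # hypothetical name with a colon
restored = Marshal.load(Marshal.dump(user))

p restored.age      # 42
p restored.fullname # "Farsi: Mehdi"
p restored.roles    # []
```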
Conclusion
Depending on your needs, you can decide to implement a higher- or lower-level serialization/deserialization strategy. To do so, you can use the Marshal module coupled with the appropriate Marshal hook methods.
Voilà!