Making Hash Smarter
One of the (few) nice things about JavaScript over Ruby is its handling of the Object data type (and by extension, JSON), where you can define a hashmap and access its values via standard dot-notation:
var foo = {};
foo["bar"] = "baz";
foo.bar;
// "baz"
If you tried to do this in Ruby, you’d have big problems:
foo = {}
foo["bar"] = "baz"
foo.bar
# NoMethodError
foo["bar"]
#=> "baz"
Notice instead that you have to use square brackets to access the same attribute. This is fine, except that dot-notation is much more convenient when you’re working with a complicated hash. For instance, if you’re dealing with a deeply nested JSON payload, it’s a lot easier to use dot-notation than Ruby hash syntax.
OpenStruct
Fortunately, Ruby does offer an alternative solution in OpenStruct, which stuffs all the data into attributes on the class and provides a similar syntax. You can even parse JSON directly into an OpenStruct (I commonly do this if I’m testing JSON responses from fixtures). There are two downsides to this, though:
- It’s relatively slow. We have to instantiate not only the original hash, but also the OpenStruct.
- OpenStruct overrides existing Hash instance methods, and instead returns nil unless the key we’re looking up by is explicitly present.
The first is an obvious issue, but it can be mitigated if you only use it sparingly.
The second is larger. Imagine we have a hash like this, and turn it into an OpenStruct:
require 'ostruct'
hash = {
  keys: "OVERRIDDEN!",
  foo: :bar,
  baz: :bat
}
ostruct = OpenStruct.new(hash)
ostruct.keys
#=> "OVERRIDDEN!"
ostruct.size
#=> nil
In the first example, we’ve overridden a Hash instance method that should return all the keys in the hash (:keys, :foo, :baz). And in the second example, we haven’t overridden anything, but we get an unexpected result (Hash#size should return the number of keys in the hash, in this case 3).
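One way to do the JSON parsing mentioned above is the object_class option of JSON.parse. This is only a minimal sketch: the payload is made up for illustration, and it shows the same two surprises on a parsed object.

```ruby
require 'json'
require 'ostruct'

# Hypothetical payload, used only for illustration.
json = '{"user": {"name": "Ada", "keys": "OVERRIDDEN!"}}'

# object_class tells JSON.parse which class to build for each JSON object.
payload = JSON.parse(json, object_class: OpenStruct)

name = payload.user.name        # dot-notation works on nested objects
keys_value = payload.user.keys  # the "keys" attribute, not a list of keys
size_value = payload.user.size  # nil, because there is no "size" key
```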
Enter Metaprogramming
In order to make our hashes smarter, we need a few things, then:
- Hash keys as methods
- Fast instantiation
- Instance methods still available
We’re going to be looking at source code from intellihash, the gem I created to make better hashes.
In order to do this, we’ll be overriding method_missing for the Hash class. If you haven’t played with method_missing, it is invoked every time a message is sent to a Ruby object where no corresponding method can be found. This method determines what the object does when the method can’t be found. In most cases, this is to raise a NoMethodError, as you saw when we tried to access a Ruby hash’s attributes via dot-notation. This is exactly why we’re going to override it to give us the new functionality we need!
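If you haven’t seen the hook in action, here is a minimal sketch on a standalone class. Settings and its keys are hypothetical names I made up for this example; this is not intellihash code, just the general mechanism.

```ruby
# Hypothetical class, used only to illustrate the hook.
class Settings
  def initialize(values)
    @values = values
  end

  # Called whenever no method matches the message name.
  def method_missing(method_name, *args, &block)
    return @values[method_name] if @values.key?(method_name)

    super # fall back to the default behavior, which raises NoMethodError
  end

  # Keeps respond_to? consistent with what method_missing handles.
  def respond_to_missing?(method_name, include_private = false)
    @values.key?(method_name) || super
  end
end

settings = Settings.new(theme: "dark")
theme = settings.theme # no such method exists, so method_missing answers

begin
  settings.font # not a stored key, so the default behavior kicks in
rescue NoMethodError => e
  error = e
end
```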
method_missing
Here’s the guts of intellihash:
# lib/mixins.rb
def method_missing(method_name, *args, **kwargs, &block)
  super unless respond_to?(:is_intelligent?) && is_intelligent?
  if method_name[-1] == '='
    send(:store, method_name[0, method_name.size - 1].send(key_store_as), args.first)
  else
    format_method = key_retrieve_from(kwargs)
    case format_method
    when :any then fetch_where_present(method_name)
    else send(:[], method_name.send(format_method))
    end
  end
end
There’s a lot going on. Most of this is related to configuration. Let’s go line-by-line.
super unless respond_to?(:is_intelligent?) && is_intelligent?
We can selectively turn off the intelligent attribute on our hash instance to stop using intellihash features, and this ensures we call the original method from Hash if we aren’t dealing with an intelligent hash. Side note: This module is prepended instead of included, so it gets inserted before Hash’s implementation of method_missing, allowing us to do this. See here for more details on prepend vs. include.
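To make the prepend vs. include difference concrete, here is a small sketch of how each one changes method lookup order. The module and class names are hypothetical and have nothing to do with intellihash itself.

```ruby
# Hypothetical module and classes, for illustration only.
module Shout
  def greet
    super.upcase + "!"
  end
end

class IncludedGreeter
  include Shout # Shout sits after the class in the lookup chain
  def greet
    "hello"
  end
end

class PrependedGreeter
  prepend Shout # Shout sits before the class in the lookup chain
  def greet
    "hello"
  end
end

included = IncludedGreeter.new.greet   # "hello": the class's own method wins
prepended = PrependedGreeter.new.greet # "HELLO!": Shout runs first, then calls super
chain = PrependedGreeter.ancestors.first(2) # [Shout, PrependedGreeter]
```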
if method_name[-1] == '='
  send(:store, method_name[0, method_name.size - 1].send(key_store_as), args.first)
method_missing’s first argument is the name of the method that wasn’t recognized. This line of code checks that we’re attempting to store a new attribute in the hash.
Recall that implementing a setter in Ruby can be accomplished with the following:
class Hash
  def foo=(other)
    self[:foo] = other
  end
end
With that in mind, we just need to check that the method name ends with =, and if so, we’ll use the store method to store the value under the appropriate key. key_store_as is more configuration in intellihash and determines whether keys default to storage as strings or symbols.
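The setter check can be sketched on its own like this. DotStore is a hypothetical wrapper class I’m using so the example doesn’t touch Hash, and it always stores symbol keys, whereas intellihash makes that configurable.

```ruby
# Hypothetical wrapper class; intellihash patches Hash itself instead.
class DotStore
  def initialize
    @data = {}
  end

  def method_missing(method_name, *args)
    name = method_name.to_s
    if name.end_with?("=")
      # "foo=" becomes :foo, and the single argument is stored under that key.
      @data.store(name.chomp("=").to_sym, args.first)
    elsif @data.key?(method_name)
      @data[method_name]
    else
      super
    end
  end

  def respond_to_missing?(method_name, include_private = false)
    method_name.to_s.end_with?("=") || @data.key?(method_name) || super
  end
end

store = DotStore.new
store.foo = "bar"  # Ruby sends :foo= with "bar" as the only argument
value = store.foo  # reads the key that the setter just stored
```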
else
  format_method = key_retrieve_from(kwargs)
  case format_method
  when :any then fetch_where_present(method_name)
  else send(:[], method_name.send(format_method))
  end
end
This block looks complicated, but it really isn’t (again, mostly formatting, where we’re looking at whether the key is stored as a string or symbol). Take a look at the last else, where we simply call :[] on the hash with the appropriate key. This is exactly the same thing as doing hash[:foo]!
That’s it! The rest of the code is simply helpers to allow for flexibility and configuration based on your needs!
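If the send(:[], ...) call looks odd, this small sketch shows that it is just bracket access spelled differently. Here :to_sym and :to_s stand in for whatever the configuration returns, which is an assumption on my part for the sake of the example.

```ruby
hash = { foo: 1, "bar" => 2 }

# send(:[], key) dispatches to the same method as hash[key].
same = hash.send(:[], :foo) == hash[:foo] # true

# Converting the method name before the lookup, as the format step does.
by_symbol = hash.send(:[], :foo.send(:to_sym)) # 1, looked up as :foo
by_string = hash.send(:[], :bar.send(:to_s))   # 2, looked up as "bar"
```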
Usage
intellihash = {
  foo: {
    bar: {
      baz: {
        bat: :bam
      }
    }
  }
}
intellihash.foo.bar.baz.bat
#=> :bam
Sweet! And because the instance methods we wanted to preserve will never trigger method_missing, that means we can do things like:
intellihash.keys
#=> [:foo]
Nice! And fortunately, there is a way around the fact that a key in the hash you’re using can collide with a hash instance method. To use an earlier example:
hash = {
  keys: "OVERRIDDEN!",
  foo: :bar,
  baz: :bat
}
hash.keys
#=> [:keys, :foo, :baz]
hash.fetch(:keys)
#=> "OVERRIDDEN!"
Is It Fast?
You bet. Under the hood, it’s just setting an instance variable on each hash. It doesn’t need to re-instantiate itself, since all we’re doing is hooking into method_missing whenever we can’t find an existing method. There are some caveats to copying the hash (you can take a look at lib/callbacks for an implementation of ensuring the correct intelligent attribute is set), but other than that this is far simpler than instantiating an OpenStruct.
The performance tests instantiate a large JSON payload as both OpenStruct and intellihash 200 times each. The results speak for themselves:
                 user     system      total        real
OpenStruct:  4.046875   0.906250   4.953125 (  4.979611)
Intellihash: 0.828125   0.125000   0.953125 (  0.956110)
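If you want to try a comparison like this yourself, here is a rough sketch using Ruby’s Benchmark module. It is not the gem’s actual performance test: the payload is a made-up stand-in for the large JSON fixture, and a plain Hash parse stands in for intellihash so the example runs without the gem installed.

```ruby
require 'benchmark'
require 'json'
require 'ostruct'

# Hypothetical payload; the real tests use a large JSON fixture.
items = Array.new(200) { |i| { "id" => i, "tags" => %w[a b c] } }
json = JSON.generate({ "items" => items })
iterations = 200

# Benchmark.bm prints a table like the one above and returns the timings.
results = Benchmark.bm(12) do |x|
  x.report("OpenStruct:") { iterations.times { JSON.parse(json, object_class: OpenStruct) } }
  x.report("Plain Hash:") { iterations.times { JSON.parse(json) } }
end
```

The absolute numbers will depend on your machine and Ruby version, so treat the output as a relative comparison only.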
Safety
I don’t advocate using this gem in production. Overriding the behavior of a core class like Hash can be dangerous: the hash instance methods we’re talking about are written in C and subject to change, sometimes drastically, between different versions of Ruby. You’ll also notice that there’s an issue logged on the intellihash GitHub page: Rails uses method_missing when saving data to the database, so this gem effectively breaks Rails. There’s always a workaround, but I have a feeling this will turn into applying bandaids to lots of little issues.
Either way, it’s fun to make and understand how these methods work under the hood.
That’s all for now. Thanks for reading!