Making Hash Smarter

One of the (few) nice things about JavaScript over Ruby is its handling of the Object data type (and by extension, JSON), where you can define a hashmap and access its values via standard dot-notation:

var foo = {};
foo["bar"] = "baz";

foo.bar;
// "baz"

If you tried to do this in Ruby, you’d have big problems:

foo = {}
foo["bar"] = "baz"

foo.bar
# NoMethodError

foo["bar"]
#=> "baz"

Notice instead that you have to use square brackets to access the same attribute. This is fine, except that dot-notation is much more convenient when you’re working with a deeply nested hash. For instance, if you’re dealing with a complicated JSON payload, chaining dots is a lot easier to read and write than chaining square brackets.


OpenStruct

Fortunately, Ruby does offer an alternative in OpenStruct, which exposes the data as attributes on an object and provides a similar dot-notation syntax. You can even parse JSON directly into an OpenStruct (I commonly do this if I’m testing JSON responses from fixtures). There are two downsides to this, though:

  1. It’s relatively slow. We have to instantiate not only the original hash, but also the OpenStruct.
  2. OpenStruct doesn’t behave like a Hash. A key named after a Hash instance method takes that method’s place, and every other Hash method returns nil, because OpenStruct returns nil for any name that isn’t explicitly present as a key.

The first is an obvious issue, but it can be mitigated if you only use it sparingly.

The second is larger. Imagine we have a hash like this, and turn it into an OpenStruct:

hash = {
  keys: "OVERRIDDEN!",
  foo: :bar,
  baz: :bat
}

ostruct = OpenStruct.new(hash)

ostruct.keys
#=> "OVERRIDDEN!"

ostruct.size
#=> nil

In the first example, we’ve overridden a Hash instance method that should return all the keys in the hash (:keys, :foo, :baz). And in the second example, we haven’t overridden anything, but we get an unexpected result (Hash#size should return the number of keys in the hash, in this case 3).
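For reference, parsing JSON directly into an OpenStruct (as mentioned above) might look something like the sketch below. The payload is a made-up fixture for illustration; object_class is a standard JSON.parse option. It also shows the nil-for-anything-missing behavior.

```ruby
require 'json'
require 'ostruct'

# Hypothetical fixture payload, used only for illustration.
payload = '{"user": {"name": "Ada", "roles": ["admin", "dev"]}}'

# object_class tells JSON.parse which class to build for each JSON object.
response = JSON.parse(payload, object_class: OpenStruct)

response.user.name        #=> "Ada"
response.user.roles.first #=> "admin"

# Any name that isn't a key quietly returns nil, including Hash methods.
response.size             #=> nil
```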


Enter Metaprogramming

In order to make our hashes smarter, we need a couple things, then:

  • Hash keys as methods
  • Fast instantiation
  • Instance methods still available

We’re going to be looking at source code from intellihash, the gem I created to make better hashes.

In order to do this, we’ll be overriding method_missing for the Hash class. If you haven’t played with method_missing, it is invoked every time a message is sent to a Ruby object and no corresponding method can be found. This method determines what the object does when the method can’t be found. By default, it raises a NoMethodError, as you saw when we tried to access a Ruby hash’s attributes via dot-notation. This is exactly why we’re going to override it to give us the new functionality we need!
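If method_missing is new to you, here’s a minimal sketch of the hook on its own. The Settings class is hypothetical and has nothing to do with intellihash; it only shows when the hook fires and how super restores the default NoMethodError.

```ruby
# Hypothetical class, for illustration only; not part of intellihash.
class Settings
  def initialize(values)
    @values = values
  end

  # Called only when normal method lookup fails.
  def method_missing(name, *args, &block)
    return @values[name] if @values.key?(name)

    super # default behavior: raises NoMethodError
  end

  # Keeps respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @values.key?(name) || super
  end
end

settings = Settings.new(theme: :dark)
settings.theme               #=> :dark
settings.respond_to?(:theme) #=> true

begin
  settings.font # not a key, so we fall through to super
rescue NoMethodError => e
  e.class #=> NoMethodError
end
```

Defining respond_to_missing? alongside method_missing is the usual convention, so that respond_to? doesn’t lie about what the object can handle.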


method_missing

Here’s the guts of intellihash:

# lib/mixins.rb

def method_missing(method_name, *args, **kwargs, &block)
  super unless respond_to?(:is_intelligent?) && is_intelligent?

  if method_name[-1] == '='
    send(:store, method_name[0, method_name.size - 1].send(key_store_as), args.first)
  else
    format_method = key_retrieve_from(kwargs)
    case format_method
    when :any then fetch_where_present(method_name)
    else send(:[], method_name.send(format_method))
    end
  end
end

There’s a lot going on. Most of this is related to configuration. Let’s go line-by-line.

  super unless respond_to?(:is_intelligent?) && is_intelligent?

We can selectively turn off the intelligent attribute on our hash instance to stop using intellihash features, and this line ensures we fall back to the method_missing that Hash inherits (which raises NoMethodError) if we aren’t dealing with an intelligent hash. Side note: This module is prepended instead of included, so it sits ahead of Hash in the method lookup chain and super reaches the original behavior. See here for more details on prepend vs. include.
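To make the prepend vs. include difference concrete, here’s a small sketch with made-up names (Loud, IncludedGreeter, PrependedGreeter). A prepended module sits in front of the class in the ancestor chain, so its method runs first and super reaches the class’s own version.

```ruby
# Hypothetical module and classes, for illustration only.
module Loud
  def greet
    "#{super.upcase}!"
  end
end

class IncludedGreeter
  include Loud # Loud comes AFTER the class in the lookup chain

  def greet
    'hello'
  end
end

class PrependedGreeter
  prepend Loud # Loud comes BEFORE the class in the lookup chain

  def greet
    'hello'
  end
end

IncludedGreeter.ancestors.first(2)  #=> [IncludedGreeter, Loud]
PrependedGreeter.ancestors.first(2) #=> [Loud, PrependedGreeter]

IncludedGreeter.new.greet  #=> "hello"  (the class's method wins; Loud#greet never runs)
PrependedGreeter.new.greet #=> "HELLO!" (Loud#greet runs first, then calls super)
```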

  if method_name[-1] == '='
    send(:store, method_name[0, method_name.size - 1].send(key_store_as), args.first)

method_missing’s first argument is the name of the method that wasn’t recognized. This line of code checks that we’re attempting to store a new attribute in the hash.

Recall that implementing a setter in Ruby can be accomplished with the following:

class Hash
  def foo=(other)
    self[:foo] = other
  end
end

With that in mind, we just need to check that the method name ends with =, and if so, we’ll strip the = and use the store method to save the value under the appropriate key. key_store_as is more configuration in intellihash and determines whether keys are stored as strings or symbols by default.
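Here’s a rough sketch of what actually arrives in method_missing when you call an undefined setter. Recorder is a hypothetical class that just logs the call, and :to_sym stands in for whatever key_store_as returns (that part is my assumption for the example).

```ruby
# Hypothetical recorder class, for illustration only.
class Recorder
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Log every unknown message along with its arguments.
  def method_missing(name, *args)
    @calls << [name, args]
    nil
  end
end

recorder = Recorder.new
recorder.foo = 'bar'
recorder.calls.last #=> [:foo=, ["bar"]]

name = recorder.calls.last.first

# Symbol#[] returns a String, so the slicing in intellihash works on symbols.
name[-1]                             #=> "="
name[0, name.size - 1]               #=> "foo"
name[0, name.size - 1].send(:to_sym) #=> :foo  (:to_sym assumed as the key format)
```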

  else
    format_method = key_retrieve_from(kwargs)
    case format_method
    when :any then fetch_where_present(method_name)
    else send(:[], method_name.send(format_method))
    end
  end

This block looks complicated, but it really isn’t (again, mostly formatting, where we’re looking at whether the key is stored as a string or symbol). Take a look at the last else, where we simply call :[] on the hash with the appropriate key. This is exactly the same thing as doing hash[:foo]!
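I haven’t shown fetch_where_present here, so treat the following as a guess at the idea rather than the gem’s code: lookup_any is a hypothetical helper that tries the symbol form of the key first, then the string form. The last two lines show the non-:any branch, which is plain indexing through send.

```ruby
# Hypothetical helper: an assumption about how an "any format" lookup could work.
def lookup_any(hash, method_name)
  [method_name.to_sym, method_name.to_s].each do |key|
    return hash[key] if hash.key?(key)
  end
  nil
end

lookup_any({ foo: 1 }, :foo)     #=> 1
lookup_any({ 'foo' => 2 }, :foo) #=> 2
lookup_any({}, :foo)             #=> nil

# With a fixed format, the lookup is just hash[key] spelled with send:
{ foo: 1 }.send(:[], :foo.send(:to_sym))   #=> 1
{ 'foo' => 2 }.send(:[], :foo.send(:to_s)) #=> 2
```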

That’s it! The rest of the code is simply helpers to allow for flexibility and configuration based on your needs!


Usage

intellihash = {
    foo: {
        bar: {
            baz: {
                bat: :bam
            }
        }
    }
}

intellihash.foo.bar.baz.bat
#=> :bam

Sweet! And because the instance methods we wanted to preserve will never trigger method_missing, that means we can do things like:

intellihash.keys
#=> [:foo]

Nice! And fortunately, there is a way around the fact that a key in the hash you’re using can collide with a hash instance method. To use an earlier example:

hash = {
  keys: "OVERRIDDEN!",
  foo: :bar,
  baz: :bat
}

hash.keys
#=> [:keys, :foo, :baz]

hash.fetch(:keys)
#=> "OVERRIDDEN!"
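You can see why real methods always win without installing the gem. The sketch below uses a stripped-down, hypothetical DotAccess module (not intellihash itself) extended onto a single hash: method_missing only runs after normal lookup fails, so Hash#keys and Hash#size are never intercepted.

```ruby
# Hypothetical module, a stripped-down stand-in for the real gem.
module DotAccess
  def method_missing(name, *args, &block)
    key?(name) ? self[name] : super
  end

  def respond_to_missing?(name, include_private = false)
    key?(name) || super
  end
end

hash = { keys: 'OVERRIDDEN!', foo: :bar, baz: :bat }.extend(DotAccess)

hash.foo          #=> :bar  (no Hash#foo exists, so method_missing handles it)
hash.keys         #=> [:keys, :foo, :baz]  (Hash#keys is found first)
hash.size         #=> 3
hash[:keys]       #=> "OVERRIDDEN!"
hash.fetch(:keys) #=> "OVERRIDDEN!"
```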

Is It Fast?

You bet. Under the hood, it’s just setting an instance variable on each hash. It doesn’t need to re-instantiate itself, since all we’re doing is hooking into method_missing whenever we can’t find an existing method. There are some caveats to copying the hash (you can take a look at lib/callbacks for an implementation of ensuring the correct intelligent attribute is set), but other than that this is far simpler than instantiating an OpenStruct.

The performance tests instantiate a large JSON payload as both OpenStruct and intellihash 200 times each. In this run, intellihash finishes roughly five times faster in wall-clock time:

                  user     system      total        real                
OpenStruct:   4.046875   0.906250   4.953125 (  4.979611)
Intellihash:  0.828125   0.125000   0.953125 (  0.956110)
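If you want to try a comparison like this yourself, a sketch using Ruby’s Benchmark module might look like the following. The payload is made up, and a plain symbolized Hash stands in for intellihash so the script runs without the gem; both are assumptions, and your numbers will differ from mine.

```ruby
require 'benchmark'
require 'json'
require 'ostruct'

# Hypothetical payload; not the fixture used for the numbers above.
payload = JSON.generate(
  'items' => Array.new(200) do |i|
    { 'id' => i, 'tags' => %w[a b c], 'meta' => { 'ok' => true } }
  end
)

iterations = 200

# Benchmark.bm prints a table like the one above and returns the timings.
results = Benchmark.bm(12) do |x|
  x.report('OpenStruct:') do
    iterations.times { JSON.parse(payload, object_class: OpenStruct) }
  end
  # Plain Hash as a stand-in for intellihash (assumption).
  x.report('Hash:') do
    iterations.times { JSON.parse(payload, symbolize_names: true) }
  end
end
```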

Safety

I don’t advocate using this gem in production. Overriding the behavior of a core class in Ruby can be dangerous – the hash instance methods we’re talking about are written in C and subject to change, sometimes drastically, between different versions of Ruby. You’ll also notice that there’s an issue logged on the intellihash GitHub page – Rails uses method_missing when saving data to the database, so this gem effectively breaks Rails. There’s always a workaround, but I have a feeling this will turn into applying band-aids to lots of little issues.

Either way, it’s fun to build something like this and understand how these methods work under the hood.

That’s all for now. Thanks for reading!