Into the Rabbit Hole of Slow Test Suites

March 14th, 2022

We recently re-committed to TDD on our older Rails project at work. I’m on a newer team and was a little leery of our team lead pushing for TDD as I wasn’t sure how aware he was about just how brutal of a feedback loop our RSpec suite provided.

TDD is typically best applied when there’s a fast (ideally instant) feedback loop from running tests -> seeing test results, as the result of the test determines whether you write more code or move on to the next test case. The faster the loop, the faster the developer can make decisions, and the faster the development cycle for a given feature.

In our case, our RSpec suite was taking huge amounts of time to load, much less actually run any tests. We’re talking significant chunks of time – upwards of a minute – just to load.

Obviously, this makes TDD brutal in practice as even small changes require minutes to get any reasonable feedback. It leads to developer frustration at best and a reluctance to write tests in the worst case.

I took on the challenge of figuring out how to solve this longstanding issue and found a very surprising issue lurking under the hood.

Benchmarking `spec_helper`

If you’re suffering from slow RSpec suite loads, the first thing you should do is start benchmarking your spec_helper file. This is where RSpec loads all the supporting middleware it’ll need to do any testing, and most developers just chuck stuff in there willy-nilly to get their builds to pass.

Therefore, it’s a great way for a lot of cruft to build up over time and gradually bog everything down. Every time the suite loads, this file gets run, so if something is egregiously wrong here, it’ll cause major issues for even tiny tests.

Ruby Benchmarking

Ruby includes Benchmark in its standard library and is a super accessible way to start benchmarking code. Our first step to diagnosing our load issue was to throw a benchmark block into our spec_helper, one for each line (or group of lines). This includes benchmarking all the require statements as well – we need to know absolutely everything about this file to make an educated guess as to what’s going on:

Benchmark.bm do |x|
  x.report("requires") do
    require "test_helper"

    require "rspec/rails"

    require "capybara/rspec"
    require "capybara/rails"
    require "capybara-screenshot/rspec"

    require "shoulda/matchers"
    require "pundit/rspec"
    require "shoulda-callback-matchers"
    require "faker"
    require "data/matchers"
  end

  x.report("AJ uniqueness") { ActiveJob::Uniqueness.test_mode! }
  
  x.report("maintain_test_schema!") do
    # Checks for pending migration and applies them before tests are run.
    # If you are not using ActiveRecord, you can remove this line.
    ActiveRecord::Migration.maintain_test_schema!
  end

  x.report("helper load") do
    Dir[Rails.root.join("spec/helpers/**/*.rb")].each { |f| require f }
  end

  # more setup code...
end

Running this code will give you an output when the suite runs. We don’t actually care if the tests pass at this point; we’re just trying to figure out how long this file takes to work through.

An example of what this output looks like from the docs (the labels don’t correspond to the sample above):

              user     system      total        real
for:      1.010000   0.000000   1.010000 (  1.015688)
times:    1.000000   0.000000   1.000000 (  1.003611)
upto:     1.030000   0.000000   1.030000 (  1.028098)

Signs of Trouble

Once we had this data, it was pretty clear to see what the issue was.

ActiveRecord::Migration.maintain_test_schema! was taking on average ~25 seconds to complete, for every invocation of RSpec. While it doesn’t completely solve the problem, this is such a significant chunk of time it warrants further exploration.

Let’s jump over to the documentation, aaaand… it’s marked :nodoc:.

It is, however, a part of the spec_helper file that RSpec generates by default, so it’s likely this piece of code has been here since the application used RSpec.

Deep Dive into ActiveRecord

So now that we’ve reached a dead end on documentation, we’ve got to figure out how to diagnose what’s going on in the source code. Reading source code is fine on GitHub, but we already have it installed locally, and we’re absolutely sure it’s the right version, so why not use that? bundle open will open an editor with the gem’s source code:

bundle open activerecord

And from there we can begin tracing the path our application takes when calling maintain_test_schema!.

Application Background

To begin to understand what’s going on, I need to give a little backstory on the application’s context.

It’s an older Rails application running Rails 5.1.7. It also exists within an ecosystem of similar Rails applications, and it shares a data model with a couple of them. For this reason, the migrations for some of the applications lives in a shared gem which injects its migrations paths into the wrapper application on initialization. Applications that do this usually don’t even have a db/migrate directory to discourage developers from trying to put migrations there.

I don’t think this is a common pattern much anymore, but our data model gem basically is a carbon copy of this guide (perhaps it was followed back in the day). The relevant piece is this block:

module EngineWithMigrations
  class Engine < ::Rails::Engine
    isolate_namespace EngineWithMigrations

    initializer :append_migrations do |app|
      unless app.root.to_s.match root.to_s
        app.config.paths["db/migrate"] += config.paths["db/migrate"].expanded
      end
    end
  end
end

Because the gem is mounted as an engine, each gem migration path (config.paths["db/migrate"]) is merged into the application migration paths (app.config.paths["db/migrate"]). A larger discussion is probably warranted for how Rails handles paths from its configurations, as it’s not quite as simple as an array of hashes, but that exercise is left to the reader as it’s not totally relevant here.

What’s important is understanding that at the end of this process, the migration paths should look like:

db/migrate
gem/db/migrate

Tracing `maintain_test_schema!`

With all the above in mind, let’s walk through the code. You can do this in a couple ways – if you’ve bundle opened ActiveRecord you can put debugger breakpoints directly in ActiveRecord itself (please remember to remove them!), or you can add a debugger breakpoint in your spec_helper and manually run statements as needed. Either case will work here. Add your breakpoints of choice and fire up an RSpec test run.

We start with the offending call:

ActiveRecord::Migration.maintain_test_schema!

ActiveRecord::Migration.maintain_test_schema! method definition:

def maintain_test_schema! # :nodoc:
  if ActiveRecord::Base.maintain_test_schema
    suppress_messages { load_schema_if_pending! }
  end
end

Nothing out of the ordinary here, we’ll assume ActiveRecord::Base.maintain_test_schema is truthy or it wouldn’t take 25+ seconds, and suppress_messages doesn’t look particularly offensive. The next candidate is load_schema_if_pending!.

ActiveRecord::Migration.load_schema_if_pending! method definition:

def load_schema_if_pending!
  if ActiveRecord::Migrator.needs_migration? || !ActiveRecord::Migrator.any_migrations?
    # Roundtrip to Rake to allow plugins to hook into database initialization.
    FileUtils.cd Rails.root do
      current_config = Base.connection_config
      Base.clear_all_connections!
      system("bin/rails db:test:prepare")
      # Establish a new connection, the old database may be gone (db:test:prepare uses purge)
      Base.establish_connection(current_config)
    end
    check_pending!
  end
end

Alright, we have a lot going on here. First thing that stands out to me here is that we have the possibility to call out to the shell to run bin/rails db:test:prepare. This would absolutely eat up a massive amount of time as it runs essentially three expensive commands in tandem to blow away the test DB and recreate it:

export RAILS_ENV=test

bin/rails db:drop
bin/rails db:create
bin/rails db:migrate

Let’s keep digging though, as we need to know if we’re entering this block. Here, we’ll actually test the conditions:

ActiveRecord::Migrator.needs_migration? || !ActiveRecord::Migrator.any_migrations?
#=> true

ActiveRecord::Migrator.needs_migration?
#=> false

!ActiveRecord::Migrator.any_migrations?
#=> true

Aha! So we are running bin/rails db:test:prepare every time RSpec is run, even if we don’t need to (i.e. database is up-to-date already), because this ActiveRecord::Migrator thinks we have no migrations.

This isn’t the end of the story as we need to figure out why the migrator doesn’t think there are migrations.

ActiveRecord::Migrator.any_migrations? method definition:

def any_migrations?
  migrations(migrations_paths).any?
end

Simple. This is doing exactly what it says. We can infer the migrations method is looking in the migrations paths and determining what’s a valid migration or not. We can come back to that later if we have to, but it seems intuitive for now, so we’ll leave it.

The next question is what the migrator thinks the migration paths are.

ActiveRecord::Migrator.migrations_paths method definition:

def migrations_paths
  @migrations_paths ||= ["db/migrate"]
  # just to not break things if someone uses: migrations_path = some_string
  Array(@migrations_paths)
end

This looks very promising. Notice the hardcoded migration path of db/migrate. If we hit this block without @migrations_paths being defined for some reason, our paths are now wrong (recall that our migrations actually live in gem/db/migrate). We can confirm this simply in our debugger:

ActiveRecord::Migrator.migrations_paths
#=> ["db/migrate"]

Perfect. This change means this is looking in the wrong migration paths for any migrations. any_migrations? returns false, which is negated in the check for loading schema, and so the whole thing will drop the test DB every time you run RSpec.

Implementing a Fix

I honestly have no idea how the application config is failing to be picked up in this particular instance. It’s only happening in test environments, but once I noticed that this is an issue I did find an older, unresolved Rails issue that talks about it. The solution was to upgrade Rails.

To that end, we’re working towards upgrading to a modern Rails version where this is likely fixed, but in the meantime, we applied a monkey patch to fix this issue:

if Rails.env.test?
  class ActiveRecord::Migrator
    def self.migrations_paths
      @migrations_paths ||= if defined?(Rails) && Rails.application
                              Rails.application.paths["db/migrate"].to_a
                            else
                              ["db/migrate"]
                            end

      Array(@migrations_paths)
    end
  end
end

Here, we explicitly check the application configuration instead of reverting to a hardcoded migration path. Also note we’re monkey patching ActiveRecord and that makes me crazy nervous, so there’s also a check to only do this on the test environment. No need to risk a production outage if you can help it. And finally, this is placed into an initializer and run before RSpec.

The result? About a 25 second drop in load times, exactly what was expected from our benchmarking tests. Hooray!

Wrap Up

We did end up speeding the suite up a little bit more by about another 20 to 25 seconds by installing Spring. I didn’t feel that was useful to cover here as there are plenty of tutorials that probably explain that installation process in more detail. There’s also caveats to using Spring, but in our case it was a huge win for both developer experience and TDD as well.

That’s all for now. Thanks for reading!

ty-porter

Into the Rabbit Hole of Slow Test Suites
How an unexpected bug caused slow RSpec loads and hindered TDD adoption.

Benchmarking `spec_helper`

Ruby Benchmarking

Signs of Trouble

Deep Dive into ActiveRecord

Application Background

Tracing `maintain_test_schema!`

Implementing a Fix

Wrap Up

Into the Rabbit Hole of Slow Test Suites How an unexpected bug caused slow RSpec loads and hindered TDD adoption.

Benchmarking spec_helper

Ruby Benchmarking

Signs of Trouble

Deep Dive into ActiveRecord

Application Background

Tracing maintain_test_schema!

Implementing a Fix

Wrap Up

Into the Rabbit Hole of Slow Test Suites
How an unexpected bug caused slow RSpec loads and hindered TDD adoption.

Benchmarking `spec_helper`

Tracing `maintain_test_schema!`