Piloting No Manual QA

12 May 2023

I’m on the internal tooling team at ID.me, and we’ve been piloting a no-manual-QA approach for quite a while now, with good success. It’s probably better termed low-manual-QA, since individual team members can opt to test their code in a staging environment if they want to; we just don’t require it. It hasn’t stopped all incidents, but it has made them less frequent and less impactful.

Our no-QA approach has been particularly valuable for our team, because QA capacity and access to a limited pool of staging environments can bottleneck our release cadence.

Step 1: Automated Tests Are Mandatory

This step should surprise precisely no one. No manual QA doesn’t mean that no tests are run. In terms of raw numbers, it’s probably the opposite: our paradigm runs more tests than a high-manual-QA team does.

Taking this step forces developers to think about what needs to be tested and how it should be tested, which is good. It encourages, but does not require, a test-driven development (TDD) approach. Stopping short of requiring TDD is also good, because there are strong opinions on both sides of that argument.

Tooling is important for this step. By making tests mandatory, I mean that your CI pipeline should fail until a developer writes tests. To that end, our tooling looks something like this:

- RSpec as the test framework
- simplecov as the code coverage tool
  - We fail builds that cause code coverage % to drop by more than 1%. This gives a little breathing room but ensures that new code is covered by tests.
- undercover, an extension of simplecov
  - This gem compares your branch’s diff against master and fails the build if any new code lacks coverage.
  - This has downsides: if a linter reformats code that has no tests, those lines show up in the diff and get flagged as uncovered. That’s technically correct, but take care not to spend too much time writing tests for code where you only removed whitespace, for instance.
  - It has been great at catching missing cases for branches, and it has uncovered a few bugs that would otherwise have been insidious.
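Here’s a rough, stdlib-only Ruby sketch of what those two gates check. It is not how simplecov or undercover are implemented. simplecov exposes a `maximum_coverage_drop` setting for the first gate, and undercover does the diff comparison for the second. The method names are hypothetical. The per-line hit arrays assume the shape Ruby’s built-in `Coverage` module reports: index 0 is line 1, `nil` marks non-executable lines, and `0` means no test ran the line.

```ruby
# Hypothetical, stdlib-only sketch of the two coverage gates; not simplecov's
# or undercover's actual implementation.

# Gate 1: pass only if total coverage fell by at most max_drop percentage points.
def coverage_drop_ok?(previous_percent, current_percent, max_drop: 1.0)
  (previous_percent - current_percent) <= max_drop
end

# Gate 2: list changed lines that no test executed.
# changed_lines: { "path.rb" => [line numbers touched by the diff] }
# line_hits:     { "path.rb" => [hit count or nil per line] }, index = line - 1
# nil means "not executable" (blank line, comment, `end`), so it is never flagged.
# Files with no coverage data at all are skipped in this sketch.
def uncovered_changed_lines(changed_lines, line_hits)
  changed_lines.flat_map do |file, lines|
    hits = line_hits.fetch(file, [])
    lines.select { |n| hits[n - 1] == 0 }.map { |n| "#{file}:#{n}" }
  end
end

# Example: the diff touched lines 10-12; only line 11 ran zero times.
changed = { "app/models/user.rb" => [10, 11, 12] }
hits = { "app/models/user.rb" => Array.new(9) + [1, 0, nil] }

p coverage_drop_ok?(92.4, 91.6)          # => true  (about a 0.8 point drop)
p coverage_drop_ok?(92.4, 91.0)          # => false (about a 1.4 point drop)
p uncovered_changed_lines(changed, hits) # => ["app/models/user.rb:11"]
```

The diff gate is the stricter of the two. A whole-suite percentage can stay flat while a brand-new branch goes completely untested, which is exactly the case the line-level check catches.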

Automation is the critical step here to reduce manual QA effort, so a lot of thought should go into how it’s used.

Step 2: Solid Code Reviews

Obviously, automation never catches everything. That means code reviews need to be thorough and not a rubber stamp.

We encourage thorough code review from at least 2 engineers and will stop code from going out if there are issues with it. Work that spans applications requires cross-team code reviews. Finally, large changes require a mob review, where the engineer walks the group through the change to make sure everyone is on the same page.
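You could have a bot enforce these rules instead of relying on habit. A merge-readiness check might look something like the sketch below. Everything in it is hypothetical: the `Review` and `PullRequest` structs, the app-to-team ownership map, and especially the 500-line cutoff for a "large" change, which the rules above don’t define.

```ruby
# Hypothetical merge-readiness check encoding the review rules above.
# All names and the LARGE_CHANGE_LINES threshold are assumptions.
Review = Struct.new(:reviewer, :team, :approved, keyword_init: true)
PullRequest = Struct.new(:author_team, :apps_touched, :lines_changed,
                         :reviews, :mob_reviewed, keyword_init: true)

MIN_APPROVALS = 2
LARGE_CHANGE_LINES = 500 # assumed cutoff for a "large" change

# Returns the unmet requirements; an empty list means ready to merge.
def review_blockers(pr, app_owners)
  approvals = pr.reviews.select(&:approved)
  blockers = []

  # At least two distinct engineers must approve.
  if approvals.map(&:reviewer).uniq.size < MIN_APPROVALS
    blockers << "needs #{MIN_APPROVALS} approvals"
  end

  # Cross-team work: every other team owning a touched app must approve.
  owning_teams = pr.apps_touched.map { |app| app_owners.fetch(app) }.uniq
  (owning_teams - [pr.author_team]).each do |team|
    blockers << "needs approval from #{team}" unless approvals.any? { |r| r.team == team }
  end

  # Large changes also need a mob review.
  if pr.lines_changed > LARGE_CHANGE_LINES && !pr.mob_reviewed
    blockers << "large change needs a mob review"
  end

  blockers
end

owners = { "admin-portal" => "tooling", "billing-api" => "payments" }
pr = PullRequest.new(
  author_team: "tooling",
  apps_touched: ["admin-portal", "billing-api"],
  lines_changed: 120,
  reviews: [
    Review.new(reviewer: "ana", team: "tooling", approved: true),
    Review.new(reviewer: "ben", team: "tooling", approved: true)
  ],
  mob_reviewed: false
)
p review_blockers(pr, owners) # => ["needs approval from payments"]
```

Returning a list of blockers rather than a boolean lets the bot tell the author exactly what’s still missing.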

Step 3: Feature Flag Everything

If a bug slips past the first two steps, this step limits the damage: you can turn off the feature flag and stop the offending code from running.
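Here’s a minimal sketch of that kill switch, assuming an in-memory flag store. The `FlagStore` class and the `display_name` feature are hypothetical. Flipper has similarly named `enable`, `disable`, and `enabled?` calls, but this code doesn’t use Flipper. A real store would live somewhere shared, such as Redis or a database, so that flipping the flag affects every running process.

```ruby
# Hypothetical in-memory flag store. Flipper offers similarly named
# enable / disable / enabled? calls, but this sketch does not use Flipper.
class FlagStore
  def initialize
    @flags = Hash.new(false) # unknown flags read as "off"
  end

  def enable(name)
    @flags[name] = true
  end

  def disable(name)
    @flags[name] = false
  end

  def enabled?(name)
    @flags[name]
  end
end

FLAGS = FlagStore.new

# The new code path runs only while the flag is on. Disabling the flag sends
# every caller back to the existing path, with no deploy or rollback needed.
def display_name(user)
  if FLAGS.enabled?(:new_display_name)
    "#{user[:first_name]} #{user[:last_name]}".strip # new behavior
  else
    user[:login]                                     # existing behavior
  end
end

user = { login: "jdoe", first_name: "Jane", last_name: "Doe" }
p display_name(user)             # => "jdoe"     (flag off by default)
FLAGS.enable(:new_display_name)
p display_name(user)             # => "Jane Doe" (new code path live)
FLAGS.disable(:new_display_name) # kill switch: a bug was found
p display_name(user)             # => "jdoe"
```

Defaulting unknown flags to off matters here. A typo in a flag name or a missing record fails safe to the existing behavior rather than switching on half-finished code.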

Feature flagging also ties directly into testing: if you’re writing tests for the feature, you also need good tests showing that turning the flag off works. It’s a bit of extra work, but no more than a QA engineer would spend manually testing your PR.
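Here’s a dependency-free sketch of testing both sides of a flag. The same scenario runs with the flag on, off, and missing, and each outcome is asserted. In an RSpec suite these would more naturally be separate `context` blocks. The `greeting` feature and the `check` helper are hypothetical stand-ins. Flags are passed in as a plain hash so the test can force either state.

```ruby
# Hypothetical flag-guarded feature. Flags are injected as a hash so a test
# can force either state; in production this would be a flag-store lookup.
def greeting(user, flags)
  if flags.fetch(:personalized_greeting, false)
    "Welcome back, #{user[:first_name]}!" # new behavior
  else
    "Welcome back!"                       # existing behavior
  end
end

# Tiny hand-rolled assertion standing in for an RSpec expectation.
def check(description, actual, expected)
  unless actual == expected
    raise "#{description}: expected #{expected.inspect}, got #{actual.inspect}"
  end
  puts "ok - #{description}"
end

user = { first_name: "Ada" }
check("flag on uses the new greeting",
      greeting(user, { personalized_greeting: true }), "Welcome back, Ada!")
check("flag off falls back to the old greeting",
      greeting(user, { personalized_greeting: false }), "Welcome back!")
check("missing flag defaults to off",
      greeting(user, {}), "Welcome back!")
```

The "flag off" cases are the ones that prove the kill switch works, so they deserve the same care as the tests for the new behavior.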

Bonus points if you’re using a trunk-based version control flow: you can stack PRs on top of one another and release each one as it’s ready, rather than keeping a huge feature branch that can sit for weeks until the entire feature is ready to release. As long as the feature is behind a flag, a PR can go in as soon as it’s reviewed, not after it’s been tested manually.

It’s not really important where the flag lives. Our tool of choice is Flipper with the flags living in Redis, but a database flag or even a special role or permission that isn’t assigned to anyone can work.
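Call sites only need to ask whether a feature is on, so the storage really is interchangeable. This hypothetical sketch puts two backends behind the same `enabled?(feature, user)` interface. The first is a key/value lookup standing in for Redis or a database table. The second is the unassigned-role trick. Neither class is Flipper’s API; all the names are made up for illustration.

```ruby
# Two interchangeable, hypothetical flag backends. Call sites depend only on
# enabled?(feature, user), so the storage can change without touching them.

# Stand-in for a Redis- or database-backed store: a plain key/value lookup
# with string values, the way a simple Redis key would hold them.
class KeyValueFlags
  def initialize(store = {})
    @store = store
  end

  def enable(feature)
    @store[feature.to_s] = "on"
  end

  def enabled?(feature, _user = nil)
    @store[feature.to_s] == "on"
  end
end

# A "flag" that is just a role nobody holds yet: the feature stays off for
# everyone until the role is granted to someone.
class RoleFlags
  def enabled?(feature, user)
    user.fetch(:roles, []).include?("feature:#{feature}")
  end
end

# The call site is identical whichever backend is passed in.
def show_new_dashboard?(flags, user)
  flags.enabled?(:new_dashboard, user)
end

user = { id: 1, roles: ["admin"] }
kv = KeyValueFlags.new
p show_new_dashboard?(kv, user)            # => false
kv.enable(:new_dashboard)
p show_new_dashboard?(kv, user)            # => true
p show_new_dashboard?(RoleFlags.new, user) # => false (nobody has the role)
p show_new_dashboard?(RoleFlags.new, { roles: ["feature:new_dashboard"] }) # => true
```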

Step 4: When All Else Fails, QA It (Yourself)

There’s nothing that prevents you from doing manual testing if needed. Sometimes, due to environment differences, a no-QA approach simply doesn’t work. For instance, I needed to update a table with 120M rows. That’s not a small change, and it isn’t something that is easily load tested in a local environment. Luckily, our staging environment has enough data to simulate that size, so I tested it myself to make sure it was ready to go.

As long as you’ve followed steps 1 through 3, testing your own change is acceptable, in my opinion. Otherwise, I do think there is value in a second (unbiased) set of eyes on a change to keep developers honest.

All in all, it has been a great success for our team. Like everything, it’s a work in progress, and there are more steps we’d like to take to automate even more of our testing process.

I’m personally quite interested in how to launch reliable code, so if you are too, be sure to check out Launching Dark, an article I wrote last year about extensive feature flagging and other strategies for launching a complex authentication uplift with zero downtime. That project was developed under a no-manual-QA paradigm using all the techniques I talked about above, and it was a resounding success!

That’s all for now. Thanks for reading!
