Having lots of common sense is typically a good thing. There’s a reason people say to go with your gut – it’s usually right!

However, when developing complex systems, the obvious solution can have some very hard-to-predict side effects if the process you’re altering is sufficiently interconnected. It’s possible some of these side effects are good, but since they’re unexpected, it’s far more likely that they’re deleterious. Even if they are good, it’s just as important to know that they’re happening and why. Hopefully, if you find yourself in this situation, you are in a data-driven organization with appropriate instrumentation in place that can measure these effects and provide a path to a root cause analysis (RCA) if any outcomes are particularly nasty. If not, you’re effectively “flying blind”.


Case Study

Let’s take a look at a simple case where the solution we implemented was just common sense, but the result we got was wildly different from what we expected.

Goal

Our goal was to implement a splash screen where users could confirm their audio and video were working correctly before joining a video call with one of our agents. Previously, users would click a link and just be thrown directly into the call, and we knew some of those calls had bad outcomes because of audio and video issues. This just makes sense as a feature – we already knew some users thrash because of bad settings, so let’s put a screen in front of the call to fix it.

Hypothesis

Because we’re data-driven and run experiments around these types of new features, the first step was to build a measurable hypothesis. Ours was fairly complicated and touched a lot of different factors, but it boiled down to this:

Pre-flight video and audio checks will increase the number of successful video calls by X%.

We already had a lot of instrumentation in place around this flow, since video calls are a direct one-to-one touchpoint between our employees and customers and failure is something we want to catch early. In addition, we put an incremental rollout strategy in place to limit any possible impact on our customers: start with 5% of the population using the new feature, and once we’ve verified it’s working, move to 25%, 50%, 75%, and finally a full rollout.
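For illustration, here’s a minimal sketch of how a percentage-based rollout like that could be gated in code. The environment variable, feature name, and helper functions are hypothetical stand-ins, not our actual implementation – the point is just that bucketing users deterministically lets you grow the percentage without the same user flipping in and out of the feature.

```typescript
// Illustrative sketch only: deterministically bucket users 0–99 so the same
// user stays in (or out of) the rollout as the percentage grows.
import { createHash } from "crypto";

// Hypothetical rollout percentage, e.g. read from config or an env var.
const PREFLIGHT_ROLLOUT_PERCENT = Number(process.env.PREFLIGHT_ROLLOUT_PERCENT ?? "5");

function rolloutBucket(userId: string, featureName: string): number {
  // Hash the feature name together with the user ID so different features
  // get independent buckets.
  const digest = createHash("sha256").update(`${featureName}:${userId}`).digest();
  // Map the first 4 bytes of the hash onto 0–99.
  return digest.readUInt32BE(0) % 100;
}

function isInPreflightRollout(userId: string): boolean {
  return rolloutBucket(userId, "preflight-check") < PREFLIGHT_ROLLOUT_PERCENT;
}

// Usage: route the user to the new splash screen only if they fall in the rollout.
// if (isInPreflightRollout(user.id)) { showPreflightCheck(); } else { joinCallDirectly(); }
```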

Result

We saw a marked decrease in the number of successfully completed video calls and a marked increase in the number of “no shows” on the calls.

What we think happened is that the new feature takes time. Sometimes, a LOT of time.

It seems obvious that the feature would take some amount of time, but what matters here is that we’re really optimizing two things: increasing successful video call completions AND decreasing the time employees spend in a call with a customer (notice our hypothesis failed to include the second). Since our previous flow takes essentially no time, we never considered what adding time might do. And because employees are trying to minimize the time they spend in a call, if they join one and have to wait five minutes for a customer to finish their pre-checks, they’ll just mark it a “no show” and move on.

A small portion of the no shows were probably true abandonments – i.e. customers getting frustrated or confused by the new flow and leaving before ever attempting to join the call proper.

Takeaways

Turns out the two things we were min/maxing were at odds with each other. Luckily, as I said before, we had proper instrumentation, monitoring, and an incremental rollout plan in place to identify this trend and turn the feature off before it could have a larger impact on our customer base.

Moral of the story: measure, measure, measure! Run experiments, but don’t be afraid of the results. Design feature flagging into your code and have a robust rollout / rollback strategy that accounts for all contingencies.
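To make that concrete, here’s a rough sketch of the kind of flag-guarded code path the moral points at. The flag client and function names are hypothetical, not any particular library’s API – the important part is that the old flow stays intact behind the flag, so rolling back is just a config change, not a deploy.

```typescript
// Illustrative sketch of a flag-guarded call flow with a fast rollback path.
// `flags` stands in for whatever feature-flag client you use; the API is hypothetical.
interface FlagClient {
  isEnabled(flag: string, userId: string): Promise<boolean>;
}

async function joinVideoCall(flags: FlagClient, userId: string): Promise<void> {
  // If the flag service is down or the flag is off, fall back to the old flow.
  const usePreflight = await flags
    .isEnabled("preflight-check", userId)
    .catch(() => false);

  if (usePreflight) {
    await runPreflightCheck(userId); // new splash screen: confirm audio/video
  }
  await connectToAgent(userId); // existing join-call behavior, unchanged
}

// Hypothetical placeholders for the real implementations.
declare function runPreflightCheck(userId: string): Promise<void>;
declare function connectToAgent(userId: string): Promise<void>;
```

Because the old behavior is the default whenever the flag is off or unreadable, turning the feature off for everyone is a single flag flip – which is exactly what saved us here.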

That’s all for now. Thanks for reading!