Now, let us dive into three key limitations that organizations should consider before running an online A/B test, so as to avoid costly biases. By understanding and mitigating these limitations, businesses can maximize the value of A/B testing, make more informed decisions, and drive meaningful improvements in their digital experiences.
1. Channel: Uncovering the User’s Perspective
One of the primary limitations of online A/B testing is understanding why users prefer one option over another. The choice between options A and B is rarely explicitly justified, leaving experimenters to speculate about user behavior. In scientific research, this is called the “channel”: the mechanism that explains why the causal effect occurs.
Imagine that your option B incorporates an additional feature on the checkout page (e.g. recommendations for similar products or products frequently bought together). You observe a drop in purchases with option B and conclude that it was a bad idea. However, a more careful analysis reveals that the page for option B also took longer to load. You now have two differences: the content and the waiting time. Going back to the concept of causality, you cannot tell which one drives the drop; the two are confounded. If you think that loading time is marginal, think again: “[…] experiments at Amazon showed a 1% sales decrease for an additional 100msec, and that a specific experiment at Google, which increased the time to display search results by 500 msecs reduced revenues by 20%” (Kohavi et al., 2007).
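To see why the two differences cannot be disentangled after the fact, here is a minimal simulation sketch in Python, with made-up effect sizes: variant B combines a positive content effect with a negative latency effect, and the test only ever observes their sum.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # visitors per arm

# Hypothetical, illustrative effect sizes:
# the recommendation widget alone would raise conversion by 0.5 pp,
# but the extra latency it introduces lowers it by 0.8 pp.
base_rate = 0.050
content_effect = +0.005
latency_effect = -0.008

conv_a = rng.binomial(1, base_rate, n)
conv_b = rng.binomial(1, base_rate + content_effect + latency_effect, n)

lift = conv_b.mean() - conv_a.mean()
print(f"Observed lift for B: {lift:+.4f}")
# The naive comparison reports roughly a -0.3 pp drop but says nothing about
# whether the widget or the slower page is responsible: the two causes
# are confounded in a single contrast.
```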
Solutions: First, adding short survey questions can provide valuable insight into users’ motivations and hence reduce the risk of biased interpretations. Second, keeping everything except the tested change identical across variants (e.g. matching the loading time) helps pin down the cause.
2. Short-Term vs. Long-Term Impact: Beyond Immediate Results
When conducting an online A/B test, it is essential to consider the potential long-term effects of the chosen metric. While short-term objectives, such as click-through rates or immediate conversions, may seem favorable initially, they could have adverse consequences in the long run. For example, employing clickbait strategies may yield quick views and impressions, but they might negatively impact the audience’s perception and your credibility over time.
Solution: It is crucial to measure multiple metrics that capture both short-term and long-term impact. By evaluating a comprehensive range of indicators, organizations can make more informed decisions and avoid myopic optimization strategies. Long-term metrics could include satisfaction scores and audience retention (e.g. the share of a video watched or the time spent reading an article). That said, these are not trivial to measure.
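As a rough illustration, assuming a hypothetical per-user log with a click indicator, a 7-day return flag, and watch time, a multi-metric readout per variant might be sketched like this:

```python
import pandas as pd

# Hypothetical per-user log: variant assignment, an immediate metric
# (clicked) and longer-horizon ones (returned within 7 days, seconds watched).
users = pd.DataFrame({
    "variant":       ["A", "A", "A", "B", "B", "B"],
    "clicked":       [1, 0, 1, 1, 1, 1],
    "returned_7d":   [1, 1, 0, 0, 1, 0],
    "watch_seconds": [310, 250, 0, 45, 120, 30],
})

summary = users.groupby("variant").agg(
    click_rate=("clicked", "mean"),        # short-term
    retention_7d=("returned_7d", "mean"),  # long-term
    avg_watch_seconds=("watch_seconds", "mean"),
)
print(summary)
# In this toy data, B wins on clicks but loses on retention and watch time:
# reporting only the short-term metric would hide the long-term cost.
```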
3. Primacy and Newness Effects: The Influence of Novelty
Two related limitations arise from the influence of novelty in online A/B testing: primacy and newness effects. The primacy effect refers to experienced users being confused or disoriented when they encounter a change, such as a button’s new placement or color. Conversely, the newness effect occurs when users interact with a new feature simply because it is novel, an effect that may fade quickly. These effects are particularly prevalent on platforms where users interact regularly, such as social media.
Solution: It is recommended to run experiments over several weeks and observe how the effects change over time. By monitoring how user behavior evolves, experimenters can gain a more comprehensive understanding of the long-term impact of their changes.
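One simple way to monitor this, sketched below with made-up weekly lift figures, is to track the estimated lift week by week and only decide once it stabilizes:

```python
import pandas as pd

# Hypothetical daily results: lift of B over A in conversion rate, by week.
daily = pd.DataFrame({
    "week": [1, 1, 2, 2, 3, 3, 4, 4],
    "lift": [0.021, 0.018, 0.012, 0.010, 0.004, 0.005, 0.001, 0.002],
})

weekly = daily.groupby("week")["lift"].mean()
print(weekly)
# A lift that shrinks week after week is a typical newness-effect signature;
# a lift that grows as users relearn the interface points to a primacy effect.
# Concluding after week 1 alone would overstate the change's true impact.
```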
Conclusion:
While online A/B testing is a valuable tool for data-driven decision-making, it is crucial to consider at least these three potential issues. By investigating the channel behind user choices, measuring both short-term and long-term impacts, and accounting for primacy and newness effects, organizations can enhance the reliability and validity of their A/B testing results. This is just the tip of the iceberg, and I invite you to read further: Kohavi, R., Henne, R. M., & Sommerfield, D. (2007, August). Practical guide to controlled experiments on the web: Listen to your customers not to the hippo. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 959–967).