A Reality Check on Null Hypothesis Significance Testing in A/B Testing

Published in Outperform magazine (print edition) by Eppo.

Author: Miha Gazvoda

Published: May 23, 2025

I’ve written a personal-view article, “A Reality Check on NHST in A/B Testing”, appearing in Eppo by Datadog’s Outperform magazine (print).

I highlight how NHST’s effectiveness is often explained using an overly simplistic scenario that portrays effects as either “null” or “true”. Instead, I suggest fitting a multilevel model to past experiments to estimate the distribution of unobserved true effects.
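As a rough illustration of the idea, here is a minimal sketch that estimates the distribution of true effects from past experiments. It is not the model from the article: it swaps a full multilevel fit for a simple method-of-moments estimate, and it assumes each observed estimate is normally distributed around its true effect, with true effects drawn from a normal distribution. The function name and the example numbers are hypothetical.

```python
from statistics import mean, variance


def estimate_effect_distribution(estimates, std_errors):
    """Method-of-moments fit of a normal-normal model (an assumed simplification).

    Assumed model:
        true_effect_i ~ Normal(mu, tau^2)
        estimate_i    ~ Normal(true_effect_i, std_error_i^2)
    Returns (mu, tau2), the mean and variance of the true effects.
    """
    mu = mean(estimates)
    # Variance of observed estimates = variance of true effects
    # + average sampling variance, so subtract the noise part.
    noise_var = mean(se ** 2 for se in std_errors)
    tau2 = max(0.0, variance(estimates) - noise_var)  # clip at zero
    return mu, tau2


# Hypothetical past experiments: relative lifts and their standard errors.
past_estimates = [0.02, -0.01, 0.00, 0.03, -0.02, 0.01]
past_std_errors = [0.01] * 6
mu, tau2 = estimate_effect_distribution(past_estimates, past_std_errors)
```

With real data, a proper multilevel model (for example in Stan or PyMC) would also quantify uncertainty in `mu` and `tau2`, which this sketch ignores.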

This reveals that under assumptions that likely hold in many companies, NHST with conventional error rates:

To address these, one useful alternative, in my view, is to apply a shrinkage estimator informed by past experiments (and experiment-specific details), combined with an explicit cost-benefit analysis. As Andrew Gelman puts it, let’s make decisions in real-world units: dollars, customers.
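A minimal sketch of what shrinkage plus a cost-benefit check could look like, assuming a normal prior over true effects (with mean and variance estimated from past experiments) and a normal likelihood for the new experiment. The function names, the revenue figure, and the launch cost are hypothetical, and the utility is deliberately simple: expected revenue gain minus a fixed cost.

```python
from math import sqrt


def shrink(estimate, std_error, prior_mean, prior_var):
    """Posterior mean and sd of the true effect under a normal-normal model."""
    if prior_var <= 0:
        # No variation in true effects: the prior mean is the best guess.
        return prior_mean, 0.0
    data_precision = 1.0 / std_error ** 2
    prior_precision = 1.0 / prior_var
    post_var = 1.0 / (data_precision + prior_precision)
    # Precision-weighted average of the new estimate and the prior mean.
    post_mean = post_var * (estimate * data_precision + prior_mean * prior_precision)
    return post_mean, sqrt(post_var)


def expected_net_value(post_mean, baseline_revenue, launch_cost):
    """Expected gain in dollars, treating the effect as a relative revenue lift."""
    return post_mean * baseline_revenue - launch_cost


# Hypothetical inputs: a 3% observed lift with a 1% standard error,
# and a prior with mean 0.5% and variance 0.00025 from past experiments.
post_mean, post_sd = shrink(0.03, 0.01, prior_mean=0.005, prior_var=0.00025)
net = expected_net_value(post_mean, baseline_revenue=1_000_000, launch_cost=10_000)
ship = net > 0  # decide in dollars rather than on a p-value threshold
```

The shrunken estimate sits between the prior mean and the raw estimate, pulled further toward the prior when the experiment is noisy relative to the spread of past true effects.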

Performing a similar analysis on your own experiments can help you explore trade-offs in your decisions beyond Type I and Type II errors.

Read the full article.