A/B Testing
A controlled experiment comparing two variants by randomly splitting users between them. The most reliable way to measure the causal impact of a specific change on user behavior.
A/B testing randomly splits your users into two groups, one seeing version A and the other version B, and measures which performs better on a defined metric. It is the gold standard for causal inference in product decisions.
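Randomization is what makes the comparison causal. One common way to split users, shown here as a sketch, is to hash a stable user ID together with an experiment name so each user always lands in the same group. The function name `assign_variant`, the `experiment` key, and the choice of SHA-256 are illustrative assumptions, not something this entry prescribes.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, split: float = 0.5) -> str:
    """Return "A" or "B" for this user, stable across visits.

    Hashing the experiment name with the user ID gives each user an
    effectively random bucket in [0, 1) that never changes for that experiment.
    (Hypothetical helper for illustration.)
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Keep the top 53 bits so the division is exact and bucket stays below 1.
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return "A" if bucket < split else "B"


share_a = sum(assign_variant(f"user-{i}", "checkout-test") == "A"
              for i in range(10_000)) / 10_000
print(share_a)  # close to 0.5
```

Because assignment depends only on the hash, no lookup table is needed, and changing the experiment name reshuffles users for a new test.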
When to Use
- Feature validation: You have a specific change and want to know if it improves a metric
- Optimization: You have a working flow and want to incrementally improve it
- Settling debates: Stakeholders disagree about which design is better—let the data decide
When Not to Use
A/B tests answer "which is better" but not "why." If your conversion rate drops 15%, an A/B test tells you the new design caused it. It does not tell you what confused users. You need qualitative research for that.
A/B tests also require sufficient traffic. With too small a sample, the test is underpowered: even a real effect of the size you care about will often fail to reach statistical significance, and you are guessing with extra steps.
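How much traffic is "sufficient" can be estimated before the test starts. The sketch below uses a common normal-approximation formula for comparing two conversion rates; the function name `sample_size_per_group` and its defaults (5% significance level, 80% power) are illustrative assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_baseline: float, p_variant: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed in EACH group for a two-sided two-proportion test."""
    if p_baseline == p_variant:
        raise ValueError("the two rates must differ to detect an effect")
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for a two-sided test
    z_beta = z.inv_cdf(power)            # quantile matching the desired power
    variance = p_baseline * (1 - p_baseline) + p_variant * (1 - p_variant)
    effect = p_variant - p_baseline      # minimum difference worth detecting
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Detecting a lift from 10% to 12% conversion at alpha = 0.05 and 80% power
print(sample_size_per_group(0.10, 0.12))
```

For a lift from 10% to 12%, this comes out to a little under 4,000 users per group. Because the required size scales with the inverse square of the effect, halving the detectable lift roughly quadruples the traffic needed.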
Common Pitfalls
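- Peeking at results early: Checking results before reaching your planned sample size, and stopping as soon as they look significant, inflates the false-positive rate
- Testing too many variants: Each additional variant needs more traffic, and comparing many variants without correcting for multiple comparisons raises the chance of a false positive
- Ignoring effect size: A result can be statistically significant yet too small to matter in practice; weigh the size of the effect against the cost of shipping the change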
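The peeking pitfall is easy to underestimate. This simulation runs many A/A tests, where both arms are identical, and compares checking once at the planned end with stopping at the first of ten interim looks that crosses the significance threshold. It assumes equal-sized batches and models each batch's standardized difference as a standard-normal draw; the function name `false_positive_rates` and its parameters are hypothetical.

```python
import math
import random
from statistics import NormalDist


def false_positive_rates(n_sims: int = 20_000, n_looks: int = 10,
                         alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Simulate A/A tests (no true difference) under two stopping rules.

    Returns (rate checking only at the final look, rate stopping at any look).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    fixed_hits = 0
    peek_hits = 0
    for _ in range(n_sims):
        running_sum = 0.0
        significant_at_any_look = False
        z_stat = 0.0
        for look in range(1, n_looks + 1):
            # Each equal-sized batch adds a standard-normal increment to the
            # running difference between arms; with no true effect there is no drift.
            running_sum += rng.gauss(0.0, 1.0)
            z_stat = running_sum / math.sqrt(look)  # standardized difference so far
            if abs(z_stat) > z_crit:
                significant_at_any_look = True  # a peeker would stop and ship here
        if abs(z_stat) > z_crit:  # z_stat now holds the final, planned look
            fixed_hits += 1
        if significant_at_any_look:
            peek_hits += 1
    return fixed_hits / n_sims, peek_hits / n_sims


fixed_rate, peeking_rate = false_positive_rates()
print(f"check once at the end: {fixed_rate:.3f}")
print(f"stop at first significant look: {peeking_rate:.3f}")
```

Checking once keeps the false-positive rate near the nominal 5%, while stopping at the first significant look produces a rate several times higher, even though neither arm is actually better.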
Related Terms
Conversion Rate
The percentage of users who complete a desired action (e.g., purchase, sign-up) out of the total number of visitors.
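Because a phrase like "drops 15%" can describe a relative or an absolute change, it helps to compute both explicitly. A minimal sketch; the function names and the counts are made up for illustration.

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Percentage of visitors who completed the desired action."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return 100.0 * conversions / visitors


def relative_change(rate_a: float, rate_b: float) -> float:
    """Change of B relative to A, in percent (negative means B is lower)."""
    return 100.0 * (rate_b - rate_a) / rate_a


rate_a = conversion_rate(200, 4000)      # 5.0
rate_b = conversion_rate(170, 4000)      # 4.25
print(relative_change(rate_a, rate_b))   # -15.0 (relative change, in percent)
print(rate_b - rate_a)                   # -0.75 (absolute change, in percentage points)
```

Here a fall from 5.0% to 4.25% is a 15% relative drop but only 0.75 percentage points in absolute terms.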
Statistical Significance
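A determination that an observed result would be unlikely if there were no real difference between the variants. Conventionally indicated by a p-value below 0.05: if there were truly no difference, a result at least as extreme as the one observed would occur less than 5% of the time. This is not the same as a 5% probability that the result is a fluke.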
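For conversion rates, a p-value is often computed with a two-proportion z-test. The sketch below uses the pooled version with a normal approximation, which assumes reasonably large samples; the function name `two_proportion_p_value` and the example counts are illustrative.

```python
import math


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # overall rate, assuming no true difference
    if pooled in (0.0, 1.0):
        raise ValueError("p-value is undefined when every user or no user converted")
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # P(|Z| >= |z|) for a standard normal Z, i.e. 2 * (1 - Phi(|z|))
    return math.erfc(abs(z) / math.sqrt(2))


print(two_proportion_p_value(200, 4000, 250, 4000))
```

With 200 of 4,000 conversions in A and 250 of 4,000 in B, the p-value comes out at roughly 0.015, below the conventional 0.05 threshold.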
Sample Size
The number of participants in a research study. Appropriate sample size depends on research goals, method type (qualitative vs. quantitative), the precision required, and the number of distinct user segments being studied.