
Sample Size Calculator — Tool and Explanations

An interactive sample size calculator for UX research, with the statistical foundations explained — from binomial problem discovery to power analysis.

Marc Busch
Updated March 7, 2026
10 min read

Summary

A methodology deep dive explaining the formulas, assumptions, and literature behind sample size calculation for qualitative and quantitative UX research. Covers binomial probability for usability testing, saturation research for interviews, Cochran's formula for surveys, and power analysis for A/B tests.

"How many participants do I need?" depends on what you're trying to measure and how precise you need to be. Different research goals use different statistical models, and each model comes with assumptions that shift the result.

This article walks through the formulas behind our Sample Size Calculator: binomial probability for usability testing, saturation thresholds for qualitative interviews, Cochran's formula for surveys, and power analysis for A/B tests. There are other valid approaches and thresholds out there. We document ours so you can see what's under the hood and decide what fits your context.

For a practical decision framework without the math, see Sample Sizes: Beyond the Magic Numbers.

Try It Yourself

The calculator below covers all four models. Pick your research goal, adjust the parameters, and see how the numbers change. The rest of this article explains what's happening behind each option.

Ready to plan your study? Configure your full study setup in our Study Builder.

The Qualitative Side: How Many Interviews Are Enough?

Problem Discovery (UX/Usability Tests)

UX/Usability tests ask a specific question: will users hit this problem? The underlying model is binomial probability. If a problem affects a proportion p of your users, the chance of observing it at least once in n sessions is:

P = 1 - (1 - p)^n

Flip that around to solve for n at 95% probability:

n = ⌈log(1 - 0.95) / log(1 - p)⌉

At p = 0.30 (problems that affect roughly a third of users), that gives you n = 9. At p = 0.15 (rarer problems, relevant for safety-critical systems like medical devices or automotive), you need n = 19. [1] [2]

The threshold p is the lever here. p = 0.30 means you're looking for problems that are common enough to matter in day-to-day use. p = 0.15 catches less frequent problems but doubles your sample. There's no universally correct p. It depends on what the consequences of missing a problem are.
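Rearranged, the calculation is a one-liner. A minimal sketch in Python (the function name is ours, not from the calculator):

```python
import math

def discovery_sample_size(p: float, confidence: float = 0.95) -> int:
    """Sessions needed to observe a problem that affects a proportion p
    of users at least once, with the given probability of detection."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

print(discovery_sample_size(0.30))  # -> 9
print(discovery_sample_size(0.15))  # -> 19
```

Raising the confidence target or lowering p both push n up, which is exactly the lever the article describes.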

Saturation (Strategic & Generative Research)

Interview research doesn't have a formula in the same sense. Instead it has an empirical concept: saturation, the point where additional interviews stop producing new insights.

Hennink, Kaiser & Marconi (2017) make a useful distinction between code saturation (you've heard all the themes) and meaning saturation (you understand what they mean in depth). [3] These aren't the same point. In their study, code saturation hit at 9 interviews, meaning saturation between 16 and 24.

Guest, Bunce & Johnson (2006) found 70% of themes present after 6 interviews and thematic saturation at 12, in a homogeneous population. [4] Hagaman & Wutich (2017) confirmed similar numbers for homogeneous groups but found that heterogeneous or cross-cultural samples need 20 to 40 interviews for metatheme saturation. [5]

We use four levels in the calculator:

| Level | n per segment | What it gets you |
|---|---|---|
| Quick Signals | 6 | First patterns. Enough for hypothesis generation, not for decisions. |
| Thematic Saturation | 12 | Stable theme landscape. Standard for most UX studies. |
| Deep Understanding | 16 | Full nuance and meaning. Good for strategic research, persona validation. |
| Comprehensive Coverage | 24 | Maximum coverage including edge themes. For fundamental product decisions. |

One important constraint: if you're doing validation research (testing whether a hypothesis holds), Quick Signals isn't enough. You need at least thematic saturation to say "this pattern is stable." Exploratory research can start at 6 because the goal is generating ideas, not confirming them.

All values are per homogeneous segment. Three segments at thematic saturation means 36 interviews, not 12.

The Quantitative Side: Surveys and Beyond

Cochran's Formula for Proportions (Binary Outcomes)

When you're measuring rates (task success, conversion, yes/no questions), the question is: how many responses do I need so my margin of error stays within a useful range? Cochran's formula for proportions: [6]

n = ⌈z² × p(1 - p) / e²⌉

Where z is the z-score for your confidence level, p is the expected proportion (we use 0.5, which gives maximum variance and the most conservative estimate), and e is the margin of error you're willing to accept.

Three precision levels, three very different sample sizes:

| Precision | Confidence | Margin of Error | n |
|---|---|---|---|
| Low Stakes | 90% (z = 1.645) | ±10% | 68 |
| Standard | 95% (z = 1.96) | ±5% | 385 |
| High Stakes | 99% (z = 2.576) | ±3% | 1,844 |

The jump from 68 to 385 to 1,844 is worth staring at. Going from "roughly directional" to "precisely defensible" is not a linear cost increase. This is the core trade-off in quantitative sample sizing.
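The three rows above fall directly out of the formula. A minimal sketch (the z-score lookup table and helper name are ours, not from the calculator):

```python
import math

# Two-sided z-scores for common confidence levels
Z_SCORES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}

def cochran_proportion(confidence: float, margin: float, p: float = 0.5) -> int:
    """Sample size for estimating a proportion within +/- margin.
    p = 0.5 is the conservative, maximum-variance default."""
    z = Z_SCORES[confidence]
    return math.ceil(z**2 * p * (1 - p) / margin**2)

print(cochran_proportion(0.90, 0.10))  # -> 68
print(cochran_proportion(0.95, 0.05))  # -> 385
print(cochran_proportion(0.99, 0.03))  # -> 1844
```

Note that the margin of error appears squared in the denominator, which is why tightening it from ±5% to ±3% is so much more expensive than it looks.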

Cochran's Formula for Means (Continuous Scores)

When you're measuring scores like SUS, satisfaction ratings, or task time, the formula changes because you're dealing with a different kind of variance:

n = ⌈(z × σ / E)²⌉

Where σ is the standard deviation of your measure and E is the margin of error in the same unit as the score (e.g., ±5 points on a 0-100 scale).

This is where most generic calculators get it wrong: they ask you for σ but don't tell you what a reasonable value is. σ depends heavily on what you're measuring: [7] [8]

| Instrument | σ (0-100 scale) | Source |
|---|---|---|
| SUS (System Usability Scale) | 12.5 | Sauro & Lewis, 446 studies |
| Multi-item questionnaire | 20 | MeasuringU benchmark |
| Single rating item (5pt/7pt) | 25 | MeasuringU |
| Single item, high variance | 28 | MeasuringU conservative |

The practical effect: a SUS study at standard precision (±5 points, 95% confidence) needs only 25 participants. A single rating item (σ = 25) at the same precision needs 97. Same formula, same confidence, four times the sample, because the underlying data is noisier.
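The means variant, as a sketch (σ values taken from the instrument table above; the helper name is ours):

```python
import math

def cochran_mean(sigma: float, margin: float, z: float = 1.96) -> int:
    """Sample size for estimating a mean within +/- margin,
    where margin is in the same unit as the score."""
    return math.ceil((z * sigma / margin) ** 2)

print(cochran_mean(12.5, 5))  # SUS, +/-5 points            -> 25
print(cochran_mean(25, 5))    # single rating item, +/-5    -> 97
```

Because σ enters the formula squared, doubling the instrument's noise quadruples the required sample.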

Power Analysis for A/B Tests

A/B testing asks a different question: "Can I detect a real difference between two versions?" Instead of estimating a single value with a margin of error, you're comparing two groups and trying to avoid two kinds of mistakes: declaring a difference that isn't there (false positive, α) and missing a difference that is there (false negative, β). [10]

For continuous metrics (like SUS scores between two designs):

n per group = ⌈(z_α + z_β)² × 2 / d²⌉

Where d is Cohen's d, a standardized measure of how big the difference is. The conventional benchmarks, adapted for UX:

| Effect Size | d | Meaning | n per group (standard) | n total |
|---|---|---|---|---|
| Revolution | 0.8 | Massive, obvious change | 25 | 50 |
| Evolution | 0.5 | Clearly noticeable improvement | 63 | 126 |
| Optimization | 0.2 | Subtle, incremental gain | 393 | 786 |

For binary metrics (like conversion rates), the formula uses the actual proportions instead of a standardized effect size:

n per group = ⌈(z_α + z_β)² × (p₁(1-p₁) + p₂(1-p₂)) / (p₁ - p₂)²⌉

The precision level matters here too. Higher confidence and higher power both increase your required sample. At standard settings (α = 0.05, power = 0.80), you can detect medium effects at reasonable sample sizes. At high stakes (α = 0.01, power = 0.90), the same effect size needs roughly 60% more participants.
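Both variants can be sketched in a few lines. The z defaults below correspond to two-sided α = 0.05 and power = 0.80, and the example conversion rates are hypothetical; function names are ours:

```python
import math

def ab_n_continuous(d: float, z_alpha: float = 1.96, z_beta: float = 0.8416) -> int:
    """n per group to detect a standardized effect d on a continuous metric."""
    return math.ceil((z_alpha + z_beta) ** 2 * 2 / d**2)

def ab_n_binary(p1: float, p2: float, z_alpha: float = 1.96, z_beta: float = 0.8416) -> int:
    """n per group to detect a difference between two proportions."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

print(ab_n_continuous(0.5))     # evolution-level effect -> 63 per group
print(ab_n_binary(0.20, 0.25))  # e.g. detecting 20% vs 25% conversion
```

The binary version makes the cost of small absolute differences explicit: the gap p₁ - p₂ sits squared in the denominator, so halving the detectable difference roughly quadruples the sample.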

The Segment Multiplier

Every formula above gives you n for one homogeneous group. If you need separate conclusions per segment (novice vs. expert, buyer vs. end-user), you multiply.

This is straightforward math with non-straightforward budget implications. A standard-precision survey with 3 segments: 385 × 3 = 1,155 participants. An A/B test detecting evolution-level effects across 2 segments: 126 × 2 = 252. The segment multiplier is where many studies either get realistic about scope or start cutting corners on precision.

The key word is "separate conclusions." If you only need one overall result and segments are just nice-to-have breakdowns, you don't need to multiply. You only multiply when each segment needs enough statistical power on its own. That distinction is worth making explicit in your research plan, because it directly determines whether you're fielding 385 participants or 1,155.

For the practical side of navigating this trade-off, see Sample Sizes: Beyond the Magic Numbers.

Diminishing Returns: When More Isn't Better

The margin of error curve for quantitative research is not linear. It flattens. Going from n = 50 to n = 100 cuts your margin of error from ±14% to ±10%. Going from n = 400 to n = 800 moves it from ±5% to ±3.5%. Double the cost, a third of the improvement.
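You can trace the curve yourself (assuming a proportion estimate at 95% confidence with the conservative p = 0.5):

```python
import math

def margin_of_error(n: int, z: float = 1.96, p: float = 0.5) -> float:
    """Half-width of the confidence interval for a proportion,
    in percentage points."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

for n in (50, 100, 400, 800):
    print(n, round(margin_of_error(n), 1))
# 50 -> 13.9, 100 -> 9.8, 400 -> 4.9, 800 -> 3.5
```

The square root in the denominator is the whole story of diminishing returns: quadrupling n only halves the margin.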

This is why there's a sweet spot somewhere between 200 and 400 for most survey research. Below 100, your margins are wide enough that small differences drown in noise. Above 500, you're paying a lot for precision most UX decisions don't need. The calculator shows this curve visually so you can see where your specific study sits on it.

The same logic applies qualitatively, though less precisely. The difference between 6 and 12 interviews is substantial (from first patterns to thematic saturation). The difference between 20 and 30 is marginal in most studies. [11]


Formulas don't give you the "right" sample size. They make your assumptions visible. If someone asks "why 12?" you can point to Guest et al. and explain thematic saturation. If someone questions 385, you can show Cochran's formula and the precision trade-off at 95% confidence. That's more useful than "industry standard" or "best practice."

The assumptions matter more than the formulas. Whether you pick p = 0.30 or p = 0.15, whether you aim for thematic saturation or comprehensive coverage, whether you accept ±10% or insist on ±3%, those are research design decisions. The math just tells you what they cost.

Run the numbers for your own study with our Sample Size Calculator, or see Sample Sizes: Beyond the Magic Numbers for the strategic perspective.

References

[1] Jakob Nielsen & Thomas K. Landauer (1993). "A Mathematical Model of the Finding of Usability Problems". Proceedings of ACM INTERCHI'93.
[2] Robert A. Virzi (1992). "Refining the Test Phase of Usability Evaluation: How Many Subjects Is Enough?". Human Factors.
[3] Monique M. Hennink et al. (2017). "Code Saturation Versus Meaning Saturation: How Many Interviews Are Enough?". Qualitative Health Research.
[4] Greg Guest et al. (2006). "How Many Interviews Are Enough? An Experiment with Data Saturation and Variability". Field Methods.
[5] Ashley K. Hagaman & Amber Wutich (2017). "How Many Interviews Are Enough to Identify Metathemes in Multisited and Cross-cultural Research?". Field Methods.
[6] William G. Cochran (1977). "Sampling Techniques". John Wiley & Sons.
[7] Jeff Sauro & James R. Lewis (2016). "Quantifying the User Experience: Practical Statistics for User Research". Morgan Kaufmann.
[8] Jeff Sauro (2025). "Sample Sizes for Comparing Rating Scale Means". MeasuringU.
[9] Emre Akin Kayi et al. (2022). "System Usability Scale Benchmarking for Digital Health Apps: Meta-analysis". JMIR mHealth and uHealth.
[10] Jacob Cohen (1988). "Statistical Power Analysis for the Behavioral Sciences". Lawrence Erlbaum Associates.
[11] Greg Guest et al. (2020). "A Simple Method to Assess and Report Thematic Saturation in Qualitative Research". PLoS ONE.
[12] Gail M. Sullivan & Anthony R. Artino (2013). "Analyzing and Interpreting Data From Likert-Type Scales". Journal of Graduate Medical Education.
