Summary
Standardized UX instruments like SUS, NPS, SEQ, PMF Score, and UEQ provide validated, comparable measurements. Each targets different constructs: SUS measures perceived usability, NPS measures loyalty/recommendation intent, SEQ captures task-level difficulty, and the PMF Score measures perceived indispensability. Understanding what each instrument measures, and its limitations, prevents misuse and misinterpretation. Always pair metrics with qualitative data to understand the 'why' behind scores.
When you need to measure user experience or usability quantitatively, do not invent your own questions. Use standardized instruments that have been validated through research.
These tools provide:
- Reliability: Consistent results across administrations
- Validity: Actually measuring what they claim to measure
- Benchmarks: Data from thousands of studies for comparison
- Comparability: Ability to compare your results to industry standards
But using them effectively requires understanding what each one measures, and what it does not.
Which Tool for Which Layer?
Before diving into individual instruments, understand that measurement happens at different layers of the user experience. Choose your tool based on what layer you are measuring.
| Layer | Acronym | What It Measures | Best For |
|---|---|---|---|
| Micro (Task) | SEQ | Task-level difficulty | Immediate post-task feedback |
| Meso (Product) | SUS / UEQ | Overall product usability | Benchmarking the full experience |
| Macro (Relationship) | NPS | Loyalty and recommendation intent | Tracking customer sentiment over time |
Micro-Level: The Task
Use the SEQ (Single Ease Question) immediately after a task. It captures the friction of a specific interaction while the experience is still fresh.
Meso-Level: The Product
Use the SUS (System Usability Scale) or UEQ to benchmark the overall usability of the product or application. Administer these after the participant has completed all core tasks.
Macro-Level: The Relationship
Use NPS (Net Promoter Score) to track the overall customer relationship and loyalty over time.
System Usability Scale (SUS)
The SUS [1] is the most widely used standardized usability questionnaire. It consists of 10 statements rated on a 5-point scale from "Strongly Disagree" to "Strongly Agree."
What It Measures
SUS measures perceived usability: participants' subjective impression of how usable a system is. Each item's response is converted to a 0-4 contribution (for odd-numbered, positively worded items, the response minus 1; for even-numbered, negatively worded items, 5 minus the response); the ten contributions are summed and multiplied by 2.5, yielding a final score from 0 to 100.
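As a concrete illustration, the standard SUS scoring arithmetic can be sketched in a few lines. This is a minimal sketch; the example responses are hypothetical.

```python
def sus_score(responses):
    """Compute a SUS score from one participant's 10 responses (each 1-5).

    Odd-numbered items are positively worded: contribution = response - 1.
    Even-numbered items are negatively worded: contribution = 5 - response.
    The summed contributions (0-40) are scaled by 2.5 to a 0-100 score.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS requires exactly 10 responses, each from 1 to 5")
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # items 1, 3, 5, ... sit at even indices
        for i, r in enumerate(responses)
    )
    return total * 2.5

# "Strongly Agree" (5) to every positive item, "Strongly Disagree" (1) to every
# negative item is the best possible response pattern:
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # → 100.0
```

Note that a participant who answers 3 ("Neutral") to every item lands at 50, not at the benchmark average of 68; the score is not a percentage of agreement.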
Interpreting SUS Scores
Based on extensive research [3], SUS scores can be interpreted as:
| Score | Adjective Rating | Percentile Rank |
|---|---|---|
| 84.1+ | Best Imaginable | Top 10% |
| 80.3 | Excellent | Top 20% |
| 68 | OK (Average) | ~50th percentile |
| 51 | Poor | Bottom 20% |
| Below 50 | Awful | Bottom 10% |
The industry average is approximately 68 [2].
Practical Considerations
When to administer: After the participant has completed core tasks, not before they have had meaningful interaction.
Minimum sample: For stable scores, aim for at least 12-14 participants. With fewer than 8, confidence intervals become very wide.
Do not modify the questions: The scale has been validated as a complete instrument. Changing wording or removing items invalidates the norms.
Net Promoter Score (NPS)
NPS [4] measures customer loyalty through a single question: "How likely are you to recommend [product/company] to a friend or colleague?" rated 0-10.
How It Works
Respondents are classified as:
- Promoters (9-10): Loyal enthusiasts who will keep buying and refer others
- Passives (7-8): Satisfied but unenthusiastic customers
- Detractors (0-6): Unhappy customers who can damage your brand
NPS = % Promoters - % Detractors
The score ranges from -100 (everyone is a detractor) to +100 (everyone is a promoter).
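The classification and formula above translate directly into code. A minimal sketch, using hypothetical ratings:

```python
def nps(ratings):
    """Compute Net Promoter Score from a list of 0-10 ratings.

    Promoters rate 9-10, Passives 7-8, Detractors 0-6.
    NPS = % Promoters - % Detractors, so the result lies in [-100, 100].
    Passives affect the score only by enlarging the denominator.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)

# 2 promoters and 2 detractors out of 6 respondents cancel out:
print(nps([10, 9, 8, 7, 6, 3]))  # → 0.0
```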
What It Actually Measures
NPS measures recommendation intent, which is often used as a proxy for loyalty and growth potential. However, NPS is controversial among researchers:
Criticisms:
- A single question cannot capture the complexity of customer loyalty
- The classification (0-6 as "detractors") is somewhat arbitrary
- NPS does not explain why someone would or would not recommend
- Cultural differences affect how people use the scale
When to Use NPS
NPS is appropriate for:
- Tracking overall brand or product sentiment over time
- Segmenting customers by loyalty
- Creating a simple KPI for executive dashboards
It is not appropriate for:
- Evaluating specific features or interface changes
- Replacing usability testing
- Making detailed design decisions
Product-Market Fit Score (The Sean Ellis Test)
While NPS measures whether customers would recommend your product, the Product-Market Fit (PMF) Score measures something different: how much they would miss it if it disappeared.
The question is simple: "How would you feel if you could no longer use [product]?"
Response options:
- Very disappointed
- Somewhat disappointed
- Not disappointed
- N/A (I no longer use it)
The 40% Benchmark
The key metric is the percentage of respondents who select "Very disappointed." Sean Ellis, who coined the term "growth hacking," proposed a simple heuristic: if more than 40% of your users say they would be "very disappointed" without your product, you have achieved product-market fit.
This is a startup heuristic, not a scientifically validated threshold. The 40% figure emerged from pattern-matching across successful startups rather than controlled research. Treat it as a useful rule of thumb, not a hard cutoff.
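The calculation itself is straightforward. One common convention, assumed in this sketch, is to exclude "N/A" respondents from the denominator, since the question only makes sense for people still using the product; the sample data is hypothetical.

```python
def pmf_score(responses):
    """Percentage of active users answering 'Very disappointed'.

    `responses` is a list of answer strings. "N/A" respondents are
    excluded from the denominator (a common convention, assumed here).
    """
    active = [r for r in responses if r != "N/A"]
    if not active:
        raise ValueError("no active users in sample")
    very = sum(1 for r in active if r == "Very disappointed")
    return 100.0 * very / len(active)

# Hypothetical survey: 9 / 20 active users would be very disappointed.
answers = (["Very disappointed"] * 9 + ["Somewhat disappointed"] * 8
           + ["Not disappointed"] * 3 + ["N/A"] * 2)
score = pmf_score(answers)
print(f"{score:.0f}% very disappointed "
      f"({'meets' if score >= 40 else 'below'} the 40% heuristic)")
```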
What It Actually Measures
The PMF Score captures perceived indispensability. It asks users to imagine life without the product and report the emotional weight of that loss.
This is distinct from both usability and loyalty:
- A product can be highly usable (strong SUS score) without being indispensable
- A product can generate strong recommendation intent (high NPS) for reasons unrelated to personal reliance
- The PMF Score specifically targets whether the product has become woven into the user's workflow or life
When to Use It
The PMF Score is most valuable for:
- Early-stage products seeking validation before scaling
- Products searching for their core audience segment
- Tracking whether new features increase or decrease perceived value
It is less useful for mature products with established market positions, where the question becomes less about "do we have fit" and more about "how do we expand and retain."
Single Ease Question (SEQ)
The SEQ asks: "Overall, how easy or difficult was this task?" rated on a 7-point scale from "Very Difficult" to "Very Easy."
When to Use It
Administer SEQ immediately after each task in a UX test. It captures in-the-moment perceived difficulty before memory fades.
Interpretation
Based on benchmark data [6]:
- Average SEQ: ~5.5
- Scores above 5.5 indicate above-average ease
- Scores below 4.5 suggest significant difficulty
SEQ correlates with task success and time on task, making it a useful quick check even when objective metrics are available.
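In practice this reduces to computing a per-task mean and flagging tasks against the benchmark thresholds. A minimal sketch with hypothetical task names and ratings:

```python
from statistics import mean

# Hypothetical SEQ ratings (1-7 scale) collected immediately after each task.
seq_by_task = {
    "Create an account": [6, 7, 5, 6, 6],
    "Configure notifications": [3, 4, 2, 5, 4],
}

for task, ratings in seq_by_task.items():
    avg = mean(ratings)
    if avg < 4.5:
        verdict = "difficult — investigate"
    elif avg > 5.5:
        verdict = "above-average ease"
    else:
        verdict = "around average"
    print(f"{task}: mean SEQ {avg:.1f} ({verdict})")
```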
User Experience Questionnaire (UEQ)
The UEQ [5] is a more comprehensive instrument measuring six dimensions of user experience:
- Attractiveness: Overall impression of the product
- Perspicuity: How easy it is to learn
- Efficiency: How quickly users can accomplish goals
- Dependability: How in control users feel
- Stimulation: How exciting or motivating to use
- Novelty: How innovative or creative the design seems
When to Use It
UEQ is appropriate when you need a more nuanced view of user experience beyond usability alone. It captures both pragmatic quality (efficiency, perspicuity, dependability) and hedonic quality (stimulation, novelty).
Practical Notes
- Takes about 3-5 minutes to complete
- Requires at least 20 participants for reliable results
- Free to use, with online benchmarks available
Choosing the Right Instrument
| Instrument | Measures | Best For | Minimum Sample |
|---|---|---|---|
| SUS | Perceived usability | Post-study overall assessment | 12-14 |
| SEQ | Task difficulty | After each task | 5-10 per task |
| NPS | Recommendation intent | Customer loyalty tracking | 30+ |
| PMF Score | Perceived indispensability | Product-market fit validation | 30+ |
| UEQ | 6 UX dimensions | Comprehensive UX assessment | 20+ |
What Is a "Good" Score?
Raw scores are meaningless without context. Here are the benchmarks you need to interpret your results.
SUS Benchmarks
| Score Range | Interpretation |
|---|---|
| 85+ | Excellent — Top 10% of products |
| 71-84 | Good — Above average usability |
| 68 | Average — The industry midpoint |
| 51-67 | Below Average — Needs improvement |
| Below 50 | Failure — Serious usability problems |
The magic number to remember: 68 is average. Anything above 71 is genuinely good. Below 50 indicates fundamental problems that will frustrate most users.
NPS Benchmarks
NPS is highly industry-dependent. A score of +30 might be excellent in one industry and mediocre in another.
SEQ Benchmarks
On a 7-point scale:
- Average: ~5.5
- Above 5.5: Task was perceived as easier than average
- Below 4.5: Task was perceived as difficult—investigate further
However, SEQ depends heavily on inherent task difficulty. A complex task (e.g., "Configure your tax settings") will naturally score lower than a simple one (e.g., "Find the search bar"). Compare SEQ scores across versions of the same task, not across different tasks.
Common Mistakes
Modifying Validated Instruments
"We only need 5 of the SUS questions." No. Validated instruments work as complete packages. Removing or rewording items invalidates the benchmarks.
Over-relying on Single Metrics
NPS alone does not tell you what to fix. SUS alone does not explain why users struggle. Quantitative metrics tell you that something is happening; qualitative data tells you why.
Ignoring Context
A SUS score of 72 might be excellent for complex enterprise software and mediocre for a consumer mobile app. Always consider the product category when interpreting scores.
Treating Scores as Precise
All measurements have uncertainty. A SUS score of 72 might really be "somewhere between 65 and 79." Report confidence intervals, especially with smaller samples.
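A sketch of reporting a SUS mean with a 95% confidence interval, using hypothetical scores from a 12-participant study. The t critical value is hardcoded from a standard t-table (the Python standard library does not provide t-distribution quantiles):

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical SUS scores from 12 participants.
scores = [72.5, 85.0, 60.0, 77.5, 67.5, 90.0, 55.0, 70.0, 80.0, 62.5, 75.0, 82.5]

m = mean(scores)
se = stdev(scores) / sqrt(len(scores))   # standard error of the mean
t_crit = 2.201                           # two-tailed 95% t value for df = 11
lo, hi = m - t_crit * se, m + t_crit * se
print(f"SUS = {m:.1f}, 95% CI [{lo:.1f}, {hi:.1f}], n = {len(scores)}")
```

With n = 12 the interval spans roughly 14 points; this is exactly why a single-number report like "SUS = 73" overstates the precision of a small sample.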
What This Means for Practice
Standardized instruments are powerful tools when used correctly:
- Choose the right instrument for what you actually need to measure
- Administer them correctly: at the right time, with complete items, to enough participants
- Interpret with benchmarks: raw scores are meaningless without context
- Combine with qualitative data: metrics quantify problems; observation reveals them
- Report uncertainty: acknowledge the precision limits of your sample size
The goal is not to chase a score. It is to use measurement as one input into better design decisions.
References
- [1] John Brooke (1996). "SUS: A 'Quick and Dirty' Usability Scale". In Usability Evaluation in Industry.
- [2] John Brooke (2013). "SUS: A Retrospective". Journal of Usability Studies.
- [3] Aaron Bangor et al. (2009). "Determining What Individual SUS Scores Mean: Adding an Adjective Rating Scale". Journal of Usability Studies.
- [4] Frederick F. Reichheld (2003). "The One Number You Need to Grow". Harvard Business Review.
- [5] Bettina Laugwitz, Theo Held & Martin Schrepp (2008). "Construction and Evaluation of a User Experience Questionnaire". In HCI and Usability for Education and Work (USAB 2008).
- [6] Jeff Sauro & James R. Lewis (2016). "Quantifying the User Experience: Practical Statistics for User Research". Morgan Kaufmann.