Summary
Standardized UX instruments like SUS, NPS, SEQ, PMF Score and UEQ, plus lighter alternatives (UMUX, UMUX-LITE), the UEQ family (UEQ-S, UEQ+), and the web-specific SUPR-Q, provide validated and comparable measurements. Each targets a different construct, from perceived usability to recommendation intent to task difficulty to perceived indispensability. The article covers what each measures, when to use it, and how to interpret scores. Pair metrics with qualitative data to understand the 'why' behind the numbers.
When you need to measure user experience or usability quantitatively, do not invent your own questions. Use standardized instruments that have been validated through research.
These tools provide:
- Reliability: Consistent results across administrations
- Validity: Actually measuring what they claim to measure (see Types of Validity)
- Benchmarks: Data from thousands of studies for comparison
- Comparability: Ability to compare your results to industry standards
But using them effectively requires understanding what each one measures, and what it does not.
The interactive scale catalog above lets you filter all 195+ catalogued instruments by construct, domain, benchmark availability, and publication year. This article focuses on the most-used core instruments and how to choose between them.
Which Tool for Which Layer?
Before diving into individual instruments, understand that measurement happens at different layers of the user experience. Choose your tool based on what layer you are measuring.
| Layer | Acronym | What It Measures | Best For |
|---|---|---|---|
| Micro (Task) | SEQ | Task-level difficulty | Immediate post-task feedback |
| Meso (Product) | SUS / UEQ | Overall product usability | Benchmarking the full experience |
| Macro (Relationship) | NPS | Loyalty and recommendation intent | Tracking customer sentiment over time |
Micro-Level: The Task
Use the SEQ (Single Ease Question) immediately after a task. It captures the friction of a specific interaction while the experience is still fresh.
Meso-Level: The Product
Use the SUS (System Usability Scale) or UEQ to benchmark the overall usability of the product or application. Administer these after the participant has completed all core tasks.
Macro-Level: The Relationship
Use NPS (Net Promoter Score) to track the overall customer relationship and loyalty over time.
For the experience components that each instrument layer maps to, see Components of Experience.
System Usability Scale (SUS)
The SUS [1] is the most widely used standardized usability questionnaire. It consists of 10 statements rated on a 5-point scale from "Strongly Disagree" to "Strongly Agree."
What It Measures
SUS measures perceived usability, participants' subjective impression of how usable a system is. The final score ranges from 0 to 100.
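The standard scoring procedure turns the 10 raw responses into that 0–100 score: odd-numbered (positively worded) items contribute response − 1, even-numbered (negatively worded) items contribute 5 − response, and the sum is multiplied by 2.5. A minimal Python sketch (the function name is illustrative):

```python
def sus_score(responses):
    """Compute a SUS score from one participant's 10 responses (each 1-5).

    Odd-numbered items are positively worded: contribution = response - 1.
    Even-numbered items are negatively worded: contribution = 5 - response.
    The summed contributions (0-40) are scaled by 2.5 to give 0-100.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs exactly 10 responses, each between 1 and 5")
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # index 0,2,4,... = items 1,3,5,...
        for i, r in enumerate(responses)
    )
    return total * 2.5
```

A participant who strongly agrees with every positive item and strongly disagrees with every negative one scores 100; an all-neutral response pattern scores 50, not 0, which is one reason raw SUS scores are not percentages.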
Interpreting SUS Scores
Two complementary benchmarks are in routine use: the Adjective Rating Scale (Bangor, Kortum & Miller 2009) for quick stakeholder communication, and the Curved Grading Scale (Sauro & Lewis 2016) for finer-grained letter grades anchored to empirical percentiles.
Adjective Rating Scale (Bangor 2009)
Based on extensive research [2], SUS scores can be communicated using everyday adjectives:
| Score | Adjective Rating | Percentile Rank |
|---|---|---|
| 84.1+ | Best Imaginable | Top 10% |
| 80.3 | Excellent | Top 20% |
| 68 | OK (Average) | ~50th percentile |
| 51 | Poor | Bottom 20% |
| Below 51 | Awful | Bottom 10% |
Use this scheme when you need a quick label for a stakeholder readout ("Excellent", "Awful"). Use the Curved Grading Scale below when you want a finer-grained letter grade tied to empirical percentiles.
Curved Grading Scale (Sauro & Lewis 2016)
The curved grading scale [3] maps SUS scores to letter grades. It is anchored to Sauro's (2011) reference dataset of 5,000+ responses across roughly 500 studies (M = 68, SD = 12.5).
| Grade | Score Range | Percentile |
|---|---|---|
| A+ | 84.1+ | Top 5% |
| A | 80.8 – 84.0 | Top 10% |
| A- | 78.9 – 80.7 | Top 15% |
| B+ | 77.2 – 78.8 | Top 20% |
| B | 74.1 – 77.1 | Top 30% |
| B- | 72.6 – 74.0 | Top 35% |
| C+ | 71.1 – 72.5 | Top 40% |
| C | 65.0 – 71.0 | ~Average (M=68, SD=12.5) |
| C- | 62.7 – 64.9 | Bottom 35% |
| D | 51.7 – 62.6 | Bottom 15% |
| F | Below 51.7 | Bottom 10% |
The industry average is approximately 68 [4].
Practical Considerations
When to administer: After the participant has completed core tasks, not before they have had meaningful interaction.
Minimum sample: For stable scores, aim for at least 12-14 participants. With fewer than 8, confidence intervals become very wide.
Do not modify the questions: The scale has been validated as a complete instrument. Changing wording or removing items invalidates the norms.
Lighter SUS Alternatives
When 10 items feel like too much survey real estate, two derivatives target the same construct (perceived usability) at lower cost.
UMUX [5] is a 4-item, 7-point Likert measure framed around the ISO 9241-11 components (effectiveness, efficiency, satisfaction). It correlates strongly with SUS in head-to-head studies. The trade-off is its mixed positive/negative item tone, which has been challenged on dimensionality grounds. Pick UMUX when you want a SUS-comparable usability measure with fewer than half the items, but be aware that the structural debate is unresolved.
UMUX-LITE [6] shrinks the instrument further to 2 positive-tone items: "This system's capabilities meet my requirements" and "This system is easy to use." It is the right pick when survey real estate is tightest (post-task pop-ups, in-app intercepts). Raw UMUX-LITE scores run elevated by roughly 5–10 points compared with SUS, so apply the published regression formula before benchmarking against SUS norms.
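The regression published in the UMUX-LITE paper [6] takes a few lines to apply: the two 7-point items are first rescaled to 0–100, then adjusted with the regression coefficients (0.65 and 22.9, per Lewis et al., 2013). A Python sketch, with illustrative function names:

```python
def umux_lite_raw(item1, item2):
    """Raw UMUX-LITE score on a 0-100 scale from two 7-point Likert items."""
    return (item1 + item2 - 2) * (100 / 12)

def umux_lite_sus_equivalent(item1, item2):
    """SUS-comparable score via the regression in Lewis et al. (2013)."""
    return 0.65 * umux_lite_raw(item1, item2) + 22.9
```

Note how the regression compresses the range: a perfect raw score of 100 maps to roughly 87.9, counteracting the 5–10 point inflation of raw UMUX-LITE scores relative to SUS.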
For full psychometrics, factor structure debates, and benchmark availability across these alternatives, filter the catalog above by construct = "Perceived Usability".
Net Promoter Score (NPS)
NPS [7] measures customer loyalty through a single question: "How likely are you to recommend [product/company] to a friend or colleague?" rated 0-10.
How It Works
Respondents are classified as:
- Promoters (9-10): Loyal enthusiasts who will keep buying and refer others
- Passives (7-8): Satisfied but unenthusiastic customers
- Detractors (0-6): Unhappy customers who can damage your brand
NPS = % Promoters - % Detractors
The score ranges from -100 (everyone is a detractor) to +100 (everyone is a promoter).
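The classification and subtraction above reduce to a few lines of code; a minimal Python sketch (function name is illustrative):

```python
def nps(ratings):
    """Net Promoter Score from a list of 0-10 likelihood-to-recommend ratings.

    Promoters rate 9-10, detractors 0-6; passives (7-8) count only
    toward the denominator. Result ranges from -100 to +100.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(r >= 9 for r in ratings)
    detractors = sum(r <= 6 for r in ratings)
    return 100 * (promoters - detractors) / len(ratings)
```

Note that passives dilute the score without shifting its sign: a sample of nothing but 7s and 8s yields an NPS of exactly 0.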
What It Actually Measures
NPS measures recommendation intent, which is often used as a proxy for loyalty and growth potential. However, NPS is controversial among researchers:
Criticisms:
- A single question cannot capture the complexity of customer loyalty
- The classification (0-6 as "detractors") is somewhat arbitrary
- NPS does not explain why someone would or would not recommend
- Cultural differences affect how people use the scale
When to Use NPS
NPS is appropriate for:
- Tracking overall brand or product sentiment over time
- Segmenting customers by loyalty
- Creating a simple KPI for executive dashboards
It is not appropriate for:
- Evaluating specific features or interface changes
- Replacing usability testing
- Making detailed design decisions
Product-Market Fit Score (The Sean Ellis Test)
While NPS measures whether customers would recommend your product, the Product-Market Fit (PMF) Score measures something different: how much they would miss it if it disappeared. The instrument was popularized in Hacking Growth [9].
The question is simple: "How would you feel if you could no longer use [product]?"
Response options:
- Very disappointed
- Somewhat disappointed
- Not disappointed (it really isn't that useful)
- N/A: I no longer use it
The 40% Benchmark
The key metric is the percentage of respondents who select "Very disappointed." Sean Ellis, who coined the term "growth hacking," proposed a simple heuristic [8]: if more than 40% of your users say they would be "very disappointed" without your product, you have achieved product-market fit.
The PMF Score is not a psychometrically validated instrument. The 40% figure is a practitioner heuristic, derived from pattern-matching across early-stage startups; Sean Ellis himself has called it "a bit arbitrary". Treat it as a useful rule of thumb, not a hard cutoff.
Who Counts as a Respondent
Sample composition matters more than headline percentage. Ellis prescribes strict user qualification [10]: include only respondents who
- have experienced the core product (not landing-page visitors or trial drop-offs),
- have used it at least twice, and
- have used it within the last 14 days.
Recommended sample size is roughly 30–40 qualified active users. Below that, the percentage swings too much per respondent to be informative.
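The headline metric is then a simple percentage over qualified respondents. A Python sketch; excluding "N/A: I no longer use it" answers from the denominator is a common convention consistent with Ellis's focus on current users, not something the survey wording itself mandates:

```python
def pmf_score(answers):
    """Percentage of qualified respondents answering 'very' (Very disappointed).

    Expects answers coded as "very", "somewhat", "not", or "n/a".
    "n/a" respondents (no longer users) are excluded from the
    denominator -- an assumed convention, see lead-in text.
    """
    qualified = [a for a in answers if a != "n/a"]
    if not qualified:
        raise ValueError("no qualified respondents")
    very = sum(a == "very" for a in qualified)
    return 100 * very / len(qualified)
```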
What It Actually Measures
The PMF Score captures perceived indispensability. It asks users to imagine life without the product and report the emotional weight of that loss.
This is distinct from both usability and loyalty:
- A product can be highly usable (strong SUS score) without being indispensable
- A product can generate strong recommendation intent (high NPS) for reasons unrelated to personal reliance
- The PMF Score specifically targets whether the product has become woven into the user's workflow or life
When to Use It
The PMF Score is most valuable for:
- Early-stage products seeking validation before scaling
- Products searching for their core audience segment
- Tracking whether new features increase or decrease perceived value
It is less useful for mature products with established market positions, where the question becomes less about "do we have fit" and more about "how do we expand and retain."
Single Ease Question (SEQ)
The SEQ [11] asks: "Overall, how easy or difficult was this task?" rated on a 7-point scale from "Very Difficult" to "Very Easy."
When to Use It
Administer SEQ immediately after each task in a UX test. It captures in-the-moment perceived difficulty before memory fades.
Interpretation
Based on benchmark data [12]:
- Average SEQ: ~5.5
- Scores above 5.5 indicate above-average ease
- Scores below 4.5 suggest significant difficulty
SEQ correlates with task success and time on task, making it a useful quick check even when objective metrics are available.
User Experience Questionnaire (UEQ)
The UEQ [13] is a more comprehensive instrument measuring six dimensions of user experience:
- Attractiveness: Overall impression of the product
- Perspicuity: How easy it is to learn
- Efficiency: How quickly users can accomplish goals
- Dependability: How in control users feel
- Stimulation: How exciting or motivating to use
- Novelty: How innovative or creative the design seems
When to Use It
UEQ is appropriate when you need a more nuanced view of user experience beyond usability alone. It captures both pragmatic quality (efficiency, perspicuity, dependability) and hedonic quality (stimulation, novelty).
Practical Notes
- 26 items, 7-point semantic differential, 3–5 minutes to complete
- Requires at least 20 participants for reliable results
- Free to use, with online benchmarks (Schrepp, Hinderks & Thomaschewski 2017; N = 21,175 responses across 452 product evaluations)
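UEQ answers are recoded from the 1–7 response format to a −3 to +3 range before scale means are computed; roughly half the items present the positive pole on the left and are reversed first (polarity and item-to-scale assignment follow the official UEQ handbook; this sketch omits the assignment and uses illustrative function names):

```python
def ueq_item_value(response, reversed_polarity=False):
    """Recode a 1-7 UEQ answer to the -3..+3 range used for scale means."""
    value = response - 4          # 1..7  ->  -3..+3
    return -value if reversed_polarity else value

def ueq_scale_mean(responses_with_polarity):
    """Mean of recoded item values for one UEQ dimension.

    Takes (response, reversed_polarity) pairs for the items
    belonging to a single scale, e.g. Perspicuity.
    """
    values = [ueq_item_value(r, rev) for r, rev in responses_with_polarity]
    return sum(values) / len(values)
```

Scale means above roughly +0.8 are conventionally read as positive evaluations in UEQ reporting, but benchmark comparison against the reference dataset is the more meaningful interpretation.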
UEQ Family
The UEQ has two siblings worth knowing about, both authored by the original UEQ team.
UEQ-S [14] is the 8-item short version. The 26 UEQ items collapse to two meta-factors, Pragmatic Quality (efficiency, perspicuity, dependability) and Hedonic Quality (stimulation, novelty), with 4 items each. It reproduces the full UEQ overall UX score with a mean deviation of 0.06 and shares the UEQ benchmark dataset. Pick UEQ-S when 26 items are too long for your survey but you still want to separate pragmatic from hedonic quality.
UEQ+ [15] is a modular framework rather than a fixed questionnaire. It offers 16 selectable scales (the original 6 plus Trust, Intuitive Use, Adaptability, Usefulness, Visual Aesthetics, Value, Stickiness, Content Quality, Trustworthiness of Content, and Immersion), 4 bipolar item pairs per scale, plus an importance rating per scale that lets you weight scales into a tailored UX KPI. Pick UEQ+ when you need a product-specific UX KPI (a content app cares about Content Quality and Stickiness; a fintech tool cares about Trust and Usefulness). Note that UEQ+ uses a different polarity grouping than the original UEQ, so raw scores are not directly comparable across the two.
Web-Specific: SUPR-Q
The Standardized User Experience Percentile Rank Questionnaire [16] is built specifically for websites. It uses 8 items across 4 factors (Usability, Trust, Appearance, Loyalty) in a mixed format: 7 items on a 5-point Likert scale plus 1 NPS-style 0–10 likelihood-to-recommend item.
The output is unusual and useful: SUPR-Q maps raw scores to a percentile rank against MeasuringU's reference database (initial N = 2,513 responses across 70 websites; updated quarterly). A SUPR-Q score of 4.1 sits roughly at the 70th percentile of catalogued websites.
Pick SUPR-Q over SUS when the artifact is a website (marketing site, e-commerce, content portal) and you want trust and visual aesthetic captured alongside usability. SUS is interface-agnostic and ignores both. Skip SUPR-Q for native apps, hardware, or back-office software; the norms do not transfer.
Choosing the Right Instrument
| Instrument | Measures | Best For | Minimum Sample |
|---|---|---|---|
| SUS | Perceived usability (10 items) | Post-study overall assessment | 12-14 |
| SEQ | Task difficulty (1 item) | After each task | 5-10 per task |
| NPS | Recommendation intent (1 item) | Customer loyalty tracking | 30+ |
| PMF Score | Perceived indispensability (1 item) | Product-market fit validation | 30-40 qualified users |
| UEQ | 6 UX dimensions (26 items) | Comprehensive UX assessment | 20+ |
| UMUX | Perceived usability (4 items, ISO 9241-11 framing) | SUS-comparable result with fewer items | see catalog |
| UMUX-LITE | Perceived usability (2 items, positive tone) | Tightest survey real estate | see catalog |
| UEQ-S | Pragmatic + Hedonic Quality (8 items) | Short UX sweep keeping dimensionality | see catalog |
| UEQ+ | Modular UX (up to 16 selectable scales) | Tailored UX KPI per product | see catalog |
| SUPR-Q | Website UX: Usability, Trust, Appearance, Loyalty (8 items) | Website benchmarking with percentile ranks | see catalog |
For how to use these instruments in longitudinal benchmarking, see UX Benchmarking: Measuring Progress Over Time.
What Is a "Good" Score?
Raw scores are meaningless without context. Here are the benchmarks you need to interpret your results.
SUS Benchmarks
| Score Range | Interpretation |
|---|---|
| 85+ | Excellent: Top 10% of products |
| 71-84 | Good: Above average usability |
| 68 | Average: The industry midpoint |
| 51-67 | Below Average: Needs improvement |
| 50 or below | Failure: Serious usability problems |
The magic number to remember: 68 is average. Anything above 71 is genuinely good. A score of 50 or below indicates fundamental problems that will frustrate most users.
NPS Benchmarks
NPS is highly industry-dependent. A score of +30 might be excellent in one industry and mediocre in another.
SEQ Benchmarks
On a 7-point scale:
- Average: ~5.5
- Above 5.5: Task was perceived as easier than average
- Below 4.5: Task was perceived as difficult; investigate further
However, SEQ depends heavily on inherent task difficulty. A complex task (e.g., "Configure your tax settings") will naturally score lower than a simple one (e.g., "Find the search bar"). Compare SEQ scores across versions of the same task, not across different tasks.
For the statistical techniques to properly interpret instrument scores, see Quantitative Analysis: From Metrics to Significance.
Common Mistakes
Modifying Validated Instruments
"We only need 5 of the SUS questions." No. Validated instruments work as complete packages. Removing or rewording items invalidates the benchmarks.
Over-relying on Single Metrics
NPS alone does not tell you what to fix. SUS alone does not explain why users struggle. Quantitative metrics tell you that something is happening; qualitative data tells you why.
Ignoring Context
A SUS score of 72 might be excellent for complex enterprise software and mediocre for a consumer mobile app. Always consider the product category when interpreting scores.
Treating Scores as Precise
All measurements have uncertainty. A SUS score of 72 might really be "somewhere between 65 and 79." Report confidence intervals, especially with smaller samples.
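A t-based interval around a mean score makes that uncertainty explicit. A minimal Python sketch using only the standard library; the default critical value of 2.0 is a rough stand-in for 95% confidence at moderate sample sizes (look up the exact t value for your n, e.g. via scipy.stats.t.ppf, when reporting formally):

```python
import statistics
from math import sqrt

def mean_ci(scores, t_crit=2.0):
    """Approximate confidence interval for a mean questionnaire score.

    t_crit ~ 2.0 roughly corresponds to 95% confidence for moderate
    sample sizes; substitute the exact critical value for n - 1
    degrees of freedom for formal reporting.
    """
    n = len(scores)
    m = statistics.mean(scores)
    se = statistics.stdev(scores) / sqrt(n)  # standard error of the mean
    return (m - t_crit * se, m + t_crit * se)
```

With small samples the interval is wide: four SUS scores averaging 70 with a spread of a few points can easily produce an interval spanning 15 or more points, which is exactly why "a SUS of 72" should not be reported as a precise number.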
What This Means for Practice
Standardized instruments are powerful tools when used correctly:
- Choose the right instrument for what you actually need to measure
- Administer them correctly, at the right time, with complete items, to enough participants
- Interpret with benchmarks: raw scores are meaningless without context
- Combine with qualitative data: metrics quantify problems; observation reveals them
- Report uncertainty: acknowledge the precision limits of your sample size
The goal is not to chase a score. It is to use measurement as one input into better design decisions.
To calculate the sample size needed for reliable instrument scores, see the Sample Size Calculator.
References
- [1] John Brooke (1996). "SUS: A 'Quick and Dirty' Usability Scale". In Usability Evaluation in Industry. Taylor & Francis.
- [2] Aaron Bangor, Philip Kortum & James Miller (2009). "Determining What Individual SUS Scores Mean: Adding an Adjective Rating Scale". Journal of Usability Studies.
- [3] Jeff Sauro & James R. Lewis (2016). Quantifying the User Experience: Practical Statistics for User Research, 2nd ed. (SUS Curved Grading Scale, Ch. 8). Morgan Kaufmann.
- [4] John Brooke (2013). "SUS: A Retrospective". Journal of Usability Studies.
- [5] Kraig Finstad (2010). "The Usability Metric for User Experience". Interacting with Computers.
- [6] James R. Lewis, Brian S. Utesch & Deborah E. Maher (2013). "UMUX-LITE: When There's No Time for the SUS". Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '13). ACM.
- [7] Frederick F. Reichheld (2003). "The One Number You Need to Grow". Harvard Business Review.
- [8] Sean Ellis (2009). "The Startup Pyramid". Startup Marketing Blog.
- [9] Sean Ellis & Morgan Brown (2017). Hacking Growth: How Today's Fastest-Growing Companies Drive Breakout Success. Crown Business.
- [10] Sean Ellis (2019). "Using Product/Market Fit to Drive Sustainable Growth". Medium / Growth Hackers.
- [11] Jeff Sauro & Joseph S. Dumas (2009). "Comparison of Three One-Question, Post-Task Usability Questionnaires". Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '09). ACM.
- [12] Jeff Sauro & James R. Lewis (2016). Quantifying the User Experience: Practical Statistics for User Research, 2nd ed. Morgan Kaufmann.
- [13] Bettina Laugwitz, Theo Held & Martin Schrepp (2008). "Construction and Evaluation of a User Experience Questionnaire". USAB 2008. Springer.
- [14] Martin Schrepp, Andreas Hinderks & Jörg Thomaschewski (2017). "Design and Evaluation of a Short Version of the User Experience Questionnaire (UEQ-S)". International Journal of Interactive Multimedia and Artificial Intelligence.
- [15] Martin Schrepp & Jörg Thomaschewski (2019). "Design and Validation of a Framework for the Creation of User Experience Questionnaires". International Journal of Interactive Multimedia and Artificial Intelligence.
- [16] Jeff Sauro (2015). "SUPR-Q: A Comprehensive Measure of the Quality of the Website User Experience". Journal of Usability Studies, 10(2), 68–86.