UX Measurement Instruments: Scales, Scores, and What They Actually Measure

Standardized measurement instruments provide benchmarks and comparability. But using them effectively requires understanding what each one actually measures, and what it does not.

Marc Busch
Updated September 23, 2024
9 min read

Summary

Standardized UX instruments like SUS, NPS, SEQ, PMF Score, and UEQ provide validated, comparable measurements. Each targets different constructs: SUS measures perceived usability, NPS measures loyalty/recommendation intent, SEQ captures task-level difficulty, and the PMF Score measures perceived indispensability. Understanding what each instrument measures, and its limitations, prevents misuse and misinterpretation. Always pair metrics with qualitative data to understand the 'why' behind scores.

When you need to measure usability or user experience quantitatively, do not invent your own questions. Use standardized instruments that have been validated through research.

These tools provide:

  • Reliability: Consistent results across administrations
  • Validity: Actually measuring what they claim to measure
  • Benchmarks: Data from thousands of studies for comparison
  • Comparability: Ability to compare your results to industry standards

But using them effectively requires understanding what each one measures, and what it does not.

Which Tool for Which Layer?

Before diving into individual instruments, understand that measurement happens at different layers of the user experience. Choose your tool based on what layer you are measuring.

| Layer | Acronym | What It Measures | Best For |
| --- | --- | --- | --- |
| Micro (Task) | SEQ | Task-level difficulty | Immediate post-task feedback |
| Meso (Product) | SUS / UEQ | Overall product usability | Benchmarking the full experience |
| Macro (Relationship) | NPS | Loyalty and recommendation intent | Tracking customer sentiment over time |

Micro-Level: The Task

Use the SEQ (Single Ease Question) immediately after a task. It captures the friction of a specific interaction while the experience is still fresh.

Meso-Level: The Product

Use the SUS (System Usability Scale) or UEQ to benchmark the overall usability of the product or application. Administer these after the participant has completed all core tasks.

Macro-Level: The Relationship

Use NPS (Net Promoter Score) to track the overall customer relationship and loyalty over time.

System Usability Scale (SUS)

The SUS [1] is the most widely used standardized usability questionnaire. It consists of 10 statements rated on a 5-point scale from "Strongly Disagree" to "Strongly Agree."

What It Measures

SUS measures perceived usability: participants' subjective impression of how usable a system is. The final score ranges from 0 to 100.
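The 0 to 100 score comes from the standard SUS scoring procedure: odd-numbered items contribute their rating minus 1, even-numbered items contribute 5 minus their rating, and the sum is multiplied by 2.5. A minimal sketch follows; the function name `sus_score` and the input format (a list of ten integers in questionnaire order) are assumptions made for illustration.

```python
def sus_score(responses):
    """Return the 0-100 SUS score for one participant.

    `responses` holds the ten item ratings in questionnaire order,
    each an integer from 1 (Strongly Disagree) to 5 (Strongly Agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for position, rating in enumerate(responses, start=1):
        if position % 2 == 1:
            total += rating - 1  # odd items are positively worded
        else:
            total += 5 - rating  # even items are negatively worded
    return total * 2.5  # rescale the 0-40 sum to 0-100


# One participant who answers 4 on every odd item and 2 on every even item.
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))
```

The study-level SUS score is the mean of the per-participant scores. Note that the result is a score on a 0 to 100 scale, not a percentage.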

Interpreting SUS Scores

Based on extensive research [3], SUS scores can be interpreted as:

| Score | Adjective Rating | Percentile Rank |
| --- | --- | --- |
| 84.1+ | Best Imaginable | Top 10% |
| 80.3 | Excellent | Top 20% |
| 68 | OK (Average) | ~50th percentile |
| 51 | Poor | Bottom 20% |
| Below 50 | Awful | Bottom 10% |

The industry average is approximately 68 [2].

Practical Considerations

When to administer: After the participant has completed core tasks, not before they have had meaningful interaction.

Minimum sample: For stable scores, aim for at least 12-14 participants. With fewer than 8, confidence intervals become very wide.

Do not modify the questions: The scale has been validated as a complete instrument. Changing wording or removing items invalidates the norms.

Net Promoter Score (NPS)

NPS [4] measures customer loyalty through a single question: "How likely are you to recommend [product/company] to a friend or colleague?" rated 0-10.

How It Works

Respondents are classified as:

  • Promoters (9-10): Loyal enthusiasts who will keep buying and refer others
  • Passives (7-8): Satisfied but unenthusiastic customers
  • Detractors (0-6): Unhappy customers who can damage your brand

NPS = % Promoters - % Detractors

The score ranges from -100 (everyone is a detractor) to +100 (everyone is a promoter).
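The classification and formula above can be sketched in a few lines. The function name `nps` and the input format (a list of 0 to 10 ratings) are assumptions made for illustration.

```python
def nps(ratings):
    """Return the Net Promoter Score (-100 to +100) for a list of 0-10 ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 0 <= r <= 10 for r in ratings):
        raise ValueError("ratings must be between 0 and 10")

    promoters = sum(1 for r in ratings if r >= 9)   # 9-10
    detractors = sum(1 for r in ratings if r <= 6)  # 0-6
    # Passives (7-8) count toward the total but not toward either group.
    return 100 * (promoters - detractors) / len(ratings)


print(nps([10, 10, 9, 8]))  # 3 promoters, 1 passive, 0 detractors
```

Because passives still sit in the denominator, a shift from passive to promoter raises the score even though no detractor changed their mind.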

What It Actually Measures

NPS measures recommendation intent, which is often used as a proxy for loyalty and growth potential. However, NPS is controversial among researchers:

Criticisms:

  • A single question cannot capture the complexity of customer loyalty
  • The classification (0-6 as "detractors") is somewhat arbitrary
  • NPS does not explain why someone would or would not recommend
  • Cultural differences affect how people use the scale

When to Use NPS

NPS is appropriate for:

  • Tracking overall brand or product sentiment over time
  • Segmenting customers by loyalty
  • Creating a simple KPI for executive dashboards

It is not appropriate for:

  • Evaluating specific features or interface changes
  • Replacing usability testing
  • Making detailed design decisions

Product-Market Fit Score (The Sean Ellis Test)

While NPS measures whether customers would recommend your product, the Product-Market Fit (PMF) Score measures something different: how much they would miss it if it disappeared.

The question is simple: "How would you feel if you could no longer use [product]?"

Response options:

  • Very disappointed
  • Somewhat disappointed
  • Not disappointed
  • N/A (I no longer use it)

The 40% Benchmark

The key metric is the percentage of respondents who select "Very disappointed." Sean Ellis, who coined the term "growth hacking," proposed a simple heuristic: if more than 40% of your users say they would be "very disappointed" without your product, you have achieved product-market fit.

This is a startup heuristic, not a scientifically validated threshold. The 40% figure emerged from pattern-matching across successful startups rather than controlled research. Treat it as a useful rule of thumb, not a hard cutoff.
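As a rough sketch, the score is the share of respondents who answer "Very disappointed". The function name `pmf_score` is hypothetical, and dropping "N/A" respondents from the denominator is an assumption the text above does not specify; the `exclude_na` flag lets you compute it either way.

```python
from collections import Counter

VERY = "Very disappointed"
SOMEWHAT = "Somewhat disappointed"
NOT = "Not disappointed"
NA = "N/A (I no longer use it)"

PMF_THRESHOLD = 40  # the heuristic benchmark, in percent


def pmf_score(answers, exclude_na=True):
    """Return the percentage of respondents answering 'Very disappointed'.

    Assumption: by default, N/A respondents are left out of the denominator.
    """
    counts = Counter(answers)
    base = len(answers) - (counts[NA] if exclude_na else 0)
    if base == 0:
        raise ValueError("no usable responses")
    return 100 * counts[VERY] / base


answers = [VERY] * 9 + [SOMEWHAT] * 8 + [NOT] * 3 + [NA] * 5
score = pmf_score(answers)
print(score, score > PMF_THRESHOLD)
```

With these sample answers the score sits above the 40% line when N/A responses are excluded and below it when they are kept, which is one more reason to treat the threshold as a rule of thumb and to report how the denominator was defined.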

What It Actually Measures

The PMF Score captures perceived indispensability. It asks users to imagine life without the product and report the emotional weight of that loss.

This is distinct from both usability and loyalty:

  • A product can be highly usable (strong SUS score) without being indispensable
  • A product can generate strong recommendation intent (high NPS) for reasons unrelated to personal reliance
  • The PMF Score specifically targets whether the product has become woven into the user's workflow or life

When to Use It

The PMF Score is most valuable for:

  • Early-stage products seeking validation before scaling
  • Products searching for their core audience segment
  • Tracking whether new features increase or decrease perceived value

It is less useful for mature products with established market positions, where the question becomes less about "do we have fit" and more about "how do we expand and retain."

Single Ease Question (SEQ)

The SEQ asks: "Overall, how easy or difficult was this task?" rated on a 7-point scale from "Very Difficult" to "Very Easy."

When to Use It

Administer SEQ immediately after each task in a usability test. It captures in-the-moment perceived difficulty before memory fades.

Interpretation

Based on benchmark data [6]:

  • Average SEQ: ~5.5
  • Scores above 5.5 indicate above-average ease
  • Scores below 4.5 suggest significant difficulty

SEQ correlates with task success and time on task, making it a useful quick check even when objective metrics are available.
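A small sketch of how per-task SEQ ratings might be summarized against the benchmarks above. The function name `summarize_seq`, the input format, and the label for means between 4.5 and 5.5 are assumptions; the text only defines the two cutoffs.

```python
from statistics import mean

SEQ_AVERAGE = 5.5    # benchmark average on the 7-point scale
SEQ_DIFFICULT = 4.5  # below this, the task suggests significant difficulty


def summarize_seq(ratings_by_task):
    """Map each task name to (mean SEQ rating, benchmark label)."""
    summary = {}
    for task, ratings in ratings_by_task.items():
        if not ratings or any(not 1 <= r <= 7 for r in ratings):
            raise ValueError(f"task {task!r} needs ratings between 1 and 7")
        avg = mean(ratings)
        if avg < SEQ_DIFFICULT:
            label = "difficult"
        elif avg > SEQ_AVERAGE:
            label = "easier than average"
        else:
            label = "at or below average"  # assumed label for the 4.5-5.5 range
        summary[task] = (round(avg, 2), label)
    return summary


print(summarize_seq({
    "checkout": [6, 7, 6, 5, 7],
    "tax settings": [3, 4, 5, 4, 4],
}))
```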

User Experience Questionnaire (UEQ)

The UEQ [5] is a more comprehensive instrument measuring six dimensions of user experience:

  1. Attractiveness: Overall impression of the product
  2. Perspicuity: How easy it is to learn
  3. Efficiency: How quickly users can accomplish goals
  4. Dependability: How in control users feel
  5. Stimulation: How exciting or motivating to use
  6. Novelty: How innovative or creative the design seems

When to Use It

UEQ is appropriate when you need a more nuanced view of user experience beyond usability alone. It captures both pragmatic quality (efficiency, perspicuity, dependability) and hedonic quality (stimulation, novelty).

Practical Notes

  • Takes about 3-5 minutes to complete
  • Requires at least 20 participants for reliable results
  • Free to use, with online benchmarks available

Choosing the Right Instrument

| Instrument | Measures | Best For | Minimum Sample |
| --- | --- | --- | --- |
| SUS | Perceived usability | Post-study overall assessment | 12-14 |
| SEQ | Task difficulty | After each task | 5-10 per task |
| NPS | Recommendation intent | Customer loyalty tracking | 30+ |
| PMF Score | Perceived indispensability | Product-market fit validation | 30+ |
| UEQ | 6 UX dimensions | Comprehensive UX assessment | 20+ |

What Is a "Good" Score?

Raw scores are meaningless without context. Here are the benchmarks you need to interpret your results.

SUS Benchmarks

| Score Range | Interpretation |
| --- | --- |
| 85+ | Excellent: Top 10% of products |
| 71-84 | Good: Above average usability |
| 68 | Average: The industry midpoint |
| 51-67 | Below Average: Needs improvement |
| Below 50 | Failure: Serious usability problems |

The magic number to remember: 68 is average. Anything above 71 is genuinely good. Below 50 indicates fundamental problems that will frustrate most users.
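For reporting, the bands above can be turned into a small lookup. This is only a sketch: the function name `sus_band` is hypothetical, and because the table leaves gaps (for example 69-70 and exactly 50), the code assumes that any score between two rows falls into the lower band.

```python
def sus_band(score):
    """Return the benchmark label for a SUS score between 0 and 100.

    Assumption: scores that fall between two table rows
    (e.g. 69-70, 84.5, or exactly 50) are assigned to the lower band.
    """
    if not 0 <= score <= 100:
        raise ValueError("SUS scores range from 0 to 100")
    if score >= 85:
        return "Excellent"
    if score >= 71:
        return "Good"
    if score >= 68:
        return "Average"
    if score >= 50:
        return "Below Average"
    return "Failure"


for value in (90, 72.5, 68, 55, 42.5):
    print(value, sus_band(value))
```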

NPS Benchmarks

NPS is highly industry-dependent. A score of +30 might be excellent in one industry and mediocre in another.

SEQ Benchmarks

On a 7-point scale:

  • Average: ~5.5
  • Above 5.5: Task was perceived as easier than average
  • Below 4.5: Task was perceived as difficult—investigate further

However, SEQ depends heavily on inherent task difficulty. A complex task (e.g., "Configure your tax settings") will naturally score lower than a simple one (e.g., "Find the search bar"). Compare SEQ scores across versions of the same task, not across different tasks.

Common Mistakes

Modifying Validated Instruments

"We only need 5 of the SUS questions." No. Validated instruments work as complete packages. Removing or rewording items invalidates the benchmarks.

Over-relying on Single Metrics

NPS alone does not tell you what to fix. SUS alone does not explain why users struggle. Quantitative metrics tell you that something is happening; qualitative data tells you why.

Ignoring Context

A SUS score of 72 might be excellent for complex enterprise software and mediocre for a consumer mobile app. Always consider the product category when interpreting scores.

Treating Scores as Precise

All measurements have measurement error. A SUS score of 72 might really be "somewhere between 65 and 79." Report confidence intervals, especially with smaller samples.
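One common way to report that uncertainty is a t-based 95% confidence interval around the mean score. The sketch below assumes per-participant scores (for example SUS scores) that are roughly normally distributed; the function name `mean_ci_95` is hypothetical, and the critical values are the usual two-sided 95% Student's t values for 1 to 30 degrees of freedom, with a normal approximation beyond that.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Two-sided 95% critical values of Student's t for df = 1..30.
T_95 = [
    12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
    2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
    2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045, 2.042,
]


def mean_ci_95(scores):
    """Return (low, high) bounds of a 95% confidence interval for the mean."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    df = n - 1
    # Beyond 30 degrees of freedom, fall back to the normal approximation.
    t = T_95[df - 1] if df <= 30 else NormalDist().inv_cdf(0.975)
    margin = t * stdev(scores) / sqrt(n)
    center = mean(scores)
    return center - margin, center + margin


scores = [72.5, 65.0, 80.0, 70.0, 77.5, 60.0, 85.0, 67.5, 75.0, 72.5, 62.5, 77.5]
low, high = mean_ci_95(scores)
print(f"mean {mean(scores):.1f}, 95% CI {low:.1f} to {high:.1f}")
```

The interval narrows as the sample grows, both because the standard error shrinks with the square root of the sample size and because the t critical value drops, which is why very small samples produce such wide ranges.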

What This Means for Practice

Standardized instruments are powerful tools when used correctly:

  1. Choose the right instrument for what you actually need to measure
  2. Administer them correctly: at the right time, with complete items, to enough participants
  3. Interpret with benchmarks: raw scores are meaningless without context
  4. Combine with qualitative data: metrics quantify problems; observation reveals them
  5. Report uncertainty: acknowledge the precision limits of your sample size

The goal is not to chase a score. It is to use measurement as one input into better design decisions.

References

  [1] John Brooke (1996). "SUS: A 'Quick and Dirty' Usability Scale". Usability Evaluation in Industry.
  [2] John Brooke (2013). "SUS: A Retrospective". Journal of Usability Studies.
  [3] Aaron Bangor et al. (2009). "Determining What Individual SUS Scores Mean: Adding an Adjective Rating Scale". Journal of Usability Studies.
  [4] Frederick F. Reichheld (2003). "The One Number You Need to Grow". Harvard Business Review.
  [5] Bettina Laugwitz et al. (2008). "Construction and Evaluation of a User Experience Questionnaire". HCI and Usability for Education and Work.
  [6] Jeff Sauro & James R. Lewis (2016). "Quantifying the User Experience: Practical Statistics for User Research". Morgan Kaufmann.
