Summary
Standardized UX instruments like SUS, NPS, SEQ, PMF Score and UEQ, plus lighter alternatives (UMUX, UMUX-LITE), the UEQ family (UEQ-S, UEQ+), and the web-specific SUPR-Q, provide validated and comparable measurements. Each targets a different construct, from perceived usability to recommendation intent to task difficulty to perceived indispensability. The article covers what each measures, when to use it, and how to interpret scores. Pair metrics with qualitative data to understand the 'why' behind the numbers.
When you need to measure user experience or usability quantitatively, do not invent your own questions. Use standardized instruments that have been validated through research.
These tools provide:
- Reliability: Consistent results across administrations
- Validity: Actually measuring what they claim to measure (see Types of Validity)
- Benchmarks: Data from thousands of studies for comparison
- Comparability: Ability to compare your results to industry standards
But using them effectively requires understanding what each one measures, and what it does not.
The interactive scale catalog above lets you filter all 195+ catalogued instruments by construct, domain, benchmark availability, and publication year. This article focuses on the most-used core instruments and how to choose between them.
Which Tool for Which Layer?
Before diving into individual instruments, understand that measurement happens at different layers of the user experience. Choose your tool based on what layer you are measuring.
| Layer | Acronym | What It Measures | Best For |
|---|---|---|---|
| Micro (Task) | SEQ | Task-level difficulty | Immediate post-task feedback |
| Meso (Product) | SUS / UEQ | Overall product usability | Benchmarking the full experience |
| Macro (Relationship) | NPS | Loyalty and recommendation intent | Tracking customer sentiment over time |
Micro-Level: The Task
Use the SEQ (Single Ease Question) immediately after a task. It captures the friction of a specific interaction while the experience is still fresh.
Meso-Level: The Product
Use the SUS (System Usability Scale) or UEQ to benchmark the overall usability of the product or application. Administer these after the participant has completed all core tasks.
Macro-Level: The Relationship
Use NPS (Net Promoter Score) to track the overall customer relationship and loyalty over time.
For the experience components that each instrument layer maps to, see Components of Experience.
System Usability Scale (SUS)
The SUS [1] is the most widely used standardized usability questionnaire. It consists of 10 statements rated on a 5-point scale from "Strongly Disagree" to "Strongly Agree."
What It Measures
SUS measures perceived usability, participants' subjective impression of how usable a system is. The final score ranges from 0 to 100.
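The standard scoring procedure turns the 10 raw responses into that 0–100 score: odd-numbered (positively worded) items contribute response − 1, even-numbered (negatively worded) items contribute 5 − response, and the sum is multiplied by 2.5. A minimal Python sketch (the function name is illustrative):

```python
def sus_score(responses):
    """Compute a SUS score from one participant's 10 responses (each 1-5).

    Odd-numbered items are positively worded: contribution = response - 1.
    Even-numbered items are negatively worded: contribution = 5 - response.
    The summed contributions (0-40) are scaled by 2.5 to give 0-100.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs exactly 10 responses, each between 1 and 5")
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # index 0,2,4,... = items 1,3,5,...
        for i, r in enumerate(responses)
    )
    return total * 2.5
```

A participant who strongly agrees with every positive item and strongly disagrees with every negative one scores 100; an all-neutral response pattern scores 50, not 0, which is one reason raw SUS scores are not percentages.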
Interpreting SUS Scores
Two complementary benchmarks are in routine use: the Adjective Rating Scale (Bangor, Kortum & Miller 2009) for quick stakeholder communication, and the Curved Grading Scale (Sauro & Lewis 2016) for finer-grained letter grades anchored to empirical percentiles.
Adjective Rating Scale (Bangor 2009)
Based on extensive research [2], SUS scores can be communicated using everyday adjectives:
| Score | Adjective Rating | Percentile Rank |
|---|---|---|
| 84.1+ | Best Imaginable | Top 10% |
| 80.3 | Excellent | Top 20% |
| 68 | OK (Average) | ~50th percentile |
| 51 | Poor | Bottom 20% |
| Below 51 | Awful | Bottom 10% |
Use this scheme when you need a quick label for a stakeholder readout ("Excellent", "Awful"). Use the Curved Grading Scale below when you want a finer-grained letter grade tied to empirical percentiles.
Curved Grading Scale (Sauro & Lewis 2016)
The curved grading scale [3] maps SUS scores to letter grades. It is anchored to Sauro's (2011) reference dataset of 5,000+ responses across roughly 500 studies (M = 68, SD = 12.5).
| Grade | Score Range | Percentile |
|---|---|---|
| A+ | 84.1+ | Top 5% |
| A | 80.8 – 84.0 | Top 10% |
| A- | 78.9 – 80.7 | Top 15% |
| B+ | 77.2 – 78.8 | Top 20% |
| B | 74.1 – 77.1 | Top 30% |
| B- | 72.6 – 74.0 | Top 35% |
| C+ | 71.1 – 72.5 | Top 40% |
| C | 65.0 – 71.0 | ~Average (M=68, SD=12.5) |
| C- | 62.7 – 64.9 | Bottom 35% |
| D | 51.7 – 62.6 | Bottom 15% |
| F | Below 51.7 | Bottom 10% |
The industry average is approximately 68 [4].
Practical Considerations
When to administer: After the participant has completed core tasks, not before they have had meaningful interaction.
Minimum sample: For stable scores, aim for at least 12-14 participants. With fewer than 8, confidence intervals become very wide.
Do not modify the questions: The scale has been validated as a complete instrument. Changing wording or removing items invalidates the norms.
Lighter SUS Alternatives
When 10 items feel like too much survey real estate, two derivatives target the same construct (perceived usability) at lower cost.
UMUX [5] is a 4-item, 7-point Likert measure framed around the ISO 9241-11 components (effectiveness, efficiency, satisfaction). It correlates strongly with SUS in head-to-head studies. The trade-off is its mixed positive/negative item tone, which has been challenged on dimensionality grounds. Pick UMUX when you want a SUS-comparable usability measure with fewer than half the items, but be aware that the structural debate is unresolved.
UMUX-LITE [6] shrinks the instrument further to 2 positive-tone items: "This system's capabilities meet my requirements" and "This system is easy to use." It is the right pick when survey real estate is tightest (post-task pop-ups, in-app intercepts). Raw UMUX-LITE scores run elevated by roughly 5–10 points compared with SUS, so apply the published regression formula before benchmarking against SUS norms.
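The regression published in the UMUX-LITE paper [6] takes a few lines to apply: the two 7-point items are first rescaled to 0–100, then adjusted with the regression coefficients (0.65 and 22.9, per Lewis et al., 2013). A Python sketch, with illustrative function names:

```python
def umux_lite_raw(item1, item2):
    """Raw UMUX-LITE score on a 0-100 scale from two 7-point Likert items."""
    return (item1 + item2 - 2) * (100 / 12)

def umux_lite_sus_equivalent(item1, item2):
    """SUS-comparable score via the regression in Lewis et al. (2013)."""
    return 0.65 * umux_lite_raw(item1, item2) + 22.9
```

Note how the regression compresses the range: a perfect raw score of 100 maps to roughly 87.9, counteracting the 5–10 point inflation of raw UMUX-LITE scores relative to SUS.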
For full psychometrics, factor structure debates, and benchmark availability across these alternatives, filter the catalog above by construct = "Perceived Usability".
Net Promoter Score (NPS)
NPS [7] measures customer loyalty through a single question: "How likely are you to recommend [product/company] to a friend or colleague?" rated 0-10.
How It Works
Respondents are classified as:
- Promoters (9-10): Loyal enthusiasts who will keep buying and refer others
- Passives (7-8): Satisfied but unenthusiastic customers
- Detractors (0-6): Unhappy customers who can damage your brand
NPS = % Promoters - % Detractors
The score ranges from -100 (everyone is a detractor) to +100 (everyone is a promoter).
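The classification and subtraction above reduce to a few lines of code; a minimal Python sketch (function name is illustrative):

```python
def nps(ratings):
    """Net Promoter Score from a list of 0-10 likelihood-to-recommend ratings.

    Promoters rate 9-10, detractors 0-6; passives (7-8) count only
    toward the denominator. Result ranges from -100 to +100.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(r >= 9 for r in ratings)
    detractors = sum(r <= 6 for r in ratings)
    return 100 * (promoters - detractors) / len(ratings)
```

Note that passives dilute the score without shifting its sign: a sample of nothing but 7s and 8s yields an NPS of exactly 0.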
What It Actually Measures
NPS measures recommendation intent, which is often used as a proxy for loyalty and growth potential. However, NPS is controversial among researchers:
Criticisms:
- A single question cannot capture the complexity of customer loyalty
- The classification (0-6 as "detractors") is somewhat arbitrary
- NPS does not explain why someone would or would not recommend
- Cultural differences affect how people use the scale
When to Use NPS
NPS is appropriate for:
- Tracking overall brand or product sentiment over time
- Segmenting customers by loyalty
- Creating a simple KPI for executive dashboards
It is not appropriate for:
- Evaluating specific features or interface changes
- Replacing usability testing
- Making detailed design decisions
Product-Market Fit Score (The Sean Ellis Test)
While NPS measures whether customers would recommend your product, the Product-Market Fit (PMF) Score measures something different: how much they would miss it if it disappeared. The instrument was popularized in Hacking Growth [9].
The question is simple: "How would you feel if you could no longer use [product]?"
Response options:
- Very disappointed
- Somewhat disappointed
- Not disappointed (it really isn't that useful)
- N/A: I no longer use it
The 40% Benchmark
The key metric is the percentage of respondents who select "Very disappointed." Sean Ellis, who coined the term "growth hacking," proposed a simple heuristic [8]: if more than 40% of your users say they would be "very disappointed" without your product, you have achieved product-market fit.
The PMF Score is not a psychometrically validated instrument. The 40% figure is a practitioner heuristic, derived from pattern-matching across early-stage startups; Sean Ellis himself has called it "a bit arbitrary". Treat it as a useful rule of thumb, not a hard cutoff.
Who Counts as a Respondent
Sample composition matters more than headline percentage. Ellis prescribes strict user qualification [10]: include only respondents who
- have experienced the core product (not landing-page visitors or trial drop-offs),
- have used it at least twice, and
- have used it within the last 14 days.
Recommended sample size is roughly 30–40 qualified active users. Below that, the percentage swings too much per respondent to be informative.
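The headline metric is then a simple percentage over qualified respondents. A Python sketch; excluding "N/A: I no longer use it" answers from the denominator is a common convention consistent with Ellis's focus on current users, not something the survey wording itself mandates:

```python
def pmf_score(answers):
    """Percentage of qualified respondents answering 'very' (Very disappointed).

    Expects answers coded as "very", "somewhat", "not", or "n/a".
    "n/a" respondents (no longer users) are excluded from the
    denominator -- an assumed convention, see lead-in text.
    """
    qualified = [a for a in answers if a != "n/a"]
    if not qualified:
        raise ValueError("no qualified respondents")
    very = sum(a == "very" for a in qualified)
    return 100 * very / len(qualified)
```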
What It Actually Measures
The PMF Score captures perceived indispensability. It asks users to imagine life without the product and report the emotional weight of that loss.
This is distinct from both usability and loyalty:
- A product can be highly usable (strong SUS score) without being indispensable
- A product can generate strong recommendation intent (high NPS) for reasons unrelated to personal reliance
- The PMF Score specifically targets whether the product has become woven into the user's workflow or life
When to Use It
The PMF Score is most valuable for:
- Early-stage products seeking validation before scaling
- Products searching for their core audience segment
- Tracking whether new features increase or decrease perceived value
It is less useful for mature products with established market positions, where the question becomes less about "do we have fit" and more about "how do we expand and retain."
Single Ease Question (SEQ)
The SEQ [11] asks: "Overall, how easy or difficult was this task?" rated on a 7-point scale from "Very Difficult" to "Very Easy."
When to Use It
Administer SEQ immediately after each task in a UX test. It captures in-the-moment perceived difficulty before memory fades.
Interpretation
Based on benchmark data [12]:
- Average SEQ: ~5.5
- Scores above 5.5 indicate above-average ease
- Scores below 4.5 suggest significant difficulty
SEQ correlates with task success and time on task, making it a useful quick check even when objective metrics are available.
User Experience Questionnaire (UEQ)
The UEQ [13] is a more comprehensive instrument measuring six dimensions of user experience:
- Attractiveness: Overall impression of the product
- Perspicuity: How easy it is to learn
- Efficiency: How quickly users can accomplish goals
- Dependability: How in control users feel
- Stimulation: How exciting or motivating to use
- Novelty: How innovative or creative the design seems
When to Use It
UEQ is appropriate when you need a more nuanced view of user experience beyond usability alone. It captures both pragmatic quality (efficiency, perspicuity, dependability) and hedonic quality (stimulation, novelty).
Practical Notes
- 26 items, 7-point semantic differential, 3–5 minutes to complete
- Requires at least 20 participants for reliable results
- Free to use, with online benchmarks (Schrepp, Hinderks & Thomaschewski 2017; N = 21,175 responses across 452 product evaluations)
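UEQ answers are recoded from the 1–7 response format to a −3 to +3 range before scale means are computed; roughly half the items present the positive pole on the left and are reversed first (polarity and item-to-scale assignment follow the official UEQ handbook; this sketch omits the assignment and uses illustrative function names):

```python
def ueq_item_value(response, reversed_polarity=False):
    """Recode a 1-7 UEQ answer to the -3..+3 range used for scale means."""
    value = response - 4          # 1..7  ->  -3..+3
    return -value if reversed_polarity else value

def ueq_scale_mean(responses_with_polarity):
    """Mean of recoded item values for one UEQ dimension.

    Takes (response, reversed_polarity) pairs for the items
    belonging to a single scale, e.g. Perspicuity.
    """
    values = [ueq_item_value(r, rev) for r, rev in responses_with_polarity]
    return sum(values) / len(values)
```

Scale means above roughly +0.8 are conventionally read as positive evaluations in UEQ reporting, but benchmark comparison against the reference dataset is the more meaningful interpretation.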
UEQ Family
The UEQ has two siblings worth knowing about, both authored by the original UEQ team.
UEQ-S [14] is the 8-item short version. The 26 UEQ items collapse to two meta-factors, Pragmatic Quality (efficiency, perspicuity, dependability) and Hedonic Quality (stimulation, novelty), with 4 items each. It reproduces the full UEQ overall UX score with a mean deviation of 0.06 and shares the UEQ benchmark dataset. Pick UEQ-S when 26 items are too long for your survey but you still want to separate pragmatic from hedonic quality.
UEQ+ [15] is a modular framework rather than a fixed questionnaire. It offers 16 selectable scales (the original 6 plus Trust, Intuitive Use, Adaptability, Usefulness, Visual Aesthetics, Value, Stickiness, Content Quality, Trustworthiness of Content, and Immersion), 4 bipolar item pairs per scale, plus an importance rating per scale that lets you weight scales into a tailored UX KPI. Pick UEQ+ when you need a product-specific UX KPI (a content app cares about Content Quality and Stickiness; a fintech tool cares about Trust and Usefulness). Note that UEQ+ uses a different polarity grouping than the original UEQ, so raw scores are not directly comparable across the two.
Web-Specific: SUPR-Q
The Standardized User Experience Percentile Rank Questionnaire [16] is built specifically for websites. It uses 8 items across 4 factors (Usability, Trust, Appearance, Loyalty) in a mixed format: 7 items on a 5-point Likert scale plus 1 NPS-style 0–10 likelihood-to-recommend item.
The output is unusual and useful: SUPR-Q maps raw scores to a percentile rank against MeasuringU's reference database (initial N = 2,513 responses across 70 websites; updated quarterly). A SUPR-Q score of 4.1 sits roughly at the 70th percentile of catalogued websites.
Pick SUPR-Q over SUS when the artifact is a website (marketing site, e-commerce, content portal) and you want trust and visual aesthetic captured alongside usability. SUS is interface-agnostic and ignores both. Skip SUPR-Q for native apps, hardware, or back-office software; the norms do not transfer.
Choosing the Right Instrument
| Instrument | Measures | Best For | Minimum Sample |
|---|---|---|---|
| SUS | Perceived usability (10 items) | Post-study overall assessment | 12-14 |
| SEQ | Task difficulty (1 item) | After each task | 5-10 per task |
| NPS | Recommendation intent (1 item) | Customer loyalty tracking | 30+ |
| PMF Score | Perceived indispensability (1 item) | Product-market fit validation | 30-40 qualified users |
| UEQ | 6 UX dimensions (26 items) | Comprehensive UX assessment | 20+ |
| UMUX | Perceived usability (4 items, ISO 9241-11 framing) | SUS-comparable result with fewer items | see catalog |
| UMUX-LITE | Perceived usability (2 items, positive tone) | Tightest survey real estate | see catalog |
| UEQ-S | Pragmatic + Hedonic Quality (8 items) | Short UX sweep keeping dimensionality | see catalog |
| UEQ+ | Modular UX (up to 16 selectable scales) | Tailored UX KPI per product | see catalog |
| SUPR-Q | Website UX: Usability, Trust, Appearance, Loyalty (8 items) | Website benchmarking with percentile ranks | see catalog |
For how to use these instruments in longitudinal benchmarking, see UX Benchmarking: Measuring Progress Over Time.
What Is a "Good" Score?
Raw scores are meaningless without context. Here are the benchmarks you need to interpret your results.
SUS Benchmarks
| Score Range | Interpretation |
|---|---|
| 85+ | Excellent: Top 10% of products |
| 71-84 | Good: Above average usability |
| 68 | Average: The industry midpoint |
| 51-67 | Below Average: Needs improvement |
| 50 or below | Failure: Serious usability problems |
The magic number to remember: 68 is average. Anything above 71 is genuinely good. A score of 50 or below indicates fundamental problems that will frustrate most users.
NPS Benchmarks
NPS is highly industry-dependent. A score of +30 might be excellent in one industry and mediocre in another.
SEQ Benchmarks
On a 7-point scale:
- Average: ~5.5
- Above 5.5: Task was perceived as easier than average
- Below 4.5: Task was perceived as difficult; investigate further
However, SEQ depends heavily on inherent task difficulty. A complex task (e.g., "Configure your tax settings") will naturally score lower than a simple one (e.g., "Find the search bar"). Compare SEQ scores across versions of the same task, not across different tasks.
For the statistical techniques to properly interpret instrument scores, see Quantitative Analysis: From Metrics to Significance.
Common Mistakes
Modifying Validated Instruments
"We only need 5 of the SUS questions." No. Validated instruments work as complete packages. Removing or rewording items invalidates the benchmarks.
Over-relying on Single Metrics
NPS alone does not tell you what to fix. SUS alone does not explain why users struggle. Quantitative metrics tell you that something is happening; qualitative data tells you why.
Ignoring Context
A SUS score of 72 might be excellent for complex enterprise software and mediocre for a consumer mobile app. Always consider the product category when interpreting scores.
Treating Scores as Precise
All measurements have uncertainty. A SUS score of 72 might really be "somewhere between 65 and 79." Report confidence intervals, especially with smaller samples.
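A t-based interval around a mean score makes that uncertainty explicit. A minimal Python sketch using only the standard library; the default critical value of 2.0 is a rough stand-in for 95% confidence at moderate sample sizes (look up the exact t value for your n, e.g. via scipy.stats.t.ppf, when reporting formally):

```python
import statistics
from math import sqrt

def mean_ci(scores, t_crit=2.0):
    """Approximate confidence interval for a mean questionnaire score.

    t_crit ~ 2.0 roughly corresponds to 95% confidence for moderate
    sample sizes; substitute the exact critical value for n - 1
    degrees of freedom for formal reporting.
    """
    n = len(scores)
    m = statistics.mean(scores)
    se = statistics.stdev(scores) / sqrt(n)  # standard error of the mean
    return (m - t_crit * se, m + t_crit * se)
```

With small samples the interval is wide: four SUS scores averaging 70 with a spread of a few points can easily produce an interval spanning 15 or more points, which is exactly why "a SUS of 72" should not be reported as a precise number.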
What This Means for Practice
Standardized instruments are powerful tools when used correctly:
- Choose the right instrument for what you actually need to measure
- Administer them correctly, at the right time, with complete items, to enough participants
- Interpret with benchmarks: raw scores are meaningless without context
- Combine with qualitative data: metrics quantify problems; observation reveals them
- Report uncertainty: acknowledge the precision limits of your sample size
The goal is not to chase a score. It is to use measurement as one input into better design decisions.
To calculate the sample size needed for reliable instrument scores, see the Sample Size Calculator.
References
- [1] John Brooke (1996). "SUS: A 'Quick and Dirty' Usability Scale". In Usability Evaluation in Industry. Taylor & Francis.
- [2] Aaron Bangor, Philip Kortum & James Miller (2009). "Determining What Individual SUS Scores Mean: Adding an Adjective Rating Scale". Journal of Usability Studies.
- [3] Jeff Sauro & James R. Lewis (2016). Quantifying the User Experience: Practical Statistics for User Research, 2nd ed. (SUS Curved Grading Scale, Ch. 8). Morgan Kaufmann.
- [4] John Brooke (2013). "SUS: A Retrospective". Journal of Usability Studies.
- [5] Kraig Finstad (2010). "The Usability Metric for User Experience". Interacting with Computers.
- [6] James R. Lewis, Brian S. Utesch & Deborah E. Maher (2013). "UMUX-LITE: When There's No Time for the SUS". Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '13). ACM.
- [7] Frederick F. Reichheld (2003). "The One Number You Need to Grow". Harvard Business Review.
- [8] Sean Ellis (2009). "The Startup Pyramid". Startup Marketing Blog.
- [9] Sean Ellis & Morgan Brown (2017). Hacking Growth: How Today's Fastest-Growing Companies Drive Breakout Success. Crown Business.
- [10] Sean Ellis (2019). "Using Product/Market Fit to Drive Sustainable Growth". Medium / Growth Hackers.
- [11] Jeff Sauro & Joseph S. Dumas (2009). "Comparison of Three One-Question, Post-Task Usability Questionnaires". Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '09). ACM.
- [12] Jeff Sauro & James R. Lewis (2016). Quantifying the User Experience: Practical Statistics for User Research, 2nd ed. Morgan Kaufmann.
- [13] Bettina Laugwitz, Theo Held & Martin Schrepp (2008). "Construction and Evaluation of a User Experience Questionnaire". USAB 2008. Springer.
- [14] Martin Schrepp, Andreas Hinderks & Jörg Thomaschewski (2017). "Design and Evaluation of a Short Version of the User Experience Questionnaire (UEQ-S)". International Journal of Interactive Multimedia and Artificial Intelligence.
- [15] Martin Schrepp & Jörg Thomaschewski (2019). "Design and Validation of a Framework for the Creation of User Experience Questionnaires". International Journal of Interactive Multimedia and Artificial Intelligence.
- [16] Jeff Sauro (2015). "SUPR-Q: A Comprehensive Measure of the Quality of the Website User Experience". Journal of Usability Studies, 10(2), 68–86.