Summary
Synthetic data in research is not a binary choice between good and bad. It exists on a spectrum with three zones: system testing (walkthroughs, accessibility audits) where synthetic is fully legitimate; research preparation (instrument piloting, hypothesis exploration) where synthetic serves as a tool but not a data source; and understanding humans (behavior, emotion, motivation) where only real data will do. The line is not 'synthetic vs. real' but 'what is the synthetic data being used for?' If it informs a business decision, you need real data. If it informs a research design, synthetic can be a useful tool.
The promise is seductive: why recruit 12 participants when you can simulate 1,000? Why wait for scheduling when an AI can "walk through" your prototype in seconds?
The answer is not as binary as it was two years ago.
Large Language Models and AI agents have opened new possibilities for research automation. Some are genuinely valuable. Others are methodological landmines. And between those two poles, a gray zone has emerged: synthetic data as a preparation tool for real research, not a replacement for it.
The question is not "synthetic or real?" It is: what is the synthetic data being used for? If it informs a business decision, you need real data. If it informs a research design, synthetic can be a useful tool. This guide maps the spectrum.
The Legitimate Use Case: Automated Cognitive Walkthrough
An AI agent can systematically navigate a prototype or live product, evaluating it against logical criteria. This is not fake research; it is super-powered heuristic evaluation.
What AI Agents Can Do
| Capability | Example | Value |
|---|---|---|
| Logical flow analysis | "Step 3 references data not collected until Step 5" | Catches sequencing errors |
| Label consistency | "The button says 'Submit' here but 'Send' elsewhere" | Identifies confusing terminology |
| Navigation auditing | "This page has no way to return to the dashboard" | Finds dead ends |
| Accessibility scanning | "This image has no alt text; this form field has no label" | Flags WCAG violations |
| Content evaluation | "This error message doesn't explain how to fix the problem" | Improves microcopy |
Why This Works
Logic is programmable. An AI can be given explicit rules:
- "Every action should have a clear undo path"
- "Every form field should have a visible label"
- "Every error should explain the problem and suggest a fix"
- "Navigation should be consistent across all pages"
The AI then systematically checks every screen against these rules, faster and more consistently than a human evaluator.
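As a minimal sketch of what "logic is programmable" looks like in practice, the Python below encodes two of these rules as checks over a toy screen model. The `Screen` and `FormField` structures, the rule functions, and the demo data are all illustrative assumptions, not any real tool's API:

```python
from dataclasses import dataclass, field

@dataclass
class FormField:
    name: str
    label: str | None = None  # None models a missing visible label

@dataclass
class Screen:
    title: str
    fields: list[FormField] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)  # e.g. ["Submit", "Cancel"]

def check_visible_labels(screen: Screen) -> list[str]:
    """Rule: every form field should have a visible label."""
    return [f"{screen.title}: field '{f.name}' has no visible label"
            for f in screen.fields if not f.label]

def check_undo_path(screen: Screen) -> list[str]:
    """Rule: every consequential action should have a clear undo/cancel path."""
    consequential = {"Submit", "Delete", "Send"}
    if consequential & set(screen.actions) and "Cancel" not in screen.actions:
        return [f"{screen.title}: consequential action with no cancel/undo path"]
    return []

def run_rules(screens: list[Screen]) -> list[str]:
    """Apply every rule to every screen, identically and exhaustively."""
    return [finding
            for screen in screens
            for rule in (check_visible_labels, check_undo_path)
            for finding in rule(screen)]

# Demo: a checkout screen with an unlabeled field and no cancel path.
print(run_rules([Screen("Checkout",
                        fields=[FormField("card_number")],
                        actions=["Submit"])]))
```

The point is the shape, not the specifics: each heuristic becomes a deterministic function, so the same rules run identically across every screen, every time.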
The Cognitive Walkthrough Protocol
- Define evaluation criteria: What heuristics or standards should the AI check against?
- Provide the interface: Screenshots, prototype links, or live URLs
- Run the walkthrough: AI navigates and flags violations
- Review findings: Human researcher validates and prioritizes
- Fix and re-run: Iterate until baseline issues are resolved
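A sketch of how steps 1 through 4 might chain together, assuming a generic `llm` callable (any text-in, text-out model client) rather than a specific vendor SDK; the prompt wording and JSON response shape are illustrative:

```python
import json
from typing import Callable

CRITERIA = [
    "Every action should have a clear undo path",
    "Every form field should have a visible label",
    "Every error should explain the problem and suggest a fix",
]

def walkthrough(screens: list[str], llm: Callable[[str], str]) -> list[dict]:
    """Steps 1-3: check each screen description against every criterion.

    `llm` is any function that takes a prompt and returns the model's text;
    real code should validate the returned JSON instead of trusting it.
    """
    findings: list[dict] = []
    for screen in screens:
        prompt = (
            "Evaluate this screen against each criterion.\n"
            f"Criteria: {json.dumps(CRITERIA)}\n"
            f"Screen: {screen}\n"
            'Reply only with a JSON list like '
            '[{"criterion": "...", "violation": "...", "severity": 1}] '
            "where severity runs from 1 (minor) to 3 (blocking)."
        )
        findings.extend(json.loads(llm(prompt)))
    return findings

def prioritize(findings: list[dict]) -> list[dict]:
    """Step 4 input: sort for the human reviewer, who validates each finding."""
    return sorted(findings, key=lambda f: f["severity"], reverse=True)
```

Step 5 is just re-running `walkthrough` after fixes. The human review step stays in the loop deliberately: the model's severity ratings are suggestions, not verdicts.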
What Automated Walkthroughs Cannot Do
Even in legitimate use cases, AI has limits:
| Limitation | Example |
|---|---|
| Cannot assess emotional response | "Does this error message feel patronizing?" |
| Cannot evaluate trust | "Would you enter your credit card here?" |
| Cannot predict workarounds | "Users might screenshot this instead of using the share button" |
| Cannot surface unstated needs | "I wish this also showed me X" |
These require real humans with real context.
For the traditional heuristic evaluation method closest to automated walkthroughs, see Heuristic Evaluation: The Audit Before the Test.
The Middle Ground: Synthetic Data as Research Preparation
Between system audits and human research lies a growing set of use cases where synthetic data is not the output but a tool for improving the research you will do with real people.
Instrument Stress-Testing
Generate synthetic responses to draft survey questions or interview guides to check for ambiguity, ceiling effects, floor effects, or insufficient scale differentiation. Run these checks before real participants ever touch the instrument. This is piloting, not data collection. A draft questionnaire that produces identical synthetic responses across different demographic prompts probably has a scale problem. Catching that before your fieldwork starts saves time and money.
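A minimal sketch of what that screening could look like, assuming synthetic ratings on a 7-point scale grouped by the demographic prompt that produced them; the 0.5 and 0.2 thresholds are illustrative, not validated cutoffs:

```python
from statistics import mean, stdev

SCALE_MIN, SCALE_MAX = 1, 7  # assumed 7-point rating scale

def screen_pilot(responses: dict[str, dict[str, list[int]]]) -> list[str]:
    """responses maps question -> demographic prompt -> synthetic ratings.

    Flags ceiling/floor effects and questions where different demographic
    prompts yield near-identical means (poor differentiation).
    """
    warnings = []
    for question, by_group in responses.items():
        ratings = [r for group in by_group.values() for r in group]
        m = mean(ratings)
        if m > SCALE_MAX - 0.5:
            warnings.append(f"{question}: possible ceiling effect (mean {m:.2f})")
        elif m < SCALE_MIN + 0.5:
            warnings.append(f"{question}: possible floor effect (mean {m:.2f})")
        group_means = [mean(g) for g in by_group.values()]
        if len(group_means) > 1 and stdev(group_means) < 0.2:
            warnings.append(f"{question}: near-identical responses across "
                            "demographic prompts; the item may not differentiate")
    return warnings

# Demo: one question, two very different synthetic "audiences", same answers.
demo = {"Q1: How satisfied are you overall?": {
    "young urban renter": [7, 7, 6, 7],
    "retired homeowner": [7, 6, 7, 7],
}}
for warning in screen_pilot(demo):
    print(warning)
```

The demo flags both a ceiling effect and a differentiation problem: exactly the kind of instrument defect you want to find before fieldwork, not after.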
Hypothesis Exploration
Use synthetic responses to explore: "If our product solves problem X for audience Y, what reaction patterns would be plausible?" This is a thinking tool for sharpening research questions and study design. Not a data source. Not evidence. A brainstorming partner that can process more combinations than a whiteboard session. The output is better hypotheses to test with real people, not conclusions.
Edge Case Brainstorming
AI can generate extreme usage scenarios (accessibility edge cases, unusual device contexts, atypical user goals) that the research team might not think of. Useful for making sure your study design covers enough ground. If your test plan only accounts for the happy path, synthetic edge cases can reveal the blind spots in your protocol.
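One low-tech version of this does not even need a model: enumerate scenario combinations mechanically, then let the team (or an LLM) filter for relevance. The dimensions below are illustrative placeholders, not a canonical taxonomy:

```python
from itertools import product

# Illustrative dimensions; replace with whatever matters for your product.
DEVICES  = ["old Android phone", "desktop with screen reader", "tablet, offline"]
CONTEXTS = ["one-handed while commuting", "poor connectivity", "under time pressure"]
GOALS    = ["first-time setup", "recovering a lost account", "bulk data export"]

def edge_case_scenarios() -> list[str]:
    """Enumerate combinations a whiteboard session might miss. Each result is
    a candidate task for the real study protocol, not research data."""
    return [f"User on {d}, {c}, trying: {g}"
            for d, c, g in product(DEVICES, CONTEXTS, GOALS)]

for scenario in edge_case_scenarios()[:3]:  # 27 in total; show a sample
    print(scenario)
```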
This framing aligns with recent academic work. The Kuric et al. systematic review ([1]) proposes treating synthetic participants as "heuristic-like": useful for quick checks, stress tests, and preparation, but not as substitutes for real participant data. The review also identifies valid "supplemental roles" and "augmentative approaches" where synthetic data supports rather than replaces human research. That maps directly to the middle zone of our spectrum: synthetic data as a tool that makes your real research better, not a shortcut that eliminates it.
The Dangerous Use Case: Mimicking People
The temptation is to go further: if AI can evaluate a flow, can it also respond like a user? Can it generate survey responses, simulate interview answers, or create "synthetic personas" based on demographic profiles?
The core argument stands: AI cannot replicate lived experience, and synthetic responses are not user data. But the framing needs nuance. The danger is not in generating synthetic responses per se. It is in what you do with them.
Two teams can run the exact same synthetic data generation. One uses it to pilot-test their survey before real fieldwork. The other presents it to stakeholders as "user research." The first is legitimate methodology. The second is fabrication. The difference is the endpoint, not the technique.
The Fundamental Problem
Language models predict the probable next word based on training data. They do not model the true human reaction to your specific product.
| What AI Does | What Research Needs |
|---|---|
| Predicts statistically likely response | Captures actual human reaction |
| Defaults to "average internet opinion" | Surfaces edge cases and outliers |
| Simulates plausibility | Reveals truth |
| Generates coherent text | Reflects lived experience |
Why AI Cannot Simulate Humans
AI lacks lived experience. It has never:
- Lost a job and felt the anxiety of checking a bank balance
- Struggled to complete a form while a baby cried in the background
- Felt the specific frustration of a promise broken by a brand
- Experienced the trust that comes from years of positive interactions
- Made an irrational choice because of a memory from childhood
These experiences shape how real users interact with products. AI can generate text that sounds like these experiences, but it is simulation, not observation.
Models in 2026 are better at simulating plausible human responses than those in 2024. This makes the problem harder, not easier. The more convincing the simulation, the harder it becomes for teams to recognize they are looking at generated text, not real human experience. Better simulation is not progress toward replacing real research. It is a higher-fidelity trap.
A 2025 systematic literature review by Kuric, Demcak, and Krajcovic ([1]) analyzed 182 studies that attempted to use LLMs as synthetic participants. The review represents the most comprehensive evidence base on this question to date. It identifies four fundamental problems: cognitive misalignments between model outputs and human reasoning, systematic distortions in response distributions, misleading believability (outputs that read as human but carry no actual human signal), and overfitting to training data. Despite prompt engineering and modeling techniques designed to improve fidelity, improvements remain modest. At their most representative, the review concludes, LLMs stochastically parrot their training data. They do not generate novel human responses.
The "Average User" Trap
When you ask an AI to respond as "a 35-year-old working mother," it generates a statistically average representation based on how such people are described online. This has two problems:
- Stereotyping: The AI reproduces cultural assumptions and biases
- Flattening: Real humans are contradictory, surprising, and individual
The insights that matter most (the unexpected behaviors, the edge cases, the genuine confusion) are exactly what synthetic data cannot produce.
Specific Failures of Synthetic User Data
| Method | What Goes Wrong |
|---|---|
| Synthetic survey responses | AI generates plausible-sounding but meaningless data; statistical analysis produces confident but false conclusions |
| Synthetic interviews | AI produces coherent narratives that confirm your assumptions; you learn nothing new |
| AI-generated personas | Stereotypes are reinforced; edge cases are invisible; teams design for an "average" that represents no one |
| Synthetic usability feedback | AI predicts what users might struggle with, missing what they actually struggle with |
The Verdict: A Spectrum, Not a Line
The distinction is not binary. It is a spectrum with three zones:
| Test the System | Prepare the Research | Understand the Human |
|---|---|---|
| Use synthetic data | Use synthetic data as a tool | Use real data |
| Is it logical? | Are my questions clear? | Is it desirable? |
| Is it consistent? | Does my study design cover enough ground? | Does it solve a real problem? |
| Is it accessible? | What edge cases should I plan for? | How does it feel? |
| Are there obvious errors? | Are my hypotheses worth testing? | What surprises us? |
The Decision Framework
When Synthetic Methods Are Appropriate
| Method | Appropriate Use |
|---|---|
| AI cognitive walkthrough | Pre-testing before human participants |
| Automated accessibility audit | Baseline compliance check |
| AI-assisted content review | Catching inconsistencies at scale |
| Synthetic load testing | Stress-testing system performance |
| Synthetic survey piloting | Checking instrument quality before fieldwork |
| AI hypothesis exploration | Sharpening research questions and study design |
When Synthetic Methods Are Dangerous
| Method | Why It Fails |
|---|---|
| Synthetic survey responses | Produces false confidence in meaningless data |
| AI-generated interview transcripts | Confirms assumptions, surfaces no surprises |
| Synthetic personas replacing real segmentation | Designs for stereotypes, not real people |
| AI "predicting" user behavior | Misses the irrational, emotional, contextual reality |
The Ethical Dimension
Beyond methodology, there is an ethical question: synthetic user data can be used to fake research entirely.
A team under pressure could generate "1,000 survey responses" to justify a decision already made. A vendor could claim "user research" that was actually AI fabrication. A report could present synthetic quotes as real participant voices.
Disclosure alone is not enough. Organizations need explicit governance:
- What counts as research data and what does not
- Who reviews whether synthetic methods were used appropriately
- What happens when a vendor claims "user research" but the data is generated

The tools to fabricate research are now trivially accessible. The barrier is no longer technical; it is institutional. Teams that do not have clear policies on this will eventually face a credibility crisis, either internally or with clients. For how these shifts are reshaping research roles and responsibilities, see Career in the Age of AI: What Changes, What Remains.
Transparency Requirements
If you use AI in any part of your research process, disclose it:
- "Accessibility issues were identified using automated scanning tools"
- "Initial heuristic evaluation was AI-assisted; findings were validated by human reviewers"
- "Prototype was pre-tested with automated walkthrough before participant sessions"
- "Synthetic data was used to pilot-test the research instrument; all findings presented are from real participant sessions"
Never present AI-generated content as human participant data.
For the broader ethical framework for data handling in research, see Ethics and Data Privacy in UX Research.
What This Means for Practice
Synthetic data is a tool: powerful when used correctly, dangerous when misused.
- Use AI for system testing: Automated walkthroughs, accessibility audits, and logical consistency checks are legitimate and valuable
- Use AI to prepare better research: Instrument piloting, hypothesis exploration, and edge case brainstorming improve the quality of studies you run with real people
- Never use AI to replace human participants: Survey responses, interview data, and behavioral observations require real people
- Remember the limitation: AI simulates plausibility, not truth; it lacks lived experience
- Disclose AI use: Transparency about methodology protects your credibility
- Apply the three-zone test: "Am I testing the system, preparing the research, or trying to understand the human?"
The most sophisticated AI cannot tell you what it feels like to be your user. Only your users can do that.
For a comprehensive view of AI capabilities and limitations that contextualizes synthetic data risks, see What AI Can and Cannot Do for UX Research.
For the related challenge of AI replacing human moderators in live interviews, see AI-Moderated Interviews: The 'Rag Rug' Data Problem.
References
- [1] Kuric, E., Demcak, P., & Krajcovic, M. (2025). "Synthetic Participants Generated by Large Language Models: A Systematic Literature Review." Preprint, Research Square. DOI: 10.21203/rs.3.rs-9057643/v1. Note: this is a preprint that has not yet undergone peer review; the 182-study evidence base is substantial, but findings should be read with that caveat.