
Synthetic Research Data: Automated Walkthroughs vs. Fake Users

AI agents can simulate a logical user journey, but they cannot simulate the messiness of human behavior. Where to draw the line between useful stress-testing and dangerous fabrication.

Marc Busch
Updated May 1, 2024
8 min read

Summary

Synthetic data has legitimate uses in research—AI agents excel at automated cognitive walkthroughs, spotting logical inconsistencies, broken flows, and accessibility violations. However, using AI to generate 'synthetic users' or fake survey responses is methodologically dangerous. AI lacks lived experience; it simulates plausibility, not truth. The rule: use synthetic data to test the system (logic, accessibility), use real data to understand the human (behavior, emotion, motivation).

The promise is seductive: why recruit 12 participants when you can simulate 1,000? Why wait for scheduling when an AI can "walk through" your prototype in seconds?

The answer depends entirely on what question you are trying to answer.

Large language models (LLMs) and AI agents have opened new possibilities for research automation. Some are genuinely valuable. Others are methodological landmines. This guide helps you tell the difference.

The Legitimate Use Case: Automated Cognitive Walkthrough

An AI agent can systematically navigate a prototype or live product, evaluating it against logical criteria. This is not fake research—it is super-powered heuristic evaluation.

What AI Agents Can Do

| Capability | Example | Value |
| --- | --- | --- |
| Logical flow analysis | "Step 3 references data not collected until Step 5" | Catches sequencing errors |
| Label consistency | "The button says 'Submit' here but 'Send' elsewhere" | Identifies confusing terminology |
| Navigation auditing | "This page has no way to return to the dashboard" | Finds dead ends |
| Accessibility scanning | "This image has no alt text; this form field has no label" | Flags WCAG violations |
| Content evaluation | "This error message doesn't explain how to fix the problem" | Improves microcopy |
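The first capability, logical flow analysis, is worth making concrete: it is essentially a bookkeeping check over which data each step collects and which data it references. Here is a minimal sketch in Python; the `Step` structure and the example flow are illustrative assumptions, not any real tool's API.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One screen in a flow: the data it collects and the data it references."""
    name: str
    collects: set = field(default_factory=set)
    references: set = field(default_factory=set)

def check_sequencing(flow):
    """Flag any step that references data no earlier step has collected."""
    collected, violations = set(), []
    for step in flow:
        for ref in sorted(step.references - collected):
            violations.append(f"{step.name} references '{ref}' before it is collected")
        collected |= step.collects
    return violations

# The error from the table: step 3 uses data that is only collected in step 5.
flow = [
    Step("Step 1: Account", collects={"email"}),
    Step("Step 2: Plan", collects={"plan"}),
    Step("Step 3: Review", references={"shipping_address"}),
    Step("Step 4: Payment", collects={"card"}),
    Step("Step 5: Shipping", collects={"shipping_address"}),
]
print(check_sequencing(flow))
# ["Step 3: Review references 'shipping_address' before it is collected"]
```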

Why This Works

Logic is programmable. An AI can be given explicit rules:

  • "Every action should have a clear undo path"
  • "Every form field should have a visible label"
  • "Every error should explain the problem and suggest a fix"
  • "Navigation should be consistent across all pages"

The AI then systematically checks every screen against these rules, faster and more consistently than a human evaluator.
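Here is what two of these rules can look like as executable checks. This is a minimal sketch over static HTML using BeautifulSoup (`pip install beautifulsoup4`); a real walkthrough agent would also drive the live interface, but the rule-checking core is this mechanical.

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def check_screen(html, screen_name):
    """Check one screen against two explicit rules:
    every image needs alt text; every form field needs a visible label."""
    soup = BeautifulSoup(html, "html.parser")
    violations = []

    # Rule: "Every image should have alt text."
    for img in soup.find_all("img"):
        if not img.get("alt"):
            violations.append(f"{screen_name}: <img src={img.get('src')!r}> has no alt text")

    # Rule: "Every form field should have a visible label" (a matching <label for=...>).
    for control in soup.find_all(["input", "select", "textarea"]):
        if control.get("type") in ("hidden", "submit", "button"):
            continue
        cid = control.get("id")
        if cid is None or soup.find("label", attrs={"for": cid}) is None:
            violations.append(f"{screen_name}: field {control.get('name')!r} has no label")
    return violations

print(check_screen('<img src="hero.png"><input name="email">', "Checkout"))
# ["Checkout: <img src='hero.png'> has no alt text", "Checkout: field 'email' has no label"]
```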

The Cognitive Walkthrough Protocol

  1. Define evaluation criteria: What heuristics or standards should the AI check against?
  2. Provide the interface: Screenshots, prototype links, or live URLs
  3. Run the walkthrough: AI navigates and flags violations
  4. Review findings: Human researcher validates and prioritizes
  5. Fix and re-run: Iterate until baseline issues are resolved
[Figure: Automated Cognitive Walkthrough Process. Input (prototype + heuristic checklist) flows into an AI agent that evaluates logical consistency, navigation completeness, label clarity, and accessibility compliance; the output is a list of violations and their locations. The human role: validate, prioritize, decide.]
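The protocol is a loop, and sketching it as one makes the division of labor explicit. Assume some `run_walkthrough(url, criteria)` function wraps whichever agent you use (a hypothetical name standing in for your tooling, not a real library call); steps 4 and 5 stay deliberately human.

```python
CRITERIA = [
    "Every action should have a clear undo path",
    "Every form field should have a visible label",
    "Every error should explain the problem and suggest a fix",
    "Navigation should be consistent across all pages",
]

def walkthrough_cycle(prototype_url, run_walkthrough, max_rounds=3):
    """Run the AI walkthrough, hand findings to a human, fix, re-run."""
    for round_no in range(1, max_rounds + 1):
        findings = run_walkthrough(prototype_url, CRITERIA)  # step 3: AI flags violations
        if not findings:
            print(f"Round {round_no}: baseline clean, ready for human participants")
            return
        # Step 4 is deliberately manual: a researcher validates and prioritizes.
        print(f"Round {round_no}: {len(findings)} findings to validate and prioritize")
        for finding in findings:
            print(" -", finding)
        input("Fix the prioritized issues, then press Enter to re-run...")  # step 5
    print("Round limit reached; remaining findings need a human decision")
```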

What Automated Walkthroughs Cannot Do

Even in legitimate use cases, AI has limits:

| Limitation | Example |
| --- | --- |
| Cannot assess emotional response | "Does this error message feel patronizing?" |
| Cannot evaluate trust | "Would you enter your credit card here?" |
| Cannot predict workarounds | "Users might screenshot this instead of using the share button" |
| Cannot surface unstated needs | "I wish this also showed me X" |

These require real humans with real context.

The Dangerous Use Case: Mimicking People

The temptation is to go further: if AI can evaluate a flow, can it also respond like a user? Can it generate survey responses, simulate interview answers, or create "synthetic personas" based on demographic profiles?

This is where synthetic data becomes dangerous.

The Fundamental Problem

Large language models predict the probable next word based on their training data. They do not model the true human reaction to your specific product.

| What AI Does | What Research Needs |
| --- | --- |
| Predicts statistically likely responses | Captures actual human reactions |
| Defaults to "average internet opinion" | Surfaces edge cases and outliers |
| Simulates plausibility | Reveals truth |
| Generates coherent text | Reflects lived experience |

Why AI Cannot Simulate Humans

AI lacks lived experience. It has never:

  • Lost a job and felt the anxiety of checking a bank balance
  • Struggled to complete a form while a baby cried in the background
  • Felt the specific frustration of a promise broken by a brand
  • Experienced the trust that comes from years of positive interactions
  • Made an irrational choice because of a memory from childhood

These experiences shape how real users interact with products. AI can generate text that sounds like these experiences, but it is simulation, not observation.

The "Average User" Trap

When you ask an AI to respond as "a 35-year-old working mother," it generates a statistically average representation based on how such people are described online. This has two problems:

  1. Stereotyping: The AI reproduces cultural assumptions and biases
  2. Flattening: Real humans are contradictory, surprising, and individual

The insights that matter most—the unexpected behaviors, the edge cases, the genuine confusion—are exactly what synthetic data cannot produce.

Specific Failures of Synthetic User Data

| Method | What Goes Wrong |
| --- | --- |
| Synthetic survey responses | AI generates plausible-sounding but meaningless data; statistical analysis produces confident but false conclusions |
| Synthetic interviews | AI produces coherent narratives that confirm your assumptions; you learn nothing new |
| AI-generated personas | Stereotypes are reinforced, edge cases stay invisible, and you design for an "average" that represents no one |
| Synthetic usability feedback | AI predicts what users might struggle with and misses what they actually struggle with |

The Verdict: A Clear Line

The distinction is simple:

| Test the System | Understand the Human |
| --- | --- |
| Use synthetic data | Use real data |
| Is it logical? | Is it desirable? |
| Is it consistent? | Does it solve a problem? |
| Is it accessible? | How does it feel? |
| Are there obvious errors? | What surprises us? |

The Decision Framework

[Figure: Synthetic vs. Real Data Decision Framework. The tree starts from "What are you trying to learn?" System properties (logical consistency, technical function, accessibility, heuristic compliance) lead to "synthetic data OK": AI walkthroughs, automated audits, heuristic evaluation, accessibility scans. Human properties (emotional response, trust and credibility, real-world context, unmet needs, actual behavior) lead to "real data required": usability testing, interviews, surveys, contextual inquiry.]
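If your team wants the framework as a checklist rather than a diagram, it reduces to a lookup. A sketch, with the categories copied from the tree above:

```python
SYSTEM_PROPERTIES = {"logical consistency", "technical function",
                     "accessibility", "heuristic compliance"}
HUMAN_PROPERTIES = {"emotional response", "trust and credibility",
                    "real-world context", "unmet needs", "actual behavior"}

def data_source_for(question_topic):
    """Apply the line: synthetic data tests the system, real data understands the human."""
    topic = question_topic.strip().lower()
    if topic in SYSTEM_PROPERTIES:
        return "Synthetic data OK: AI walkthrough, automated audit, accessibility scan"
    if topic in HUMAN_PROPERTIES:
        return "Real data required: usability testing, interviews, contextual inquiry"
    return "Unclear: default to real participants"

print(data_source_for("accessibility"))       # synthetic data OK
print(data_source_for("emotional response"))  # real data required
```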

When Synthetic Methods Are Appropriate

| Method | Appropriate Use |
| --- | --- |
| AI cognitive walkthrough | Pre-testing before human participants |
| Automated accessibility audit | Baseline compliance check |
| AI-assisted content review | Catching inconsistencies at scale |
| Synthetic load testing | Stress-testing system performance |
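Of these, synthetic load testing is the only one that is code-first by nature. A minimal standard-library sketch follows; the URL and request volumes are placeholders, and dedicated tools such as Locust or k6 do this properly.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def timed_request(url):
    """Fetch the URL once and return the latency in seconds."""
    start = time.perf_counter()
    with urlopen(url, timeout=10) as response:
        response.read()
    return time.perf_counter() - start

def load_test(url, requests=200, concurrency=20):
    """Fire synthetic traffic at a staging endpoint and report latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_request, [url] * requests))
    median = latencies[len(latencies) // 2]
    p95 = latencies[int(len(latencies) * 0.95) - 1]
    print(f"{requests} requests: median {median * 1000:.0f} ms, p95 {p95 * 1000:.0f} ms")

# load_test("https://staging.example.com/checkout")  # placeholder URL, never production
```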

When Synthetic Methods Are Dangerous

| Method | Why It Fails |
| --- | --- |
| Synthetic survey responses | Produces false confidence in meaningless data |
| AI-generated interview transcripts | Confirms assumptions, surfaces no surprises |
| Synthetic personas replacing real segmentation | Designs for stereotypes, not real people |
| AI "predicting" user behavior | Misses the irrational, emotional, contextual reality |

The Ethical Dimension

Beyond methodology, there is an ethical question: synthetic user data can be used to fake research entirely.

A team under pressure could generate "1,000 survey responses" to justify a decision already made. A vendor could claim "user research" that was actually AI fabrication. A report could present synthetic quotes as real participant voices.

Transparency Requirements

If you use AI in any part of your research process, disclose it:

  • "Accessibility issues were identified using automated scanning tools"
  • "Initial heuristic evaluation was AI-assisted; findings were validated by human reviewers"
  • "Prototype was pre-tested with automated walkthrough before participant sessions"

Never present AI-generated content as human participant data.

What This Means for Practice

Synthetic data is a tool—powerful when used correctly, dangerous when misused.

  1. Use AI for system testing: Automated walkthroughs, accessibility audits, and logical consistency checks are legitimate and valuable
  2. Never use AI to replace human participants: Survey responses, interview data, and behavioral observations require real people
  3. Remember the limitation: AI simulates plausibility, not truth; it lacks lived experience
  4. Disclose AI use: Transparency about methodology protects your credibility
  5. Apply the test: "Am I testing the system or understanding the human?"

The most sophisticated AI cannot tell you what it feels like to be your user. Only your users can do that.
