Summary
Synthetic data in research is not a binary choice between good and bad. It exists on a spectrum with three zones: system testing (walkthroughs, accessibility audits) where synthetic is fully legitimate; research preparation (instrument piloting, hypothesis exploration) where synthetic serves as a tool but not a data source; and understanding humans (behavior, emotion, motivation) where only real data will do. The line is not 'synthetic vs. real' but 'what is the synthetic data being used for?' If it informs a business decision, you need real data. If it informs a research design, synthetic can be a useful tool.
The promise is seductive: why recruit 12 participants when you can simulate 1,000? Why wait for scheduling when an AI can "walk through" your prototype in seconds?
The answer is not as binary as it was two years ago.
Large Language Models and AI agents have opened new possibilities for research automation. Some are genuinely valuable. Others are methodological landmines. And between those two poles, a gray zone has emerged: synthetic data as a preparation tool for real research, not a replacement for it.
The question is not "synthetic or real?" It is: what is the synthetic data being used for? If it informs a business decision, you need real data. If it informs a research design, synthetic can be a useful tool. This guide maps the spectrum.
The Legitimate Use Case: Automated Cognitive Walkthrough
An AI agent can systematically navigate a prototype or live product, evaluating it against logical criteria. This is not fake research; it is super-powered heuristic evaluation.
What AI Agents Can Do
| Capability | Example | Value |
|---|---|---|
| Logical flow analysis | "Step 3 references data not collected until Step 5" | Catches sequencing errors |
| Label consistency | "The button says 'Submit' here but 'Send' elsewhere" | Identifies confusing terminology |
| Navigation auditing | "This page has no way to return to the dashboard" | Finds dead ends |
| Accessibility scanning | "This image has no alt text; this form field has no label" | Flags WCAG violations |
| Content evaluation | "This error message doesn't explain how to fix the problem" | Improves microcopy |
Why This Works
Logic is programmable. An AI can be given explicit rules:
- "Every action should have a clear undo path"
- "Every form field should have a visible label"
- "Every error should explain the problem and suggest a fix"
- "Navigation should be consistent across all pages"
The AI then systematically checks every screen against these rules, faster and more consistently than a human evaluator.
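As a minimal sketch of what "logic is programmable" looks like in practice, the Python below encodes two of these rules as checks over a toy screen model. The `Screen` and `FormField` structures, the rule functions, and the demo data are all illustrative assumptions, not any real tool's API:

```python
from dataclasses import dataclass, field

@dataclass
class FormField:
    name: str
    label: str | None = None  # None models a missing visible label

@dataclass
class Screen:
    title: str
    fields: list[FormField] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)  # e.g. ["Submit", "Cancel"]

def check_visible_labels(screen: Screen) -> list[str]:
    """Rule: every form field should have a visible label."""
    return [f"{screen.title}: field '{f.name}' has no visible label"
            for f in screen.fields if not f.label]

def check_undo_path(screen: Screen) -> list[str]:
    """Rule: every consequential action should have a clear undo/cancel path."""
    consequential = {"Submit", "Delete", "Send"}
    if consequential & set(screen.actions) and "Cancel" not in screen.actions:
        return [f"{screen.title}: consequential action with no cancel/undo path"]
    return []

def run_rules(screens: list[Screen]) -> list[str]:
    """Apply every rule to every screen, identically and exhaustively."""
    return [finding
            for screen in screens
            for rule in (check_visible_labels, check_undo_path)
            for finding in rule(screen)]

# Demo: a checkout screen with an unlabeled field and no cancel path.
print(run_rules([Screen("Checkout",
                        fields=[FormField("card_number")],
                        actions=["Submit"])]))
```

The point is the shape, not the specifics: each heuristic becomes a deterministic function, so the same rules run identically across every screen, every time.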
The Cognitive Walkthrough Protocol
- Define evaluation criteria: What heuristics or standards should the AI check against?
- Provide the interface: Screenshots, prototype links, or live URLs
- Run the walkthrough: AI navigates and flags violations
- Review findings: Human researcher validates and prioritizes
- Fix and re-run: Iterate until baseline issues are resolved
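A sketch of how steps 1 through 4 might chain together, assuming a generic `llm` callable (any text-in, text-out model client) rather than a specific vendor SDK; the prompt wording and JSON response shape are illustrative:

```python
import json
from typing import Callable

CRITERIA = [
    "Every action should have a clear undo path",
    "Every form field should have a visible label",
    "Every error should explain the problem and suggest a fix",
]

def walkthrough(screens: list[str], llm: Callable[[str], str]) -> list[dict]:
    """Steps 1-3: check each screen description against every criterion.

    `llm` is any function that takes a prompt and returns the model's text;
    real code should validate the returned JSON instead of trusting it.
    """
    findings: list[dict] = []
    for screen in screens:
        prompt = (
            "Evaluate this screen against each criterion.\n"
            f"Criteria: {json.dumps(CRITERIA)}\n"
            f"Screen: {screen}\n"
            'Reply only with a JSON list like '
            '[{"criterion": "...", "violation": "...", "severity": 1}] '
            "where severity runs from 1 (minor) to 3 (blocking)."
        )
        findings.extend(json.loads(llm(prompt)))
    return findings

def prioritize(findings: list[dict]) -> list[dict]:
    """Step 4 input: sort for the human reviewer, who validates each finding."""
    return sorted(findings, key=lambda f: f["severity"], reverse=True)
```

Step 5 is just re-running `walkthrough` after fixes. The human review step stays in the loop deliberately: the model's severity ratings are suggestions, not verdicts.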
What Automated Walkthroughs Cannot Do
Even in legitimate use cases, AI has limits:
| Limitation | Example |
|---|---|
| Cannot assess emotional response | "Does this error message feel patronizing?" |
| Cannot evaluate trust | "Would you enter your credit card here?" |
| Cannot predict workarounds | "Users might screenshot this instead of using the share button" |
| Cannot surface unstated needs | "I wish this also showed me X" |
These require real humans with real context.
For the traditional heuristic evaluation method closest to automated walkthroughs, see Heuristic Evaluation: The Audit Before the Test.
The Middle Ground: Synthetic Data as Research Preparation
Between system audits and human research lies a growing set of use cases where synthetic data is not the output but a tool for improving the research you will do with real people.
Instrument Stress-Testing
Generate synthetic responses to draft survey questions or interview guides to check for ambiguity, ceiling effects, floor effects, or insufficient scale differentiation. Run these checks before real participants ever touch the instrument. This is piloting, not data collection. A draft questionnaire that produces identical synthetic responses across different demographic prompts probably has a scale problem. Catching that before your fieldwork starts saves time and money.
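A minimal sketch of what that screening could look like, assuming synthetic ratings on a 7-point scale grouped by the demographic prompt that produced them; the 0.5 and 0.2 thresholds are illustrative, not validated cutoffs:

```python
from statistics import mean, stdev

SCALE_MIN, SCALE_MAX = 1, 7  # assumed 7-point rating scale

def screen_pilot(responses: dict[str, dict[str, list[int]]]) -> list[str]:
    """responses maps question -> demographic prompt -> synthetic ratings.

    Flags ceiling/floor effects and questions where different demographic
    prompts yield near-identical means (poor differentiation).
    """
    warnings = []
    for question, by_group in responses.items():
        ratings = [r for group in by_group.values() for r in group]
        m = mean(ratings)
        if m > SCALE_MAX - 0.5:
            warnings.append(f"{question}: possible ceiling effect (mean {m:.2f})")
        elif m < SCALE_MIN + 0.5:
            warnings.append(f"{question}: possible floor effect (mean {m:.2f})")
        group_means = [mean(g) for g in by_group.values()]
        if len(group_means) > 1 and stdev(group_means) < 0.2:
            warnings.append(f"{question}: near-identical responses across "
                            "demographic prompts; the item may not differentiate")
    return warnings

# Demo: one question, two very different synthetic "audiences", same answers.
demo = {"Q1: How satisfied are you overall?": {
    "young urban renter": [7, 7, 6, 7],
    "retired homeowner": [7, 6, 7, 7],
}}
for warning in screen_pilot(demo):
    print(warning)
```

The demo flags both a ceiling effect and a differentiation problem: exactly the kind of instrument defect you want to find before fieldwork, not after.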
Hypothesis Exploration
Use synthetic responses to explore: "If our product solves problem X for audience Y, what reaction patterns would be plausible?" This is a thinking tool for sharpening research questions and study design. Not a data source. Not evidence. A brainstorming partner that can process more combinations than a whiteboard session. The output is better hypotheses to test with real people, not conclusions.
Edge Case Brainstorming
AI can generate extreme usage scenarios (accessibility edge cases, unusual device contexts, atypical user goals) that the research team might not think of. Useful for making sure your study design covers enough ground. If your test plan only accounts for the happy path, synthetic edge cases can reveal the blind spots in your protocol.
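One low-tech version of this does not even need a model: enumerate scenario combinations mechanically, then let the team (or an LLM) filter for relevance. The dimensions below are illustrative placeholders, not a canonical taxonomy:

```python
from itertools import product

# Illustrative dimensions; replace with whatever matters for your product.
DEVICES  = ["old Android phone", "desktop with screen reader", "tablet, offline"]
CONTEXTS = ["one-handed while commuting", "poor connectivity", "under time pressure"]
GOALS    = ["first-time setup", "recovering a lost account", "bulk data export"]

def edge_case_scenarios() -> list[str]:
    """Enumerate combinations a whiteboard session might miss. Each result is
    a candidate task for the real study protocol, not research data."""
    return [f"User on {d}, {c}, trying: {g}"
            for d, c, g in product(DEVICES, CONTEXTS, GOALS)]

for scenario in edge_case_scenarios()[:3]:  # 27 in total; show a sample
    print(scenario)
```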
This framing aligns with recent academic work. The Kuric et al. systematic review ([1]) proposes treating synthetic participants as "heuristic-like": useful for quick checks, stress tests, and preparation, but not as substitutes for real participant data. The review also identifies valid "supplemental roles" and "augmentative approaches" where synthetic data supports rather than replaces human research. That maps directly to the middle zone of our spectrum: synthetic data as a tool that makes your real research better, not a shortcut that eliminates it.
The Dangerous Use Case: Mimicking People
The temptation is to go further: if AI can evaluate a flow, can it also respond like a user? Can it generate survey responses, simulate interview answers, or create "synthetic personas" based on demographic profiles?
The core argument stands: AI cannot replicate lived experience, and synthetic responses are not user data. But the framing needs nuance. The danger is not in generating synthetic responses per se. It is in what you do with them.
Two teams can run the exact same synthetic data generation. One uses it to pilot-test their survey before real fieldwork. The other presents it to stakeholders as "user research." The first is legitimate methodology. The second is fabrication. The difference is the endpoint, not the technique.
The Fundamental Problem
Language models predict the probable next word based on training data. They do not model the true human reaction to your specific product.
| What AI Does | What Research Needs |
|---|---|
| Predicts statistically likely response | Captures actual human reaction |
| Defaults to "average internet opinion" | Surfaces edge cases and outliers |
| Simulates plausibility | Reveals truth |
| Generates coherent text | Reflects lived experience |
Why AI Cannot Simulate Humans
AI lacks lived experience. It has never:
- Lost a job and felt the anxiety of checking a bank balance
- Struggled to complete a form while a baby cried in the background
- Felt the specific frustration of a promise broken by a brand
- Experienced the trust that comes from years of positive interactions
- Made an irrational choice because of a memory from childhood
These experiences shape how real users interact with products. AI can generate text that sounds like these experiences, but it is simulation, not observation.
Models in 2026 are better at simulating plausible human responses than those in 2024. This makes the problem harder, not easier. The more convincing the simulation, the harder it becomes for teams to recognize they are looking at generated text, not real human experience. Better simulation is not progress toward replacing real research. It is a higher-fidelity trap.
A 2025 systematic literature review by Kuric, Demcak, and Krajcovic ([1]) analyzed 182 studies that attempted to use LLMs as synthetic participants. The review represents the most comprehensive evidence base on this question to date. It identifies four fundamental problems: cognitive misalignments between model outputs and human reasoning, systematic distortions in response distributions, misleading believability (outputs that read as human but carry no actual human signal), and overfitting to training data. Despite prompt engineering and modeling techniques designed to improve fidelity, improvements remain modest. At their most representative, the review concludes, LLMs stochastically parrot their training data. They do not generate novel human responses.
The "Average User" Trap
When you ask an AI to respond as "a 35-year-old working mother," it generates a statistically average representation based on how such people are described online. This has two problems:
- Stereotyping: The AI reproduces cultural assumptions and biases
- Flattening: Real humans are contradictory, surprising, and individual
The insights that matter most (the unexpected behaviors, the edge cases, the genuine confusion) are exactly what synthetic data cannot produce.
Specific Failures of Synthetic User Data
| Method | What Goes Wrong |
|---|---|
| Synthetic survey responses | AI generates plausible-sounding but meaningless data; statistical analysis produces confident but false conclusions |
| Synthetic interviews | AI produces coherent narratives that confirm your assumptions; you learn nothing new |
| AI-generated personas | Stereotypes are reinforced; edge cases are invisible; teams design for an "average" that represents no one |
| Synthetic usability feedback | AI predicts what users might struggle with, missing what they actually struggle with |
The Verdict: A Spectrum, Not a Line
The distinction is not binary. It is a spectrum with three zones:
| Test the System | Prepare the Research | Understand the Human |
|---|---|---|
| Use synthetic data | Use synthetic data as a tool | Use real data |
| Is it logical? | Are my questions clear? | Is it desirable? |
| Is it consistent? | Does my study design cover enough ground? | Does it solve a real problem? |
| Is it accessible? | What edge cases should I plan for? | How does it feel? |
| Are there obvious errors? | Are my hypotheses worth testing? | What surprises us? |
The Decision Framework
When Synthetic Methods Are Appropriate
| Method | Appropriate Use |
|---|---|
| AI cognitive walkthrough | Pre-testing before human participants |
| Automated accessibility audit | Baseline compliance check |
| AI-assisted content review | Catching inconsistencies at scale |
| Synthetic load testing | Stress-testing system performance |
| Synthetic survey piloting | Checking instrument quality before fieldwork |
| AI hypothesis exploration | Sharpening research questions and study design |
When Synthetic Methods Are Dangerous
| Method | Why It Fails |
|---|---|
| Synthetic survey responses | Produces false confidence in meaningless data |
| AI-generated interview transcripts | Confirms assumptions, surfaces no surprises |
| Synthetic personas replacing real segmentation | Designs for stereotypes, not real people |
| AI "predicting" user behavior | Misses the irrational, emotional, contextual reality |
The Ethical Dimension
Beyond methodology, there is an ethical question: synthetic user data can be used to fake research entirely.
A team under pressure could generate "1,000 survey responses" to justify a decision already made. A vendor could claim "user research" that was actually AI fabrication. A report could present synthetic quotes as real participant voices.
Disclosure alone is not enough. Organizations need explicit governance:
- What counts as research data and what does not
- Who reviews whether synthetic methods were used appropriately
- What happens when a vendor claims "user research" but the data is generated

The tools to fabricate research are now trivially accessible. The barrier is no longer technical; it is institutional. Teams that do not have clear policies on this will eventually face a credibility crisis, either internally or with clients. For how these shifts are reshaping research roles and responsibilities, see Career in the Age of AI: What Changes, What Remains.
Transparency Requirements
If you use AI in any part of your research process, disclose it:
- "Accessibility issues were identified using automated scanning tools"
- "Initial heuristic evaluation was AI-assisted; findings were validated by human reviewers"
- "Prototype was pre-tested with automated walkthrough before participant sessions"
- "Synthetic data was used to pilot-test the research instrument; all findings presented are from real participant sessions"
Never present AI-generated content as human participant data.
For the broader ethical framework for data handling in research, see Ethics and Data Privacy in UX Research.
What This Means for Practice
Synthetic data is a tool: powerful when used correctly, dangerous when misused.
- Use AI for system testing: Automated walkthroughs, accessibility audits, and logical consistency checks are legitimate and valuable
- Use AI to prepare better research: Instrument piloting, hypothesis exploration, and edge case brainstorming improve the quality of studies you run with real people
- Never use AI to replace human participants: Survey responses, interview data, and behavioral observations require real people
- Remember the limitation: AI simulates plausibility, not truth; it lacks lived experience
- Disclose AI use: Transparency about methodology protects your credibility
- Apply the three-zone test: "Am I testing the system, preparing the research, or trying to understand the human?"
The most sophisticated AI cannot tell you what it feels like to be your user. Only your users can do that.
For a comprehensive view of AI capabilities and limitations that contextualizes synthetic data risks, see What AI Can and Cannot Do for UX Research.
For the related challenge of AI replacing human moderators in live interviews, see AI-Moderated Interviews: The 'Rag Rug' Data Problem.
References
- [1] Kuric, E., Demcak, P., & Krajcovic, M. (2025). "Synthetic Participants Generated by Large Language Models: A Systematic Literature Review." Preprint, Research Square. DOI: 10.21203/rs.3.rs-9057643/v1. Note: this is a preprint that has not yet undergone peer review; the 182-study evidence base is substantial, but findings should be read with that caveat.