An attack class where untrusted input carries instructions that hijack a model's behavior. Any AI system ingesting external content (transcripts, tickets, scraped pages) is a security surface.
A class of attack where untrusted input contains instructions that hijack the model's behavior. If your AI tool ingests participant transcripts, support tickets, or anything written by someone other than you, that text can carry instructions you did not write. Treat any AI system that touches external content as a security surface, not just a productivity tool.
The text input you send to a language model. Most 'AI doesn't work' complaints trace back to prompt quality before they trace to model quality.
Instruction given to an LLM that sets its role, behavior, or output format before any user input arrives. Acts as the model's standing operating instructions for the rest of a conversation. Wrapper tools usually contain a hidden system prompt you cannot see.
A vendor policy guaranteeing your inputs and outputs are not stored, logged, or used to train future models beyond the immediate request. The non-negotiable baseline for research with participant data.