AIG-029 Prompt Injection Protection
Description
LLM-based systems that accept user-supplied or externally-sourced input have documented controls against prompt injection attacks. Controls include at minimum two of: input sanitisation and content filtering at the API layer, instruction/data separation in prompt architecture, privilege separation for tool-calling agents, output filtering for command patterns, sandboxing of downstream tool calls. Prompt injection is included in the threat model and evaluated during security testing. Injection attempt patterns are logged for detection and tuning.
Rationale
Prompt injection is a novel attack class specific to LLM-based systems; it allows adversarial input to override system instructions, exfiltrate data, or cause the model to take unauthorised actions. It is not covered by any existing framework at an operational level.
Framework Mappings (2)
| EU-AI-Art.15.3 | Accuracy, Robustness and Cybersecurity — Cybersecurity Against AI-Specific Attacks | informative |
| MEASURE 2.7 | AI System Security and Resilience Evaluation | informative |
Evidence (2)
Input validation and content filtering configuration for LLM systems, documenting at least two active prompt injection controls (input sanitisation, instruction/data separation, privilege separation, output filtering, tool-call sandboxing).
Example: LLM API gateway configuration (AWS Bedrock Guardrails export or LangChain input guard config): input_sanitisation: enabled (strips HTML/JS injection patterns), system_prompt_separation: instruction_prefix=SYSTEM_INSTRUCTION, data_prefix=USER_DATA, tool_call_sandbox: docker_isolated (no network access), injection_pattern_log: enabled, guardrail_version: v4
Test: Request the prompt injection protection configuration for each LLM-based system. Verify: (1) at least two controls from the list are active and configured (not just listed in a document), (2) input sanitisation patterns cover known injection signatures, (3) instruction/data separation is enforced in the prompt architecture (inspect system prompt template), (4) tool-call sandboxing is configured for agentic systems with tool access, (5) injection attempt logging is active (confirm log source exists).
Security test report including prompt injection test cases, demonstrating that the system was evaluated for prompt injection resistance during V&V testing.
Example: Security Test Report — AI Customer Support Bot v3 (Burp Suite / custom harness, 2025-12-10): 45 prompt injection test cases executed (direct injection, indirect injection via retrieved documents, instruction override attempts), 0 successful injections, 3 partial bypasses noted and mitigated, re-test passed 2025-12-20
Test: Request security test results covering prompt injection. Verify: (1) test cases include both direct prompt injection and indirect injection (via retrieved or tool-returned content), (2) test cases attempt instruction override and data exfiltration via injected prompts, (3) all successful injection attempts have been remediated with evidence of re-test, (4) injection attempt patterns discovered during testing have been added to the detection configuration.
Questions (2)
Do your LLM-based systems that accept user-supplied or externally-sourced input have documented controls against prompt injection attacks?
Net-new control: prompt injection is an attack class specific to LLM-based systems. Adversarial input can override system instructions, exfiltrate data, or cause the model to take unauthorised actions. It is not covered by conventional SAST/DAST or existing frameworks at an operational level.
Which prompt injection protection controls are active in your LLM-based systems?
At least two controls should be active. For agentic systems with tool access (web browsing, code execution, database queries), privilege separation and tool-call sandboxing are critical — prompt injection in agentic systems can result in unauthorised real-world actions.