AIG-029 Prompt Injection Protection

Tier 2+AI

Description

LLM-based systems that accept user-supplied or externally-sourced input have documented controls against prompt injection attacks. Controls include at minimum two of: input sanitisation and content filtering at the API layer, instruction/data separation in prompt architecture, privilege separation for tool-calling agents, output filtering for command patterns, sandboxing of downstream tool calls. Prompt injection is included in the threat model and evaluated during security testing. Injection attempt patterns are logged for detection and tuning.

Rationale

Prompt injection is a novel attack class specific to LLM-based systems; it allows adversarial input to override system instructions, exfiltrate data, or cause the model to take unauthorised actions. It is not covered by any existing framework at an operational level.

Framework Mappings (2)

EU AI Act 2024

EU-AI-Art.15.3

Accuracy, Robustness and Cybersecurity — Cybersecurity Against AI-Specific Attacks

informative

NIST AI RMF 1.0

MEASURE 2.7

AI System Security and Resilience Evaluation

informative

Evidence (2)

configurationautomated

Input validation and content filtering configuration for LLM systems, documenting at least two active prompt injection controls (input sanitisation, instruction/data separation, privilege separation, output filtering, tool-call sandboxing).

Example: LLM API gateway configuration (AWS Bedrock Guardrails export or LangChain input guard config): input_sanitisation: enabled (strips HTML/JS injection patterns), system_prompt_separation: instruction_prefix=SYSTEM_INSTRUCTION, data_prefix=USER_DATA, tool_call_sandbox: docker_isolated (no network access), injection_pattern_log: enabled, guardrail_version: v4

Test: Request the prompt injection protection configuration for each LLM-based system. Verify: (1) at least two controls from the list are active and configured (not just listed in a document), (2) input sanitisation patterns cover known injection signatures, (3) instruction/data separation is enforced in the prompt architecture (inspect system prompt template), (4) tool-call sandboxing is configured for agentic systems with tool access, (5) injection attempt logging is active (confirm log source exists).

reportmanual

Security test report including prompt injection test cases, demonstrating that the system was evaluated for prompt injection resistance during V&V testing.

Example: Security Test Report — AI Customer Support Bot v3 (Burp Suite / custom harness, 2025-12-10): 45 prompt injection test cases executed (direct injection, indirect injection via retrieved documents, instruction override attempts), 0 successful injections, 3 partial bypasses noted and mitigated, re-test passed 2025-12-20

Test: Request security test results covering prompt injection. Verify: (1) test cases include both direct prompt injection and indirect injection (via retrieved or tool-returned content), (2) test cases attempt instruction override and data exfiltration via injected prompts, (3) all successful injection attempts have been remediated with evidence of re-test, (4) injection attempt patterns discovered during testing have been added to the detection configuration.

Questions (2)

boolean

Do your LLM-based systems that accept user-supplied or externally-sourced input have documented controls against prompt injection attacks?

Net-new control: prompt injection is an attack class specific to LLM-based systems. Adversarial input can override system instructions, exfiltrate data, or cause the model to take unauthorised actions. It is not covered by conventional SAST/DAST or existing frameworks at an operational level.

multi

Which prompt injection protection controls are active in your LLM-based systems?

Input sanitisation and content filtering at the API layerInstruction/data separation enforced in the system prompt architecturePrivilege separation for tool-calling agentsOutput filtering for injected command patternsSandboxing of downstream tool calls triggered by the LLMPrompt injection test cases included in security testingInjection attempt patterns logged for detection and tuning

At least two controls should be active. For agentic systems with tool access (web browsing, code execution, database queries), privilege separation and tool-call sandboxing are critical — prompt injection in agentic systems can result in unauthorised real-world actions.

Search controls

AIG-029 Prompt Injection Protection

Framework Mappings (2)

Evidence (2)

Questions (2)