AIG-035 Training Data Memorisation and Extraction Controls

Tier 2+AI

Description

AI systems — particularly LLMs — that may have memorised training data are evaluated for training data extraction risk before deployment and after retraining. Evaluation includes: membership inference testing, direct extraction probing for known sensitive training data, and review of system prompt configurations that could facilitate extraction. Systems that score above a documented risk threshold for extraction implement mitigations: output filtering, differential privacy in training, or constrained output length. The risk evaluation methodology and results are documented.

Rationale

LLMs can reproduce verbatim training data, including PII, credentials, and copyrighted content, when prompted. This is an AI-specific data leakage vector with no equivalent in conventional data security, and no existing framework addresses it at an operational level.

Framework Mappings (3)

EU AI Act 2024

EU-AI-Art.15.3	Accuracy, Robustness and Cybersecurity: Cybersecurity Against AI-Specific Attacks	informative

NIST AI RMF 1.0

MEASURE 2.10	AI Privacy Risk Examination	informative
MEASURE 2.7	AI System Security and Resilience Evaluation	informative

Evidence (2)

report (manual)

Training data extraction risk evaluation report covering membership inference testing, direct extraction probing for sensitive training data, and system prompt configuration review, produced before deployment and after retraining.

Example: Training Data Extraction Risk Report, LLM Customer Support Model v2 (Confluence, 2025-11-20): membership inference test (500 member/non-member pairs, AUC 0.53, near random, PASS); extraction probing (200 prompts targeting known training data patterns, 0 verbatim extractions > 50 tokens); system prompt configuration review (PASS); risk score: LOW; no additional mitigations required.
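The AUC figure in a report like this can be reproduced from raw attack scores. The following is a minimal sketch, assuming each sample has a scalar attack score where a higher value means "more likely a training member" (for instance, negative per-sample loss). The function names and the 0.60 pass threshold are hypothetical illustrations, not values defined by this control.

```python
from typing import Sequence


def membership_inference_auc(member_scores: Sequence[float],
                             non_member_scores: Sequence[float]) -> float:
    """Rank-based AUC of a membership inference attack.

    Equals the probability that a randomly chosen member outscores a
    randomly chosen non-member, with ties counted as half a win.
    """
    if not member_scores or not non_member_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in member_scores:
        for n in non_member_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))


def mia_verdict(auc: float, threshold: float = 0.60) -> str:
    """PASS when the attack stays below the documented threshold.

    The 0.60 default is an assumed placeholder. An AUC far below 0.5 also
    signals leakage (the attacker can invert the score), so the distance
    from 0.5 is measured on both sides.
    """
    effective = max(auc, 1.0 - auc)
    return "PASS" if effective < threshold else "FAIL"
```

An AUC near 0.5, such as the 0.53 in the example, means the attack distinguishes members from non-members barely better than chance.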

Test: Request the training data extraction risk evaluation report for each production LLM. Verify: (1) membership inference testing was performed with a documented methodology, and the AUC result is recorded and compared to a defined threshold, (2) direct extraction probing was performed for known sensitive training data categories (PII, credentials, copyrighted content), (3) system prompt configuration was reviewed for extraction facilitation risks, (4) a report is dated before initial deployment, and a further report is dated after each retraining event, (5) where the risk score exceeds the documented threshold, at least one mitigation (output filtering, differential privacy, constrained output length) is implemented and evidenced.
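The five verification points can be sketched as a mechanical check over a structured report record. This is illustrative only: the `ExtractionRiskReport` shape, its field names, and the category labels are assumptions, and the timing rule is simplified to the most recent report (dated on or before deployment if the model was never retrained, otherwise on or after the latest retraining).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set

# Category and mitigation labels are hypothetical identifiers.
REQUIRED_CATEGORIES = {"pii", "credentials", "copyrighted_content"}
ACCEPTED_MITIGATIONS = {"output_filtering", "differential_privacy",
                        "constrained_output_length"}


@dataclass
class ExtractionRiskReport:  # hypothetical record shape
    report_date: date
    deployment_date: date
    mia_auc: Optional[float]
    mia_threshold: Optional[float]
    probed_categories: Set[str]
    system_prompt_reviewed: bool
    risk_above_threshold: bool
    mitigations: Set[str] = field(default_factory=set)
    last_retraining_date: Optional[date] = None


def failed_checks(r: ExtractionRiskReport) -> List[int]:
    """Return the numbers (1-5) of the verification points that fail."""
    failed = []
    # (1) AUC recorded alongside a defined threshold
    if r.mia_auc is None or r.mia_threshold is None:
        failed.append(1)
    # (2) every sensitive category was probed
    if not REQUIRED_CATEGORIES <= r.probed_categories:
        failed.append(2)
    # (3) system prompt configuration reviewed
    if not r.system_prompt_reviewed:
        failed.append(3)
    # (4) report timing relative to deployment or latest retraining
    if r.last_retraining_date is None:
        timely = r.report_date <= r.deployment_date
    else:
        timely = r.report_date >= r.last_retraining_date
    if not timely:
        failed.append(4)
    # (5) at least one accepted mitigation when risk exceeds the threshold
    if r.risk_above_threshold and not (r.mitigations & ACCEPTED_MITIGATIONS):
        failed.append(5)
    return failed
```

An empty list means the record passes all five points; the manual review of methodology quality still has to be done by a person.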

configuration (automated)

Output filtering or differential privacy configuration applied to LLMs scoring above the training data extraction risk threshold, demonstrating that mitigations are technically enforced.

Example: LLM output filtering configuration (AWS Bedrock Guardrails or custom post-processing pipeline): max_verbatim_output_tokens: 150, PII_redaction: enabled (regex + NER model), known_credential_pattern_filter: enabled, exact_match_training_data_filter: enabled (hash-based bloom filter against known sensitive training records), filter_version: v2 (git tag output-filter-v2)
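A custom post-processing pipeline of this kind might look like the sketch below. It is a simplified illustration under several assumptions: a plain set of SHA-256 digests stands in for the bloom filter, PII redaction is reduced to a single email regex (no NER model), the credential filter covers only the AWS access key ID format, and the length cap counts whitespace-separated words on the whole output rather than detecting verbatim runs with a real tokenizer. The class and method names are hypothetical.

```python
import hashlib
import re
from dataclasses import dataclass, field
from typing import Set

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
AWS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")  # AWS access key ID format
BLOCKED_MESSAGE = "[BLOCKED: matches known sensitive training record]"


def _digest(text: str) -> str:
    """Hash of the text after lowercasing and collapsing whitespace."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


@dataclass
class OutputFilter:  # hypothetical filter, not a vendor API
    max_output_tokens: int = 150
    sensitive_digests: Set[str] = field(default_factory=set)

    def add_sensitive_record(self, record: str) -> None:
        self.sensitive_digests.add(_digest(record))

    def apply(self, output: str) -> str:
        # 1. Block the response if any line matches a known sensitive record.
        for line in output.splitlines():
            if line.strip() and _digest(line) in self.sensitive_digests:
                return BLOCKED_MESSAGE
        # 2. Redact credential and PII patterns.
        redacted = AWS_KEY_RE.sub("[REDACTED_CREDENTIAL]", output)
        redacted = EMAIL_RE.sub("[REDACTED_EMAIL]", redacted)
        # 3. Cap length; truncation collapses whitespace to single spaces.
        tokens = redacted.split()
        if len(tokens) > self.max_output_tokens:
            redacted = " ".join(tokens[: self.max_output_tokens])
        return redacted
```

Exact-match hashing only catches records reproduced line for line, so it complements rather than replaces the pattern-based redaction.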

Test: For any LLM that scored above the extraction risk threshold in its evaluation report, request the mitigation configuration. Verify: (1) output length constraint is configured and enforced (test with a verbatim reproduction prompt), (2) PII redaction is active and covers the PII categories present in the training data, (3) known credential or sensitive pattern filters are configured, (4) configuration is version-controlled and the review date postdates the most recent extraction risk evaluation, (5) mitigation effectiveness is validated in the evaluation report (post-mitigation re-test result is present).

Questions (2)

boolean

Are your LLM-based AI systems evaluated for training data extraction risk before deployment and after retraining, using membership inference testing and direct extraction probing?

Net-new control: LLMs can reproduce verbatim training data — including PII, credentials, and copyrighted content — when prompted. This is an AI-specific data leakage vector with no equivalent in conventional data security, not addressed operationally by any existing framework.

multi

Which of the following training data extraction controls are applied to your LLM-based systems?

Membership inference testing performed before deployment with documented results
Direct extraction probing for known sensitive training data (PII, credentials, copyrighted content)
System prompt configuration reviewed for extraction facilitation risks
Documented risk threshold above which mitigations are required
Output length constraints to limit verbatim reproduction
PII redaction applied to LLM outputs
Differential privacy applied during model training

Membership inference testing and extraction probing are the minimum baseline. Mitigations (output length constraints, PII redaction, differential privacy) are required for any LLM that scores above the documented risk threshold in its evaluation. The risk evaluation methodology and results must be retained.
