AIG-014 Special Category Data in AI Training

Tier 2+AI

Description

AI systems must not be trained on, or otherwise use, special category personal data (health, biometric, ethnic origin, political opinions, etc.) unless all of the following hold: a documented legal basis exists under applicable data protection law (GDPR Art. 9 or equivalent); the processing is strictly necessary and no less-invasive alternative exists; appropriate security controls are applied; and the use is documented in the data protection register. Use of special category data solely for bias detection and correction must be documented separately, with explicit retention and deletion obligations.

Rationale

Special category data in training datasets creates regulatory exposure under both GDPR and the EU AI Act; without a clear legal basis and explicit controls, training on such data may be unlawful.

Framework Mappings (3)

EU AI Act 2024

EU-AI-Art.10.4

Data Governance — Special Category Data Processing for Bias Detection

full

GDPR 2018

GDPR-Art.5.1a

Lawfulness, Fairness and Transparency of Processing

partial

NIST AI RMF 1.0

MEASURE 2.10

AI Privacy Risk Examination

informative

Evidence (2)

record / manual

Data protection register entry or processing activity record documenting the legal basis for training on special category personal data, the necessity assessment, and applied security controls.

Example: ROPA entry — Health Data in Bias Correction Pipeline (OneTrust or SharePoint), recording GDPR Art. 9(2)(g) basis, necessity justification, pseudonymisation and encryption controls applied, DPO sign-off, and deletion schedule

Test: Request the ROPA or data protection register entries for any AI training pipeline involving special category data. Verify: (1) a specific GDPR Art. 9 (or equivalent) legal basis is stated, (2) necessity is assessed and documented (no less-invasive alternative existed), (3) applicable security controls are listed, (4) DPO or legal review sign-off is present, (5) retention and deletion obligations are specified, (6) for bias correction use, the entry is maintained separately with explicit deletion obligations.
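
The six checks above could be partly automated once register entries are exported. The following is a minimal sketch, assuming a hypothetical RopaEntry record whose field names are illustrative only and do not correspond to any OneTrust or SharePoint schema. It reports which checks fail and is not a substitute for DPO or legal review.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RopaEntry:
    """Hypothetical export of one register entry (field names are assumptions)."""
    pipeline: str
    art9_basis: Optional[str] = None
    necessity_assessment: Optional[str] = None
    security_controls: Tuple[str, ...] = ()
    reviewer_signoff: Optional[str] = None
    retention_and_deletion: Optional[str] = None
    bias_correction_use: bool = False
    separate_bias_entry: bool = False


def ropa_gaps(entry: RopaEntry) -> List[str]:
    """Return the numbered checks (1-6) that the entry fails."""
    gaps = []
    if not entry.art9_basis:
        gaps.append("(1) no specific Art. 9 (or equivalent) legal basis stated")
    if not entry.necessity_assessment:
        gaps.append("(2) necessity assessment missing")
    if not entry.security_controls:
        gaps.append("(3) no security controls listed")
    if not entry.reviewer_signoff:
        gaps.append("(4) DPO or legal sign-off missing")
    if not entry.retention_and_deletion:
        gaps.append("(5) retention and deletion obligations not specified")
    # Check (6) applies only when the data is used for bias correction.
    if entry.bias_correction_use and not entry.separate_bias_entry:
        gaps.append("(6) bias-correction use not documented separately")
    return gaps


entry = RopaEntry(
    pipeline="Health Data in Bias Correction Pipeline",
    art9_basis="GDPR Art. 9(2)(g)",
    necessity_assessment="No less-invasive alternative identified",
    security_controls=("pseudonymisation", "encryption"),
    reviewer_signoff=None,  # sign-off deliberately left out for the demo
    retention_and_deletion="Deletion schedule recorded",
    bias_correction_use=True,
    separate_bias_entry=True,
)
for gap in ropa_gaps(entry):
    print(gap)
```

In this demo only check (4) is reported, because every other field is populated. A presence check like this confirms that a field is filled in, not that its content is adequate, so the manual review remains the actual test.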

configuration / automated

Technical access controls configuration demonstrating that special category training data is isolated and accessible only to authorised roles in the data pipeline.

Example: AWS S3 bucket policy and IAM role configuration for special-category-training-data bucket: access restricted to ml-training-role with MFA required, object-level encryption enabled (SSE-KMS), no public access, access logs enabled

Test: Review the access control configuration for storage containing special category training data. Verify: (1) access is restricted to named roles with a documented business need, (2) encryption at rest is applied, (3) no public access is permitted, (4) access logging is enabled, (5) configuration matches the security controls stated in the ROPA entry.
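
Because this evidence item is collected automatically, the five checks lend themselves to a scripted comparison. The sketch below assumes the storage configuration has already been exported into a hypothetical StorageSnapshot structure; the field names and control labels are assumptions for illustration, and no cloud provider API is called.

```python
from dataclasses import dataclass
from typing import List, Set

# Control labels that this sketch can verify from a storage snapshot (assumed vocabulary).
CHECKABLE = {"encryption_at_rest", "no_public_access", "access_logging"}


@dataclass
class StorageSnapshot:
    """Hypothetical, pre-exported view of one storage location's configuration."""
    allowed_roles: Set[str]
    documented_roles: Set[str]  # roles with a documented business need
    encryption_at_rest: bool
    public_access_blocked: bool
    access_logging: bool


def access_control_findings(snap: StorageSnapshot, ropa_controls: Set[str]) -> List[str]:
    """Return the numbered checks (1-5) that the snapshot fails."""
    findings = []
    undocumented = sorted(snap.allowed_roles - snap.documented_roles)
    if not snap.allowed_roles or undocumented:
        findings.append(f"(1) roles without documented business need: {undocumented}")
    if not snap.encryption_at_rest:
        findings.append("(2) encryption at rest not applied")
    if not snap.public_access_blocked:
        findings.append("(3) public access permitted")
    if not snap.access_logging:
        findings.append("(4) access logging disabled")
    # (5) Compare controls claimed in the ROPA entry with what the snapshot shows.
    observed = set()
    if snap.encryption_at_rest:
        observed.add("encryption_at_rest")
    if snap.public_access_blocked:
        observed.add("no_public_access")
    if snap.access_logging:
        observed.add("access_logging")
    unmet = sorted((ropa_controls & CHECKABLE) - observed)
    if unmet:
        findings.append(f"(5) ROPA controls not reflected in configuration: {unmet}")
    return findings


snapshot = StorageSnapshot(
    allowed_roles={"ml-training-role"},
    documented_roles={"ml-training-role"},
    encryption_at_rest=True,
    public_access_blocked=True,
    access_logging=True,
)
print(access_control_findings(snapshot, {"encryption_at_rest", "access_logging"}))
```

Controls named in the ROPA entry that cannot be observed from storage configuration (pseudonymisation, for example) are ignored by check (5) in this sketch and would need separate evidence.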

Questions (2)

boolean

Does your organisation have documented controls preventing the use of special category personal data in AI training unless a specific legal basis exists?

Special category data (health, biometric, ethnic origin, political opinions, etc.) in training datasets creates significant GDPR and EU AI Act exposure. Processing must have an Art. 9 legal basis and be strictly necessary.

multi

If special category personal data is used in any AI training or evaluation pipeline, which of the following controls are in place?

Documented GDPR Art. 9 (or equivalent) legal basis
Necessity assessment confirming no less-invasive alternative exists
DPO or legal review sign-off
Access restricted to authorised roles only
Encryption at rest applied
Separate documentation for bias-correction use with explicit deletion obligations

All applicable controls should be in place. If special category data is not used in any pipeline, answer 'Does not apply' — absence of such data should be confirmed positively, not assumed.
