AIG-014 Special Category Data in AI Training
Description
AI systems must not be trained on, or otherwise use, special category personal data (health, biometric, ethnic origin, political opinions, etc.) unless all of the following hold: (1) a documented legal basis exists under applicable data protection law (GDPR Art. 9 or equivalent), (2) the processing is strictly necessary and no less-invasive alternative exists, (3) appropriate security controls are applied, and (4) the use is recorded in the data protection register. Where special category data is used solely for bias detection and correction, that use must be documented separately, with explicit retention and deletion obligations.
Rationale
Special category data in training datasets creates regulatory exposure across GDPR and the EU AI Act; without a clear legal basis and explicit controls, training pipelines may be unlawful.
Framework Mappings (3)
| EU-AI-Art.10.4 | Data Governance — Special Category Data Processing for Bias Detection | full |
| GDPR-Art.5.1a | Lawfulness, Fairness and Transparency of Processing | partial |
| MEASURE 2.10 | AI Privacy Risk Examination | informative |
Evidence (2)
Data protection register entry or processing activity record documenting the legal basis for training on special category personal data, the necessity assessment, and applied security controls.
Example: ROPA entry — Health Data in Bias Correction Pipeline (OneTrust or SharePoint), recording GDPR Art. 9(2)(g) basis, necessity justification, pseudonymisation and encryption controls applied, DPO sign-off, and deletion schedule
Test: Request the ROPA or data protection register entries for every AI training pipeline involving special category data. Verify: (1) a specific GDPR Art. 9 (or equivalent) legal basis is stated, (2) necessity is assessed and documented, including why no less-invasive alternative existed, (3) the applied security controls are listed, (4) DPO or legal review sign-off is present, (5) retention and deletion obligations are specified, (6) for bias detection and correction use, the entry is maintained separately with explicit deletion obligations.
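The six verification points above can be partly automated as a completeness check. The following is a minimal sketch, assuming register entries have been exported into a simple record. The `RopaEntry` class, its field names, and the example values are hypothetical and do not correspond to any OneTrust or SharePoint schema. The check only confirms that each item is present; whether the stated legal basis and necessity assessment are adequate still needs DPO or legal review.

```python
from dataclasses import dataclass, field


@dataclass
class RopaEntry:
    """Hypothetical, simplified register entry for one training pipeline."""
    pipeline: str
    legal_basis: str = ""           # e.g. "GDPR Art. 9(2)(g)"
    necessity_assessment: str = ""  # why no less-invasive alternative existed
    security_controls: list[str] = field(default_factory=list)
    sign_off: str = ""              # DPO or legal reviewer
    retention_period: str = ""
    deletion_obligation: str = ""
    bias_correction_use: bool = False
    separate_entry: bool = False    # True if kept as its own register entry


def check_ropa_entry(entry: RopaEntry) -> list[str]:
    """Return the numbered test points the entry fails (presence checks only)."""
    findings = []
    if not entry.legal_basis.strip():
        findings.append("(1) no legal basis stated")
    if not entry.necessity_assessment.strip():
        findings.append("(2) no necessity assessment documented")
    if not entry.security_controls:
        findings.append("(3) no security controls listed")
    if not entry.sign_off.strip():
        findings.append("(4) no DPO or legal review sign-off")
    if not (entry.retention_period.strip() and entry.deletion_obligation.strip()):
        findings.append("(5) retention or deletion obligation missing")
    if entry.bias_correction_use and not (
        entry.separate_entry and entry.deletion_obligation.strip()
    ):
        findings.append("(6) bias correction use not documented separately")
    return findings


# Illustrative values only, loosely following the example entry above.
example = RopaEntry(
    pipeline="Health Data in Bias Correction Pipeline",
    legal_basis="GDPR Art. 9(2)(g)",
    necessity_assessment="Synthetic data evaluated and found insufficient",
    security_controls=["pseudonymisation", "encryption"],
    sign_off="DPO",
    retention_period="Until bias correction is complete",
    deletion_obligation="Delete on completion of bias correction",
    bias_correction_use=True,
    separate_entry=True,
)

for finding in check_ropa_entry(example):
    print(finding)
```

An empty result means every item is present in the record, not that the entry is legally sufficient.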
Technical access controls configuration demonstrating that special category training data is isolated and accessible only to authorised roles in the data pipeline.
Example: AWS S3 bucket policy and IAM role configuration for special-category-training-data bucket: access restricted to ml-training-role with MFA required, object-level encryption enabled (SSE-KMS), no public access, access logs enabled
Test: Review the access control configuration for storage containing special category training data. Verify: (1) access is restricted to named roles with a documented business need, (2) encryption at rest is applied, (3) no public access is permitted, (4) access logging is enabled, (5) configuration matches the security controls stated in the ROPA entry.
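The five access control checks above can be sketched as a review over a summarised storage configuration. This is an illustrative sketch only: the dictionary layout and key names are assumptions rather than the format of an AWS S3 bucket policy or IAM document, and a reviewer would first have to extract these values from the real configuration.

```python
def check_storage_controls(config: dict, ropa_controls: set[str]) -> list[str]:
    """Return the numbered test points that a summarised storage config fails."""
    findings = []

    # (1) Every allowed role must be named and carry a documented business need.
    roles = config.get("allowed_roles", {})
    if not roles or any(not need.strip() for need in roles.values()):
        findings.append("(1) access not restricted to roles with a documented need")

    # (2) Some form of encryption at rest must be recorded.
    if not config.get("encryption_at_rest"):
        findings.append("(2) encryption at rest not applied")

    # (3) Treat a missing value as public, so the check fails closed.
    if config.get("public_access", True):
        findings.append("(3) public access not ruled out")

    # (4) Access logging must be switched on.
    if not config.get("access_logging"):
        findings.append("(4) access logging not enabled")

    # (5) Controls promised in the ROPA entry must appear in the configuration.
    missing = ropa_controls - set(config.get("controls_applied", []))
    if missing:
        findings.append(f"(5) stated in ROPA but not configured: {sorted(missing)}")

    return findings


# Hypothetical summary of the example bucket described above.
bucket_summary = {
    "bucket": "special-category-training-data",
    "allowed_roles": {"ml-training-role": "trains the bias correction model"},
    "encryption_at_rest": "SSE-KMS",
    "public_access": False,
    "access_logging": True,
    "controls_applied": ["encryption", "MFA", "access logging"],
}

for finding in check_storage_controls(bucket_summary, {"encryption", "MFA"}):
    print(finding)
```

Defaulting `public_access` to true when the key is absent is a deliberate choice in this sketch: an incomplete summary should produce a finding rather than pass silently.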
Questions (2)
Does your organisation have documented controls preventing the use of special category personal data in AI training unless a specific legal basis exists?
Special category data (health, biometric, ethnic origin, political opinions, etc.) in training datasets creates significant GDPR and EU AI Act exposure. Processing must have a GDPR Art. 9 (or equivalent) legal basis and be strictly necessary.
If special category personal data is used in any AI training or evaluation pipeline, which of the following controls are in place?
All applicable controls should be in place. If special category data is not used in any pipeline, answer 'Does not apply' — absence of such data should be confirmed positively, not assumed.
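One way to support a positive confirmation of absence is to scan dataset schemas for column names that suggest special category data and to record the result. The sketch below is illustrative only. The hint list and function name are hypothetical, the list covers only the categories named in this control, and substring matching on column names will miss data held in free-text fields or under neutral names. A clean scan is therefore supporting evidence for human review, not proof of absence.

```python
# Hypothetical name fragments per category; extend to match local naming conventions.
SPECIAL_CATEGORY_HINTS = {
    "health": ("health", "diagnosis", "medical"),
    "biometric": ("biometric", "fingerprint"),
    "ethnic origin": ("ethnic",),
    "political opinions": ("political",),
}


def scan_schema(columns: list[str]) -> dict[str, list[str]]:
    """Map each suspicious column name to the categories its name hints at."""
    hits: dict[str, list[str]] = {}
    for column in columns:
        name = column.lower()
        for category, hints in SPECIAL_CATEGORY_HINTS.items():
            if any(hint in name for hint in hints):
                hits.setdefault(column, []).append(category)
    return hits


# Illustrative schema for a single training table.
schema = ["user_id", "diagnosis_code", "Ethnicity", "purchase_total"]
for column, categories in scan_schema(schema).items():
    print(f"{column}: review for {', '.join(categories)}")
```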