AIG-025 AI Fairness and Bias Controls
Description
AI systems subject to bias risk (systems that score, rank, recommend, or classify individuals) have documented fairness objectives with measurable definitions (e.g. demographic parity, equalised odds) appropriate to the system's purpose. Bias testing is conducted prior to deployment and periodically in production, with results disaggregated by relevant protected characteristics. Where measured bias exceeds the defined thresholds, remediation and re-evaluation are required before deployment continues. Bias testing methodology and results are retained.
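To make "measurable definitions" concrete, the following is a minimal sketch of how the two example metrics could be computed from binary predictions, disaggregated by a single protected characteristic. The function names are illustrative and not part of this control. The max-minus-min gap used here is one common convention and is an assumption; an organisation's documented fairness objectives may define the metrics differently.

```python
from collections import defaultdict


def _positive_rate(y_pred, groups, mask):
    """Per-group share of positive predictions among rows where mask is True."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group, keep in zip(y_pred, groups, mask):
        if keep:
            totals[group] += 1
            positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(y_pred, groups):
    """Largest gap between groups in the rate of positive predictions."""
    rates = _positive_rate(y_pred, groups, [True] * len(y_pred))
    return max(rates.values()) - min(rates.values())


def equalised_odds_gap(y_true, y_pred, groups):
    """Larger of the between-group gaps in true positive and false positive rate.

    Assumes at least one group has rows with each true label (0 and 1).
    """
    tpr = _positive_rate(y_pred, groups, [t == 1 for t in y_true])
    fpr = _positive_rate(y_pred, groups, [t == 0 for t in y_true])
    return max(
        max(tpr.values()) - min(tpr.values()),
        max(fpr.values()) - min(fpr.values()),
    )


# Toy data, invented for illustration only.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

dpd = demographic_parity_difference(y_pred, groups)
eog = equalised_odds_gap(y_true, y_pred, groups)
```

In this toy data both groups receive positive predictions at the same rate, so the demographic parity difference is zero, yet the error rates differ between groups and the equalised odds gap is 0.5. This is why the chosen metric needs to be documented against the system's purpose rather than assumed.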
Rationale
Bias is an AI-specific harm that cannot be detected from system logs or security tests alone; it requires dedicated measurement against defined fairness criteria.
Framework Mappings (5)
| Reference | Requirement | Coverage |
| --- | --- | --- |
| EU-AI-Art.10.2 | Data Governance — Data Preparation and Bias Management | full |
| EU-AI-Art.15.1 | Accuracy, Robustness and Cybersecurity — Performance Standards | partial |
| GDPR-Art.5.1a | Lawfulness, Fairness and Transparency of Processing | partial |
| GOVERN 3.1 | Diverse Team Decision-Making | partial |
| MEASURE 2.11 | AI Fairness and Bias Evaluation | full |
Evidence (1)
Bias evaluation report produced before deployment and periodically in production, disaggregated by relevant protected characteristics, with results compared to documented fairness thresholds.
Example: Bias Evaluation Report — Loan Scoring Model v4.1 (Weights & Biases artefact, 2026-Q1), showing demographic parity difference ≤ 0.05 for gender and ethnicity, equalised odds gap ≤ 0.03, comparison to thresholds defined in AI fairness objectives, result: PASS
Test: Request bias evaluation reports for a sample of AI systems subject to bias risk, covering pre-deployment and at least one in-production evaluation. Verify: (1) evaluation is disaggregated by relevant protected characteristics for the system's context, (2) fairness metric definitions match those documented in the AI fairness objectives, (3) results are compared to documented pass/fail thresholds, (4) threshold failures have a documented remediation action and re-evaluation result, (5) production evaluation frequency matches the defined schedule.
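Checks (4) and (5) in the test above lend themselves to a simple automated pre-screen before manual review. The sketch below assumes each evaluation report is available as a dictionary with hypothetical keys `id`, `result`, `remediation` and `re_evaluation_result`, and that the defined schedule can be expressed as a maximum number of days between production evaluations. Neither assumption comes from this control.

```python
from datetime import date


def schedule_gaps(evaluation_dates, max_interval_days):
    """Return consecutive evaluation date pairs spaced further apart than allowed."""
    ordered = sorted(evaluation_dates)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if (later - earlier).days > max_interval_days
    ]


def unremediated_failures(reports):
    """Return ids of failed evaluations lacking remediation or a re-evaluation result."""
    return [
        report["id"]
        for report in reports
        if report["result"] == "FAIL"
        and not (report.get("remediation") and report.get("re_evaluation_result"))
    ]


# Illustrative records, not real audit data.
dates = [date(2026, 1, 10), date(2026, 4, 5), date(2026, 9, 1)]
reports = [
    {"id": "eval-1", "result": "PASS"},
    {"id": "eval-2", "result": "FAIL", "remediation": "reweighting", "re_evaluation_result": "PASS"},
    {"id": "eval-3", "result": "FAIL", "remediation": "reweighting"},
]

late = schedule_gaps(dates, max_interval_days=92)
open_items = unremediated_failures(reports)
```

With a 92-day interval, the gap between the second and third evaluation is flagged, and `eval-3` is flagged because it records a remediation but no re-evaluation result. Checks (1) to (3) still require a reviewer to compare the report content with the documented fairness objectives.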
Questions (2)
Do AI systems that score, rank, recommend, or classify individuals have documented fairness objectives with measurable definitions?
Bias cannot be detected without predefined, measurable fairness criteria. Objectives should specify the fairness metric (e.g. demographic parity, equalised odds) and the pass/fail threshold, relative to the system's purpose and affected population.
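One way to make such objectives machine-checkable is to record each as a metric, a protected characteristic and a maximum permitted gap, then compare measured values against them. The sketch below is an assumption about how this could be structured: the class, field names and threshold values are illustrative and are not prescribed by this control.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FairnessObjective:
    metric: str          # e.g. "demographic_parity_difference"
    characteristic: str  # protected characteristic the metric is disaggregated by
    max_gap: float       # pass/fail threshold


def evaluate(objectives, measured):
    """Compare measured gaps to each objective; a missing measurement is a FAIL."""
    findings = []
    for objective in objectives:
        value = measured.get((objective.metric, objective.characteristic))
        passed = value is not None and value <= objective.max_gap
        findings.append((objective, value, "PASS" if passed else "FAIL"))
    return findings


def overall_result(findings):
    """PASS only if there is at least one finding and every finding passed."""
    if findings and all(status == "PASS" for _, _, status in findings):
        return "PASS"
    return "FAIL"


# Illustrative thresholds and measurements.
objectives = [
    FairnessObjective("demographic_parity_difference", "gender", 0.05),
    FairnessObjective("equalised_odds_gap", "gender", 0.03),
]
measured = {
    ("demographic_parity_difference", "gender"): 0.04,
    ("equalised_odds_gap", "gender"): 0.02,
}

result = overall_result(evaluate(objectives, measured))
```

Treating a missing measurement as a failure is a deliberate choice in this sketch: an objective that was never measured should not pass by default.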
How is bias testing conducted for AI systems subject to bias risk in your organisation?
All five practices are expected. Testing only at deployment, without production monitoring, misses bias that accumulates in production through feedback loops and data drift.