AIG-022 Human Oversight of AI Outputs
Description
AI systems that produce outputs used in decisions affecting individuals have documented human oversight mechanisms. Oversight measures are proportionate to risk: Tier 3 systems require human review of AI-generated outputs before action is taken; Tier 2 systems require reviewable audit trails and human escalation paths. Oversight persons have defined competencies, appropriate training, and sufficient time to exercise meaningful review. Automation bias risks are explicitly addressed in operator guidance.
Rationale
Human oversight is the last line of defence against harmful AI outputs; it must be substantively designed, not nominal.
Framework Mappings (6)
| EU-AI-Art.14.1 | Human Oversight — System Design for Oversight | full |
| EU-AI-Art.14.2 | Human Oversight — Capabilities Assigned to Oversight Persons | full |
| EU-AI-Art.26.2 | Deployer Obligations — Human Oversight Assignment | full |
| GOVERN 3.2 | Human-AI Configuration Roles | full |
| MAP 3.4 | Operator Proficiency Processes | partial |
| MAP 3.5 | Human Oversight Process Definition | full |
Evidence (2)
Human oversight design document or operational procedures for each Tier 2+ AI system, specifying oversight mechanism, reviewer competency requirements, time allocation, and automation bias mitigation guidance.
Example: Human Oversight Procedure — AI Credit Decisioning System (Confluence), specifying a requirement for human review of all AI-flagged decline decisions within 4 hours, reviewer qualification requirements (credit underwriting certification), an automation bias awareness training requirement, and an escalation path for reviewer disagreement.
Test: Request human oversight documentation for each Tier 2+ production AI system. Verify: (1) the oversight mechanism is described (human review before action, audit trail with escalation, etc.), (2) oversight is proportionate to tier (Tier 3 requires pre-action review), (3) reviewer competency requirements are defined, (4) automation bias risk is explicitly addressed in operator guidance, (5) oversight persons have sufficient time allocated to conduct meaningful review rather than nominal sign-off.
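The documentation test above could be partly automated where oversight documents are also captured as structured records. The following is a minimal sketch under that assumption; the field names (`tier`, `mechanism`, `reviewer_competencies`, `automation_bias_guidance`, `time_allocation`) and the mechanism value `pre_action_review` are hypothetical and not part of this control. Whether the time allocation is actually sufficient remains a judgment for the assessor; the sketch only checks that it is documented.

```python
from typing import Dict, List

# Hypothetical field names for a structured oversight record.
REQUIRED_FIELDS = (
    "mechanism",                 # check (1): oversight mechanism described
    "reviewer_competencies",     # check (3): competency requirements defined
    "automation_bias_guidance",  # check (4): automation bias addressed
    "time_allocation",           # check (5): time allocation documented
)


def check_oversight_doc(doc: Dict[str, object]) -> List[str]:
    """Return a list of findings; an empty list means no gaps were detected."""
    findings = []
    for field in REQUIRED_FIELDS:
        if not doc.get(field):
            findings.append(f"missing or empty: {field}")
    # Check (2): Tier 3 requires human review before action is taken.
    if doc.get("tier") == 3 and doc.get("mechanism") != "pre_action_review":
        findings.append("Tier 3 requires human review before action")
    return findings
```

A record for a Tier 3 system that documents only an audit trail with escalation would yield one finding, flagging that pre-action review is required.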
Audit trail records showing human review and override events for AI-generated outputs, demonstrating that oversight is operationally active and not merely nominal.
Example: AI-Credit-System override log (Splunk, last 90 days): 1,247 AI decisions reviewed; 89 overrides recorded, each with reviewer ID, timestamp, and override reason category; override rate 7.1% (89 of 1,247), consistent with the documented expected range of 5–10%.
Test: Request override and review event logs for a 90-day sample. Verify: (1) human review events are recorded with reviewer identifier, timestamp, and decision, (2) override events include a reason category, (3) override rate is within the documented expected range (an override rate of 0% may indicate rubber-stamping), (4) log confirms oversight is occurring at the required frequency and volume.
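Checks (1) to (3) of the log test above lend themselves to a simple script once the review events have been exported. The sketch below assumes a hypothetical event shape (`ReviewEvent` with `reviewer_id`, `timestamp`, `decision`, `override_reason`) and hypothetical decision values `"accept"` and `"override"`; a real export from the logging platform will differ and would need mapping to this shape. Check (4), required frequency and volume, depends on system-specific thresholds and is not covered here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class ReviewEvent:
    # Hypothetical record shape; adapt to the actual log export.
    reviewer_id: str
    timestamp: Optional[datetime]
    decision: str                       # "accept" or "override" (assumed values)
    override_reason: Optional[str] = None


def check_override_log(
    events: List[ReviewEvent], expected_min: float, expected_max: float
) -> Tuple[float, List[str]]:
    """Return (override_rate, findings) for a sample of review events."""
    findings = []
    for i, event in enumerate(events):
        # Check (1): reviewer identifier and timestamp are recorded.
        if not event.reviewer_id or event.timestamp is None:
            findings.append(f"event {i}: missing reviewer identifier or timestamp")
        # Check (2): every override carries a reason category.
        if event.decision == "override" and not event.override_reason:
            findings.append(f"event {i}: override without reason category")

    reviewed = len(events)
    overrides = sum(1 for e in events if e.decision == "override")
    rate = overrides / reviewed if reviewed else 0.0

    # Check (3): override rate against the documented expected range.
    if reviewed == 0:
        findings.append("no review events recorded")
    elif overrides == 0:
        findings.append("override rate is 0%: possible rubber-stamping")
    elif not expected_min <= rate <= expected_max:
        findings.append(f"override rate {rate:.1%} outside expected range")
    return rate, findings
```

With the figures from the example evidence (89 overrides in 1,247 reviewed decisions, expected range 0.05 to 0.10), the computed rate is about 7.1% and no findings are raised.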
Questions (2)
Do AI systems that produce outputs used in decisions affecting individuals have documented human oversight mechanisms proportionate to their risk level?
Human oversight is the last line of defence against harmful AI outputs. It must be substantively designed — not nominal. Oversight persons must have defined competencies, training, and sufficient time to conduct meaningful review.
Which of the following are true of your AI human oversight programme?
All six characteristics indicate substantive oversight. An override rate of 0% over extended periods is a strong signal of nominal (rubber-stamp) oversight rather than genuine review.