AIG-027 AI Output Validation and Confidence Controls

Tier 2+AI

Description

AI systems that produce outputs acted upon by users or automated processes have defined acceptable output ranges or confidence thresholds. Outputs below the minimum confidence threshold trigger a defined fallback: human review queue, abstention, or escalation — not silent degradation. Output validation logic is documented and version-controlled. For classification tasks, threshold calibration is tested and its impact on precision/recall documented. Output ranges and thresholds are reviewed after any model update.
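The confidence-gating requirement above can be sketched as a small function that maps a score to an action and never passes a below-threshold output through unchanged. This is a minimal sketch, not a prescribed implementation. Several details are illustrative assumptions:

- the Action names;
- the two-tier split (human review between the two thresholds, abstention below the lower one);
- the example values 0.82 and 0.60, borrowed from the fraud-classifier-prod evidence example.

```python
from enum import Enum


class Action(Enum):
    # Illustrative outcome names; not taken from any specific serving framework.
    ACT = "act"                    # confidence meets the threshold: output may be acted on
    HUMAN_REVIEW = "human_review"  # below threshold: route to a human review queue
    ABSTAIN = "abstain"            # far below threshold: withhold the output


def gate(confidence: float, threshold: float, abstain_below: float) -> Action:
    """Map one model confidence score to an action; low confidence always hits a fallback."""
    if not 0.0 <= abstain_below <= threshold <= 1.0:
        raise ValueError("expected 0 <= abstain_below <= threshold <= 1")
    if confidence >= threshold:
        return Action.ACT
    if confidence < abstain_below:
        return Action.ABSTAIN
    return Action.HUMAN_REVIEW


for score in (0.91, 0.70, 0.40):
    print(score, gate(score, threshold=0.82, abstain_below=0.60).value)
# 0.91 act
# 0.7 human_review
# 0.4 abstain
```

A score exactly at the threshold is treated as passing, because the control only requires a fallback for outputs below it.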

Rationale

AI systems that act on low-confidence outputs without disclosure or fallback create uncontrolled risk; confidence-gating is a structural quality control unique to probabilistic systems.

Framework Mappings (3)

EU AI Act 2024

EU-AI-Art.13.3	Transparency: Mandatory Content of Instructions for Use	informative

NIST AI RMF 1.0

MANAGE 2.4	AI System Deactivation and Override Mechanisms	informative
MEASURE 2.3	AI System Performance Measurement	informative

Evidence (2)

configuration (automated)

Output validation configuration for AI systems, documenting defined confidence thresholds, fallback behaviour triggered below threshold, and version-controlled validation logic.

Example: Model serving configuration — fraud-classifier-prod (exported from BentoML or Seldon, YAML): confidence_threshold: 0.82, low_confidence_action: route_to_human_review_queue, abstain_below: 0.60, threshold_version: v3 (git commit abc123), last_reviewed: 2026-01-20
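An exported configuration like the one above can be checked mechanically for some of the points the test below asks for. This is a hedged sketch, with these assumptions:

- the YAML export has already been parsed into a Python dict (a literal is used here to stay self-contained);
- only "route_to_human_review_queue" comes from the example; the other allowed fallback names are hypothetical;
- per-use-case thresholds and review after the last model update need data beyond a single config, so they are not covered.

```python
from datetime import date

# Hypothetical set of acceptable fallback names; only "route_to_human_review_queue"
# appears in the example configuration.
ALLOWED_FALLBACKS = {"route_to_human_review_queue", "abstain", "escalate"}


def is_number(value) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_validation_config(cfg: dict) -> list:
    """Return human-readable problems found in one system's output-validation config."""
    problems = []
    threshold = cfg.get("confidence_threshold")
    abstain = cfg.get("abstain_below")
    if not is_number(threshold) or not 0.0 < threshold <= 1.0:
        problems.append("confidence_threshold is missing or outside (0, 1]")
    elif abstain is not None and (not is_number(abstain) or not 0.0 <= abstain < threshold):
        problems.append("abstain_below must be a number >= 0 and below confidence_threshold")
    # A fallback outside the allowed set (e.g. silent pass-through) is a finding.
    if cfg.get("low_confidence_action") not in ALLOWED_FALLBACKS:
        problems.append("low_confidence_action is not a defined fallback")
    # Version-control and review evidence.
    if not cfg.get("threshold_version"):
        problems.append("threshold_version is missing")
    try:
        date.fromisoformat(str(cfg.get("last_reviewed")))
    except ValueError:
        problems.append("last_reviewed is missing or not an ISO date")
    return problems


fraud_classifier_prod = {
    "confidence_threshold": 0.82,
    "low_confidence_action": "route_to_human_review_queue",
    "abstain_below": 0.60,
    "threshold_version": "v3",
    "last_reviewed": "2026-01-20",
}
print(check_validation_config(fraud_classifier_prod))  # prints []
```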

Test: Request the output validation configuration for a sample of AI systems acting on outputs. Verify: (1) confidence thresholds are defined per use case (not a single global default), (2) fallback behaviour is configured (human review queue, abstention, or escalation — not silent pass-through), (3) configuration is version-controlled with a dated review record, (4) for classification tasks, threshold calibration results are documented showing precision/recall impact, (5) thresholds were reviewed after the last model update.
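For item (4), one common way to document threshold calibration is a table of precision and recall at each candidate threshold, computed on labelled hold-out data. The sketch below makes these assumptions:

- the task is binary classification;
- the confidence score is the positive-class probability;
- an output counts as a positive prediction when its score is at or above the threshold.

The scores and labels are toy values for illustration, not real results. A library routine such as scikit-learn's precision_recall_curve computes the same quantities across all thresholds at once.

```python
def precision_recall_at(scores, labels, threshold):
    """Positive-class precision and recall when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    # Report 0.0 instead of dividing by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def calibration_table(scores, labels, thresholds):
    """One (threshold, precision, recall) row per candidate threshold."""
    return [(t, *precision_recall_at(scores, labels, t)) for t in thresholds]


# Toy labelled hold-out data (1 = positive class), for illustration only.
scores = [0.95, 0.90, 0.85, 0.70, 0.65, 0.40, 0.30]
labels = [1, 1, 0, 1, 0, 0, 1]
for t, p, r in calibration_table(scores, labels, [0.60, 0.82]):
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
# threshold=0.60 precision=0.60 recall=0.75
# threshold=0.82 precision=0.67 recall=0.50
```

Raising the threshold here trades recall for precision. That trade-off is what the calibration record should make visible when a threshold is chosen or revised after a model update.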

log (automated)

Low-confidence output routing logs demonstrating that outputs below the defined confidence threshold are actually being routed to the defined fallback, rather than passed through silently.

Example: Datadog log query result for fraud-classifier-prod (last 30 days): 2,341 events with confidence < 0.82, action=human_review_queue; 0 events with confidence < 0.82 and action=auto_approve — confirms fallback routing is functioning

Test: Query AI event logs for low-confidence output routing events over a 30-day period. Verify: (1) events with confidence below the configured threshold are present in logs, (2) all such events show the correct fallback action (human review / abstention), (3) no events show auto-approval or silent pass-through below threshold, (4) the volume of low-confidence events is reviewed periodically to inform threshold calibration.
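Once events are exported from the logging platform, checks (1) to (3) reduce to a simple pass over them. The sketch below assumes each event has already been parsed into a dict with "confidence" and "action" fields. Those field names mirror the example query result above but are assumptions about the log schema. "human_review_queue" matches the example; "abstain" is a hypothetical label for the abstention path. The sketch deliberately does not reproduce any platform-specific query syntax.

```python
from collections import Counter

# "human_review_queue" matches the example log result; "abstain" is hypothetical.
FALLBACK_ACTIONS = {"human_review_queue", "abstain"}


def audit_routing(events, threshold):
    """Summarise how below-threshold events were handled and list any that bypassed the fallback."""
    below = [e for e in events if e["confidence"] < threshold]
    violations = [e for e in below if e["action"] not in FALLBACK_ACTIONS]
    return {
        "below_threshold": len(below),  # check (1): expected to be non-zero over the period
        "actions": Counter(e["action"] for e in below),
        "violations": violations,       # checks (2) and (3): expected to be empty
    }


# Hypothetical exported events, e.g. parsed from a JSON log export.
events = [
    {"confidence": 0.70, "action": "human_review_queue"},
    {"confidence": 0.50, "action": "abstain"},
    {"confidence": 0.91, "action": "auto_approve"},
    {"confidence": 0.75, "action": "auto_approve"},
]
report = audit_routing(events, threshold=0.82)
print(report["below_threshold"], len(report["violations"]))  # prints 3 1
```

The "actions" counts over successive periods can also feed check (4), the periodic review of low-confidence volume.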

Questions (2)

boolean

Do AI systems that produce outputs acted upon by users or automated processes have defined confidence thresholds, with outputs below threshold triggering a documented fallback?

Net-new control: confidence-gating is a structural quality control unique to probabilistic AI systems, not addressed by existing frameworks at an operational level. Outputs acted upon without confidence validation create uncontrolled downstream risk.

select

What action is taken when an AI output falls below the defined confidence threshold?

No defined threshold or fallback exists
Output is passed through with no change (silent degradation)
A warning flag is added to the output but no action is required
Output is routed to a human review queue
The system abstains and requests additional input
Output is escalated with a mandatory review before action is taken

Routing to human review, abstention, or mandatory escalation are all acceptable fallbacks. Silent pass-through of low-confidence outputs is not acceptable for systems where outputs drive consequential decisions.