AIG-019 AI Model Performance and Drift Detection

Tier 2+AI

Description

Deployed AI models are evaluated on a scheduled basis for performance degradation and distribution shift (data drift and concept drift). The schedule is set according to how quickly the underlying domain changes: at least quarterly for stable domains and at least monthly for high-velocity domains. Evaluation uses held-out test data or shadow deployments. When performance falls below defined thresholds or drift is detected, a documented escalation path is triggered, with outcomes ranging from investigation to retraining or decommissioning.
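The escalation path can be made explicit as a small decision function, which also makes the thresholds reviewable evidence. The sketch below is illustrative only: the threshold values, the rule ordering (most severe first) and the names `Escalation`, `Thresholds` and `decide` are hypothetical assumptions, not part of this control. Only the four outcomes (no action, investigation, retrain, decommission) come from the control text.

```python
from dataclasses import dataclass
from enum import Enum


class Escalation(Enum):
    """Escalation outcomes named in this control, from least to most severe."""
    NO_ACTION = "no action"
    INVESTIGATE = "investigation"
    RETRAIN = "retrain"
    DECOMMISSION = "decommission"


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical threshold values; real ones come from each model's documented baseline."""
    max_psi: float = 0.2             # a feature with PSI above this counts as drifted
    max_drifted_share: float = 0.25  # drifted-feature share that escalates to retraining
    max_metric_drop: float = 0.05    # tolerated absolute drop in the primary metric vs baseline
    metric_floor: float = 0.60       # below this the model becomes a decommission candidate


def decide(feature_psi: dict[str, float], baseline_metric: float,
           current_metric: float, t: Thresholds = Thresholds()) -> Escalation:
    """Map one evaluation run to an escalation outcome, checking the most severe rule first."""
    drifted = [name for name, value in feature_psi.items() if value > t.max_psi]
    drifted_share = len(drifted) / len(feature_psi) if feature_psi else 0.0
    if current_metric < t.metric_floor:
        return Escalation.DECOMMISSION
    if baseline_metric - current_metric > t.max_metric_drop or drifted_share > t.max_drifted_share:
        return Escalation.RETRAIN
    if drifted:
        return Escalation.INVESTIGATE
    return Escalation.NO_ACTION
```

For example, one drifted feature out of five combined with a 0.02 drop in accuracy (0.88 to 0.86) returns `Escalation.INVESTIGATE`, because neither the drifted-feature share nor the metric drop crosses its retraining threshold. In practice a decommission outcome would normally go through human review rather than fire automatically.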

Rationale

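Model drift is an AI-specific failure mode with no equivalent in conventional software: a model's code can remain unchanged while its accuracy degrades because the data it sees in production has shifted. Without scheduled evaluation, degraded models continue to operate undetected.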

Framework Mappings (5)

EU AI Act 2024

EU-AI-Art.15.2	Accuracy, Robustness and Cybersecurity: Resilience and Fail-Safe Design	partial

NIST AI RMF 1.0

MANAGE 2.2	Deployed AI System Value Maintenance	full
MANAGE 3.2	Pre-Trained Model Monitoring	full
MEASURE 1.2	AI Metrics and Control Effectiveness Assessment	full
MEASURE 4.3	Performance Improvement and Decline Tracking	full

Evidence (2)

report (manual)

Periodic model performance and drift evaluation report demonstrating that deployed models were assessed for performance degradation and distribution shift on the defined schedule, with comparison against baseline metrics.

Example: Quarterly Drift Report — Customer Churn Predictor (Weights & Biases artefact, Q1 2026), showing PSI score for input features, model accuracy vs baseline, concept drift F1 delta, and escalation decision: 'no action required — within threshold'

Test: Request drift evaluation reports for a sample of production models covering the last two evaluation periods. Verify: (1) reports are dated at the scheduled frequency, (2) both data drift and concept/performance drift are evaluated, (3) results are compared to documented thresholds, (4) escalation decision is recorded (no action / investigation / retrain / decommission), (5) where thresholds were breached, a documented escalation action was taken.
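An auditor tabulating the sampled reports could apply the test points mechanically. The sketch below assumes a hypothetical `DriftReport` record filled in by the reviewer (the `threshold_breached` flag captures point (3), the comparison against documented thresholds) and a hypothetical `max_interval_days` tolerance, for example about 100 days for a quarterly schedule to allow some slack. It is an illustration of the checks, not a prescribed audit tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DriftReport:
    """Hypothetical tabulation of one drift evaluation report collected as evidence."""
    model: str
    report_date: date
    data_drift_evaluated: bool
    concept_drift_evaluated: bool
    threshold_breached: bool
    escalation_decision: Optional[str]  # "no action", "investigation", "retrain", "decommission"


def audit(reports: list[DriftReport], max_interval_days: int) -> list[str]:
    """Return findings against test points (1), (2), (4) and (5); an empty list means none."""
    findings: list[str] = []
    ordered = sorted(reports, key=lambda r: (r.model, r.report_date))
    # (1) consecutive reports for the same model must not be further apart than the schedule allows
    for prev, cur in zip(ordered, ordered[1:]):
        if prev.model != cur.model:
            continue
        gap = (cur.report_date - prev.report_date).days
        if gap > max_interval_days:
            findings.append(f"{cur.model}: {gap}-day gap before {cur.report_date}")
    for r in ordered:
        label = f"{r.model} {r.report_date}"
        # (2) both data drift and concept/performance drift must be covered
        if not (r.data_drift_evaluated and r.concept_drift_evaluated):
            findings.append(f"{label}: data and concept drift not both evaluated")
        # (4) every report records an escalation decision
        if r.escalation_decision is None:
            findings.append(f"{label}: no escalation decision recorded")
        # (5) a threshold breach must lead to an action other than "no action"
        elif r.threshold_breached and r.escalation_decision == "no action":
            findings.append(f"{label}: threshold breach with no escalation action")
    return findings
```

Grouping by model before checking gaps matters: reports for different models sampled in the same audit should not be compared with each other.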

tool_output (automated)

Automated drift detection tool output from a model monitoring platform (e.g. Evidently AI, Arize, WhyLabs) showing scheduled drift metric computation for production models.

Example: Evidently AI drift report JSON export for fraud-model-prod (weekly run 2026-04-14): feature drift detected on 2/18 features (PSI > 0.2 threshold), dataset drift test: PASS, target drift: PASS, alert fired to the ml-monitoring Slack channel

Test: Request automated drift detection tool output for a production model. Verify: (1) drift metrics are computed automatically on the defined schedule (check run timestamps), (2) alert thresholds are configured and alert firing is evidenced, (3) the tool output is linked to the escalation process (Slack/PagerDuty alert or ticket creation), (4) the tool is monitoring the live production model (not a shadow environment).
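The per-feature PSI check in the example can be sketched with the standard library alone. PSI sums (current share minus reference share) multiplied by the log of their ratio across bins. Two choices here are assumptions rather than anything this control or a specific tool prescribes: bin edges are taken from the reference sample's deciles, and empty bins are floored at a small `eps`. Monitoring platforms may bin and smooth differently, so values will not necessarily match their reports. The function names `psi` and `drifted_features` are hypothetical.

```python
import bisect
import math
import statistics


def psi(reference: list[float], current: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `current` against `reference`.

    PSI = sum over bins of (cur_share - ref_share) * ln(cur_share / ref_share).
    Bin edges are the reference quantile cut points, so each bin holds roughly
    1/bins of the reference sample.
    """
    edges = statistics.quantiles(reference, n=bins)  # bins - 1 interior cut points

    def shares(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # Floor each share at eps so empty bins do not cause log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def drifted_features(reference: dict[str, list[float]], current: dict[str, list[float]],
                     threshold: float = 0.2) -> list[str]:
    """Names of features whose PSI exceeds the alert threshold (0.2 in the example above)."""
    return sorted(name for name in reference if psi(reference[name], current[name]) > threshold)
```

Identical reference and current samples give a PSI of exactly 0, while a sample shifted by half its range lands far above the 0.2 threshold. The share of drifted features returned here is the kind of count that the "2/18 features" line in the example reports.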

Questions (2)

boolean

Are deployed AI models evaluated on a scheduled basis for performance degradation and distribution shift (data drift, concept drift)?

Model drift is an AI-specific failure mode with no equivalent in conventional software. Without scheduled evaluation, degraded models operate undetected. Evaluation should use held-out test data or shadow deployments and trigger a documented escalation path when thresholds are breached.

select

How frequently are your production AI models evaluated for drift or performance degradation?

Ad hoc (only when an issue is reported)
Annually
Quarterly
Monthly
Continuously or weekly via automated tooling

Frequency should match the velocity of the underlying domain. High-velocity domains (e.g. fraud, content moderation) require monthly or more frequent evaluation. Annual evaluation is insufficient for any system where the data environment changes.
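A self-assessment tool could score the frequency answer against the minimums stated in this control's description: at least quarterly for stable domains and at least monthly for high-velocity domains. In the sketch below, the ordering of the answer options and the labels `"stable"` and `"high-velocity"` are hypothetical encodings chosen for illustration.

```python
from enum import IntEnum


class Frequency(IntEnum):
    """Answer options for the frequency question, ordered from least to most frequent."""
    AD_HOC = 0
    ANNUALLY = 1
    QUARTERLY = 2
    MONTHLY = 3
    CONTINUOUS_OR_WEEKLY = 4


# Minimum evaluation frequency per domain velocity, taken from the control description.
MINIMUM = {"stable": Frequency.QUARTERLY, "high-velocity": Frequency.MONTHLY}


def meets_minimum(answer: Frequency, domain_velocity: str) -> bool:
    """True if the stated frequency is at least the minimum for the domain's velocity."""
    return answer >= MINIMUM[domain_velocity]
```

Under this encoding, ad hoc and annual answers never meet the minimum, which is consistent with the guidance that annual evaluation is insufficient wherever the data environment changes.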
