AIG-018 AI System Operational Monitoring

Tier 2+ | AI

Description

Each production AI system has a defined monitoring plan that specifies: the metrics to be tracked (e.g. error rates, latency, output confidence distributions, null/refusal rates), alert thresholds, monitoring cadence, and a named owner responsible for reviewing alerts. Monitoring is active from the moment a system enters production. Monitoring results are reviewed at a defined frequency (at minimum monthly for Tier 2+, weekly for Tier 3). Alerts trigger a documented triage process.
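
One way to make such a plan checkable is to hold it as structured data. The following is a minimal sketch, not a prescribed format: the class and field names (`MonitoringPlan`, `MetricSpec`, `review_interval_days`) are hypothetical, and reading "monthly" as at most 31 days and "weekly" as at most 7 days is an assumption.

```python
from dataclasses import dataclass, field

# Assumption: "monthly" is read as <= 31 days and "weekly" as <= 7 days.
MAX_REVIEW_INTERVAL_DAYS = {2: 31, 3: 7}


@dataclass
class MetricSpec:
    name: str               # e.g. "null_refusal_rate"
    alert_threshold: float  # value at which an alert fires
    direction: str          # "above" or "below"


@dataclass
class MonitoringPlan:
    system: str
    tier: int
    owner: str
    review_interval_days: int
    metrics: list[MetricSpec] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return the gaps found against the control's minimum requirements."""
        found = []
        if not self.owner.strip():
            found.append("no named owner")
        if not self.metrics:
            found.append("no metrics defined")
        for m in self.metrics:
            if m.direction not in ("above", "below"):
                found.append(f"metric {m.name}: direction must be 'above' or 'below'")
        limit = MAX_REVIEW_INTERVAL_DAYS.get(self.tier)
        if limit is not None and self.review_interval_days > limit:
            found.append(
                f"review interval {self.review_interval_days}d exceeds {limit}d for Tier {self.tier}"
            )
        return found
```

An empty list from `problems()` means the plan meets the structural minimum; it says nothing about whether the chosen thresholds are sensible for the system.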

Rationale

AI system behaviour degrades in production in ways not visible from infrastructure metrics alone; operational monitoring must be AI-specific, not inherited from generic APM tooling.

Framework Mappings (5)

EU AI Act 2024

EU-AI-Art.26.4	Deployer Obligations: Operational Monitoring and Incident Notification	full

ISO/IEC 42001:2023

A.6.2.6	AI system operation and monitoring	full

NIST AI RMF 1.0

MANAGE 4.1	Post-Deployment AI System Monitoring	full
MEASURE 2.4	AI System Production Monitoring	full
MEASURE 3.1	AI Risk Identification and Tracking	partial

Evidence (2)

configuration | automated

Monitoring plan or monitoring configuration for each production AI system, specifying tracked metrics, alert thresholds, monitoring cadence, and named monitoring owner.

Example: Datadog monitor configuration export for ai-fraud-detection service: monitors for inference error rate (alert >2%), p95 latency (alert >800ms), null/refusal rate (alert >5%), output confidence distribution (alert if mean <0.7), owner tag: ml-ops-team, cadence: real-time streaming with daily digest review
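
The thresholds in this example can be expressed independently of any monitoring vendor. The sketch below is illustrative only: the metric key names and the `breached` function are hypothetical, and it does not use or represent the Datadog API.

```python
# Thresholds copied from the example above; the metric key names are hypothetical.
THRESHOLDS = {
    "inference_error_rate": ("above", 0.02),  # alert >2%
    "p95_latency_ms": ("above", 800.0),       # alert >800ms
    "null_refusal_rate": ("above", 0.05),     # alert >5%
    "mean_confidence": ("below", 0.7),        # alert if mean <0.7
}


def breached(observed: dict[str, float]) -> list[str]:
    """Return the names of metrics whose observed value crosses its threshold."""
    alerts = []
    for name, (direction, limit) in THRESHOLDS.items():
        if name not in observed:
            # Skipped in this sketch; a real monitor would likely treat
            # missing data as an alert condition in its own right.
            continue
        value = observed[name]
        if (direction == "above" and value > limit) or (
            direction == "below" and value < limit
        ):
            alerts.append(name)
    return alerts
```

Comparisons are strict, so a value sitting exactly on a threshold does not alert, which matches the ">" and "<" wording of the example.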

Test: Request monitoring configuration or plan for a sample of production AI systems. Verify: (1) monitored metrics include AI-specific measures (confidence distribution, refusal/null rate, output category distribution) in addition to infrastructure metrics, (2) alert thresholds are defined for each metric, (3) a named owner is assigned, (4) a triage process for alerts is documented and accessible, (5) monitoring was active from system go-live (check monitor creation date vs deployment date).
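
The five verification points lend themselves to a scripted check over an exported plan. This is a sketch under stated assumptions: the plan is taken to be a dictionary with hypothetical keys (`metrics`, `owner`, `triage_doc`, `monitor_created_on`), the metric identifiers are invented for illustration, and point (1) is read as "at least one AI-specific measure is present".

```python
from datetime import date

# Assumption: these identifiers stand for the AI-specific measures named in the test.
AI_SPECIFIC = {
    "confidence_distribution",
    "null_refusal_rate",
    "output_category_distribution",
}


def audit_plan(plan: dict, deployed_on: date) -> dict[str, bool]:
    """Evaluate the five verification points for one system's monitoring plan."""
    metrics = plan.get("metrics", {})  # metric name -> alert threshold (or None)
    created = plan.get("monitor_created_on")
    return {
        "ai_specific_metrics": bool(AI_SPECIFIC & set(metrics)),
        "thresholds_defined": bool(metrics)
        and all(t is not None for t in metrics.values()),
        "named_owner": bool(plan.get("owner")),
        "triage_documented": bool(plan.get("triage_doc")),
        # Monitoring counts as active from go-live only if the monitor
        # existed on or before the deployment date.
        "active_from_go_live": created is not None and created <= deployed_on,
    }
```

A scripted result of this kind supports the auditor's judgement rather than replacing it; whether the triage document is actually accessible to responders still needs a manual check.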

log | automated

AI system monitoring review records (alert history and response logs) demonstrating that alerts are reviewed at the defined frequency and trigger a documented triage response.

Example: Datadog incident log for ai-recommendation-engine (last 90 days): 3 alerts triggered, each with a linked incident record in PagerDuty showing triage start time, investigation notes, and resolution action

Test: Request monitoring review records for a 90-day sample period. Verify: (1) alerts were reviewed within the SLA defined in the monitoring plan, (2) each alert has a corresponding triage record, (3) review cadence matches the defined frequency (monthly for Tier 2+, weekly for Tier 3), (4) no alerts were silently closed without investigation records.
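
These checks can be approximated in code once alert and review records are exported. The sketch below assumes a hypothetical `Alert` record shape and function names; it is not tied to any incident tool. Note that `max_review_gap_days` only measures gaps between consecutive reviews, so the interval between the start of the sample period and the first review must be checked separately.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Optional


@dataclass
class Alert:
    alert_id: str
    fired_at: datetime
    triage_started_at: Optional[datetime]  # None if no triage record exists
    closed: bool
    investigation_notes: str = ""


def review_findings(alerts: list[Alert], sla: timedelta) -> dict[str, list[str]]:
    """Group alert IDs by the verification point they fail."""
    findings: dict[str, list[str]] = {
        "late_review": [],
        "no_triage_record": [],
        "closed_without_notes": [],
    }
    for a in alerts:
        if a.triage_started_at is None:
            findings["no_triage_record"].append(a.alert_id)
        elif a.triage_started_at - a.fired_at > sla:
            findings["late_review"].append(a.alert_id)
        if a.closed and not a.investigation_notes.strip():
            findings["closed_without_notes"].append(a.alert_id)
    return findings


def max_review_gap_days(review_dates: list[date]) -> int:
    """Largest gap in days between consecutive reviews (0 if fewer than two)."""
    ordered = sorted(review_dates)
    return max(((b - a).days for a, b in zip(ordered, ordered[1:])), default=0)
```

For a Tier 3 system, a `max_review_gap_days` result above 7 would indicate that the weekly cadence was missed at least once in the sample.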

Questions (2)

boolean

Does each production AI system have a defined monitoring plan specifying metrics, alert thresholds, review cadence, and a named monitoring owner?

AI system behaviour degrades in ways not visible from infrastructure metrics alone. Monitoring must include AI-specific measures — confidence score distributions, null or refusal rates, output category distributions — in addition to standard latency and error rate metrics.

multi

Which AI-specific metrics are included in your production monitoring for AI systems?

Output confidence score distribution
Null rate or refusal rate
Output category or label distribution
Human override or escalation rate
Input data distribution shifts
Model error rate (distinct from application error rate)

Mature AI monitoring includes all six. Programmes that monitor only latency and error rates are using generic APM tooling, which misses the behavioural degradation patterns specific to AI systems.
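
Several of these metrics can be derived from the prediction records themselves. The following sketch assumes a hypothetical record shape (`label`, `confidence`, `overridden`) and covers only four of the six; input distribution shift and model error rate additionally need reference data and ground-truth labels, which are out of scope here.

```python
from collections import Counter
from statistics import mean
from typing import Optional, TypedDict


class Prediction(TypedDict):
    label: Optional[str]  # None means the model returned no answer or refused
    confidence: float
    overridden: bool      # True if a human changed or escalated the output


def window_metrics(batch: list[Prediction]) -> dict:
    """Summarise one monitoring window of predictions."""
    if not batch:
        raise ValueError("empty monitoring window")
    answered = [p for p in batch if p["label"] is not None]
    counts = Counter(p["label"] for p in answered)
    return {
        "null_refusal_rate": 1 - len(answered) / len(batch),
        # Confidence is averaged over answered predictions only.
        "mean_confidence": mean(p["confidence"] for p in answered) if answered else None,
        "label_distribution": {k: v / len(answered) for k, v in counts.items()},
        "override_rate": sum(p["overridden"] for p in batch) / len(batch),
    }
```

A mean is used here for brevity; tracking the confidence distribution properly would mean keeping quantiles or a histogram per window rather than a single summary value.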
