INF-012 Capacity and Performance Management
Description
Current and projected resource utilisation (compute, storage, network, and API throughput) is monitored against defined thresholds. Capacity plans are maintained to ensure systems can meet operational demands. Alerts are configured to fire on approaching resource exhaustion, before it affects service availability.
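A minimal sketch of evaluating utilisation against defined thresholds, assuming utilisation is reported as a fraction of each resource's exhaustion limit. The `Threshold` schema, the resource names, and the percentages are hypothetical placeholders, not values this control prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Hypothetical threshold schema: fractions of a resource's exhaustion limit."""
    resource: str
    warning: float   # e.g. 0.70 = warn at 70% of the limit
    critical: float  # e.g. 0.85 = page on-call at 85% of the limit


def evaluate(utilisation: dict[str, float], thresholds: list[Threshold]) -> dict[str, str]:
    """Classify each resource's current utilisation (0.0 to 1.0) against its thresholds."""
    status: dict[str, str] = {}
    for t in thresholds:
        # Both thresholds must sit below exhaustion (1.0) so alerts fire before it is reached.
        if not 0.0 < t.warning < t.critical < 1.0:
            raise ValueError(f"invalid thresholds for {t.resource}")
        value = utilisation.get(t.resource)
        if value is None:
            status[t.resource] = "no data"  # a monitoring coverage gap
        elif value >= t.critical:
            status[t.resource] = "critical"
        elif value >= t.warning:
            status[t.resource] = "warning"
        else:
            status[t.resource] = "ok"
    return status


thresholds = [
    Threshold("cpu", 0.70, 0.85),
    Threshold("disk", 0.75, 0.90),
    Threshold("api_throughput", 0.60, 0.80),
]
print(evaluate({"cpu": 0.91, "disk": 0.50}, thresholds))
# {'cpu': 'critical', 'disk': 'ok', 'api_throughput': 'no data'}
```

Reporting a missing metric as "no data" rather than "ok" keeps coverage gaps visible instead of silently passing them.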
Rationale
Resource exhaustion, whether from organic growth or a denial-of-service attack, is a primary availability threat. Proactive capacity management enables timely scaling decisions and helps preserve SLA commitments.
Framework Mappings (4)
| ID | Requirement | Coverage |
| I&S-02 | Capacity and Resource Planning | full |
| 8.6 | Capacity management | full |
| SC-5 | Denial-of-service Protection | partial |
| A1.1 | Capacity Management | full |
Evidence (2)
Resource utilisation dashboards and alerting configuration for production infrastructure showing monitoring coverage and defined threshold alerts for compute, storage, network, and API throughput.
Example: AWS CloudWatch, Datadog, or Grafana dashboard export showing CPU, memory, storage, network I/O, and API throughput and latency metrics for production workloads, with the alert threshold configuration visible.
Test: Request a monitoring dashboard export and the alert configuration for capacity thresholds. Verify: (1) metrics are collected for all critical resource types (CPU, memory, disk, network I/O, API throughput); (2) alert thresholds are set below resource exhaustion limits; (3) alerts are routed to an active on-call channel or queue; (4) the alert history for the last 90 days shows that alerts fired before any actual exhaustion events.
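One way an assessor might check steps (1) and (4) against exported data, assuming the alert history and exhaustion incidents can each be exported as (resource, timestamp) pairs. The metric labels, the 15-minute minimum lead time, and the 24-hour matching window are hypothetical choices, not requirements of this control.

```python
from datetime import datetime, timedelta

# Resource types required by step (1); the labels are hypothetical.
REQUIRED_METRICS = {"cpu", "memory", "disk", "network_io", "api_throughput"}


def missing_metrics(collected: set[str]) -> set[str]:
    """Step (1): required resource types with no collected metric."""
    return REQUIRED_METRICS - collected


def unalerted_exhaustions(
    alerts: list[tuple[str, datetime]],
    exhaustions: list[tuple[str, datetime]],
    now: datetime,
    period: timedelta = timedelta(days=90),
    window: timedelta = timedelta(hours=24),
    min_lead: timedelta = timedelta(minutes=15),
) -> list[tuple[str, datetime]]:
    """Step (4): exhaustion events in the review period that were NOT preceded by an
    alert for the same resource fired between `window` and `min_lead` before the event."""
    failures = []
    for resource, event_time in exhaustions:
        if event_time < now - period:
            continue  # outside the 90-day review period
        preceded = any(
            alert_resource == resource
            and event_time - window <= alert_time <= event_time - min_lead
            for alert_resource, alert_time in alerts
        )
        if not preceded:
            failures.append((resource, event_time))
    return failures


now = datetime(2024, 6, 30)
alerts = [("disk", datetime(2024, 6, 1, 9, 0))]
exhaustions = [
    ("disk", datetime(2024, 6, 1, 10, 0)),     # alerted one hour earlier
    ("memory", datetime(2024, 6, 10, 12, 0)),  # no alert at all
]
print(missing_metrics({"cpu", "memory", "disk"}))
print(unalerted_exhaustions(alerts, exhaustions, now))
```

Here the disk event passes because its alert fired an hour ahead, while the memory event is reported as an exhaustion that monitoring did not anticipate.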
Capacity plan or capacity review record demonstrating that projected utilisation has been assessed against forecast operational demand.
Example: Quarterly capacity review report or capacity planning record (e.g., a Confluence page or shared document) showing projected versus actual utilisation trends and planned scaling actions.
Test: Request the most recent capacity plan or review record. Verify: (1) the plan covers all critical infrastructure components; (2) current utilisation is compared against projected demand; (3) scaling decisions or actions are documented; (4) the review was completed within the defined review interval.
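A sketch of how steps (1), (2), and (4) of this review-record test could be automated, assuming the plan records a projected and an actual utilisation fraction per component. Step (3), documented scaling decisions, still needs a human reading of the record. The function name, the 92-day interval (roughly one quarter), and the 20% variance tolerance are hypothetical.

```python
from datetime import date


def review_findings(
    last_review: date,
    today: date,
    max_interval_days: int,
    critical_components: set[str],
    projected: dict[str, float],
    actual: dict[str, float],
    tolerance: float = 0.20,
) -> list[str]:
    """Return findings for a capacity review record; an empty list means no findings."""
    findings = []
    # (4) Review completed within the defined interval.
    age = (today - last_review).days
    if age > max_interval_days:
        findings.append(f"review overdue: {age} days since last review")
    # (1) Plan covers every critical component.
    for component in sorted(critical_components - set(projected)):
        findings.append(f"no projection for {component}")
    # (2) Actual utilisation compared against projected demand.
    for component, forecast in sorted(projected.items()):
        observed = actual.get(component)
        if observed is None:
            findings.append(f"no actual utilisation recorded for {component}")
        elif forecast > 0 and abs(observed - forecast) / forecast > tolerance:
            findings.append(f"{component}: actual {observed:.0%} vs projected {forecast:.0%}")
    return findings


print(review_findings(
    last_review=date(2024, 1, 15),
    today=date(2024, 6, 1),
    max_interval_days=92,
    critical_components={"db", "api", "storage"},
    projected={"db": 0.60, "api": 0.50},
    actual={"db": 0.62, "api": 0.75},
))
```

In this example the review is overdue, the storage tier has no projection, and the API tier is running well above forecast; each is a prompt for a documented scaling decision.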
Questions (2)
Is current and projected resource utilisation (compute, storage, network, and API throughput) monitored against defined thresholds, with alerts configured before exhaustion conditions occur?
Monitoring should cover all critical resource types. Alerts should fire with enough lead time to allow scaling decisions before service is impacted.
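Lead time can be estimated by projecting the recent utilisation trend forward and comparing the result with how long a scaling action takes. A minimal sketch, assuming a linear trend fitted by ordinary least squares; real workloads are often seasonal, so this is illustrative only. The function names and the `scaling_lead_hours` parameter are hypothetical.

```python
from typing import Optional


def hours_to_exhaustion(samples: list[tuple[float, float]], limit: float) -> Optional[float]:
    """Fit an ordinary least-squares line to (hour, utilisation) samples and return the
    hours from the latest sample until the trend reaches `limit`; None if not rising."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples need distinct timestamps")
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None  # flat or falling: no projected exhaustion
    intercept = mean_u - slope * mean_t
    crossing = (limit - intercept) / slope  # hour at which the fitted line hits the limit
    last_t = max(t for t, _ in samples)
    return max(0.0, crossing - last_t)


def should_alert(samples: list[tuple[float, float]], limit: float, scaling_lead_hours: float) -> bool:
    """Alert when projected exhaustion is closer than the time a scaling action needs."""
    eta = hours_to_exhaustion(samples, limit)
    return eta is not None and eta <= scaling_lead_hours


# Hourly samples rising two percentage points per hour from 50%.
samples = [(0, 0.50), (1, 0.52), (2, 0.54), (3, 0.56)]
print(hours_to_exhaustion(samples, limit=0.90))
print(should_alert(samples, limit=0.90, scaling_lead_hours=24))
```

The fitted trend reaches a 90% limit about 17 hours after the last sample, so an alert is warranted if scaling takes 24 hours but not if it takes 12. A trend-based alert complements, rather than replaces, the static thresholds described for this control.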
How frequently is capacity planning reviewed to ensure production infrastructure can meet operational demands?
Auto-scaling removes much of the risk but does not eliminate the need for capacity planning, because scaling remains bounded by service limits such as cloud provider quotas and configured maximum instance counts. Quarterly reviews are a minimum for services with defined availability SLAs.
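Auto-scaling only absorbs demand up to the lowest hard ceiling in its path. A minimal sketch of a service-limit headroom check for a capacity review, assuming demand is forecast in requests per second and capacity is bounded by both a configured auto-scaling maximum and a provider instance quota. All parameter names and the 20% safety margin are hypothetical.

```python
import math


def scaling_headroom(
    projected_peak_rps: float,
    rps_per_instance: float,
    autoscaling_max: int,
    quota_instances: int,
    safety_margin: float = 0.2,
) -> tuple[int, int, bool]:
    """Compare instances needed at projected peak (plus a safety margin) with the
    effective ceiling, which is the lower of the auto-scaling maximum and the quota."""
    required = math.ceil(projected_peak_rps * (1 + safety_margin) / rps_per_instance)
    ceiling = min(autoscaling_max, quota_instances)
    return required, ceiling, required <= ceiling


print(scaling_headroom(9000, 250, autoscaling_max=50, quota_instances=40))
# (44, 40, False)
```

In this example the auto-scaling group could grow to 50 instances, but the provider quota of 40 is the binding limit, so the projected peak cannot be served without a quota increase.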