INF-013 Infrastructure Redundancy
Description
Production infrastructure is deployed with redundancy to eliminate single points of failure for critical components. Availability architecture (multi-zone, multi-region, or equivalent) is documented and aligned to RTO/RPO targets. Redundancy configurations are tested at defined intervals.
Rationale
Single points of failure in cloud infrastructure can cause outages that breach SLA commitments. Verified redundancy is the technical foundation of availability guarantees.
Framework Mappings (3)
| Control ID | Control Name | Coverage |
| BCR-11 | Equipment Redundancy | full |
| 8.14 | Redundancy of information processing facilities | full |
| A1.2 | Environmental Protections, Software, Data Back-Up Processes, and Recovery Infrastructure | partial |
Evidence (2)
Infrastructure deployment configuration showing multi-zone or multi-region redundancy for critical production components, with no single points of failure for services subject to availability SLAs.
Example: AWS CloudFormation template, Terraform configuration, or cloud provider console screenshot showing auto-scaling groups spanning multiple availability zones, load balancer configuration, and multi-AZ database configuration for production services.
Test: Review infrastructure-as-code or cloud console configuration for production services. Verify: (1) critical compute services are deployed across at least two availability zones; (2) database services use multi-AZ or equivalent replication; (3) load balancers are configured to route around failed zones; (4) the redundancy configuration matches the documented RTO/RPO commitments.
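Checks (1) through (3) of the test above lend themselves to automation once the deployment configuration has been exported into a simple inventory. The following is a minimal sketch only: the `Service` record and its field names are hypothetical and do not correspond to any cloud provider API, and check (4) is left to manual review against the documented RTO/RPO commitments.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical, simplified view of one production service."""
    name: str
    compute_zones: set = field(default_factory=set)  # zones running compute
    db_multi_az: bool = False                        # multi-AZ or equivalent replication
    lb_zones: set = field(default_factory=set)       # zones the load balancer targets
    lb_health_checks: bool = False                   # unhealthy targets are removed


def redundancy_findings(svc: Service, min_zones: int = 2) -> list:
    """Return findings for checks (1)-(3); an empty list means all three pass."""
    findings = []
    # (1) compute deployed across at least `min_zones` availability zones
    if len(svc.compute_zones) < min_zones:
        findings.append(
            f"{svc.name}: compute spans {len(svc.compute_zones)} zone(s), "
            f"expected at least {min_zones}"
        )
    # (2) database uses multi-AZ or equivalent replication
    if not svc.db_multi_az:
        findings.append(f"{svc.name}: database is not multi-AZ or equivalently replicated")
    # (3) load balancer can route around a failed zone
    if len(svc.lb_zones) < min_zones or not svc.lb_health_checks:
        findings.append(f"{svc.name}: load balancer cannot route around a failed zone")
    return findings


# Illustrative usage with made-up zone names.
api = Service("api", {"zone-a", "zone-b"}, True, {"zone-a", "zone-b"}, True)
batch = Service("batch", {"zone-a"}, False, {"zone-a"}, True)
for svc in (api, batch):
    for finding in redundancy_findings(svc):
        print(finding)
```

Returning a list of findings rather than a single pass/fail flag keeps the output usable as audit working notes, since each failed check is named per service.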
Redundancy test record demonstrating that failover between zones or regions was tested and recovery met RTO targets.
Example: Chaos engineering test report or availability failover drill results (e.g., AWS Fault Injection Simulator run log, or equivalent) documenting the test scenario, results, and measured recovery time.
Test: Request the most recent redundancy or failover test record. Verify: (1) the test covered the failure of a primary availability zone or equivalent component; (2) recovery time was measured and documented; (3) measured recovery time is at or below the defined RTO; (4) the test was conducted within the defined interval.
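The four verification points above can be expressed as a small evaluation routine. This is an illustrative sketch under stated assumptions: the `FailoverTest` record, its field names, and the parameters `rto_minutes` and `max_interval_days` are hypothetical stand-ins for whatever the organization's test report and policy actually define.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class FailoverTest:
    """Hypothetical failover test record; field names are illustrative."""
    scenario: str
    covered_primary_zone_failure: bool
    measured_recovery_minutes: Optional[float]  # None if recovery time was not measured
    conducted_on: date


def evaluate_failover_test(test: FailoverTest, rto_minutes: float,
                           max_interval_days: int, today: date) -> list:
    """Return findings for verification points (1)-(4); empty means the record passes."""
    findings = []
    # (1) the test covered failure of a primary zone or equivalent component
    if not test.covered_primary_zone_failure:
        findings.append("scenario did not cover failure of a primary zone or equivalent")
    # (2) recovery time was measured and documented
    if test.measured_recovery_minutes is None:
        findings.append("recovery time was not measured")
    # (3) measured recovery time is at or below the defined RTO
    elif test.measured_recovery_minutes > rto_minutes:
        findings.append(
            f"measured recovery {test.measured_recovery_minutes} min exceeds "
            f"RTO of {rto_minutes} min"
        )
    # (4) the test was conducted within the defined interval
    if today - test.conducted_on > timedelta(days=max_interval_days):
        findings.append("test is older than the defined interval")
    return findings


# Illustrative usage with made-up values.
record = FailoverTest("zone-a outage drill", True, 12.0, date(2024, 1, 10))
print(evaluate_failover_test(record, rto_minutes=15, max_interval_days=180,
                             today=date(2024, 6, 1)))
```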
Questions (2)
Is production infrastructure deployed with redundancy that eliminates single points of failure for critical components, with availability architecture documented and aligned to RTO/RPO targets?
For cloud-native SaaS, multi-AZ deployment for all critical components (compute, database, load balancer) is the minimum expected redundancy posture.
What level of infrastructure redundancy is implemented for production services?
Multi-AZ within a single region is the baseline expectation. Multi-region deployment is expected where the SLA commits to recovery times that could not be met after the failure of an entire region.
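The baseline rule above reduces to a single comparison. The sketch below is a simplification for illustration: `region_restore_minutes` is a hypothetical estimate of how long service restoration would take after losing the only region in use, and a real assessment would also weigh RPO and contractual terms.

```python
def minimum_redundancy(rto_minutes: float, region_restore_minutes: float) -> str:
    """Return the minimum expected redundancy posture for a service.

    Multi-region is expected when restoring after a whole-region failure
    would take longer than the committed recovery time; otherwise multi-AZ
    within a single region is the baseline.
    """
    if region_restore_minutes > rto_minutes:
        return "multi-region"
    return "multi-az"


# Illustrative usage with made-up figures.
print(minimum_redundancy(rto_minutes=60, region_restore_minutes=240))
print(minimum_redundancy(rto_minutes=480, region_restore_minutes=240))
```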