- Project Requirements
- Configuration Management
- CI/CD
- Service Level Management Requirements
- Monitoring
- Logging
- Alerting
- SLAs
- Runbooks/Documentation
- Robustness and Resilience/DR Requirements
- Software Stack
- Hardware requirements
- Environment Count
- Instance Count
- [ ]
Infrastructure and tool install/configure
- Server is launched through Chef
- Base is installed
- Application is installed through Chef
- app configuration is in CM recipe
- config for each environment is cheffed
- project hosts are cheffing on schedule
- [ ]
Deploy code and apps
- staging deploy built
- prod deploy built
- testing configured
- testing gated
- [ ]
Central Job scheduling Elevated Access for Devs
- Rundeck keys are on box
- crons are stored on Rundeck server
- server application jobs are configured
- [ ]
Monitoring Requirements
- hosts are sending system metrics (CPU, Memory, Disk, Network)
- server apps are sending metrics (server: nginx, IIS; middleware: Redis, RabbitMQ...)
- apps are sending specific metrics (process: cpu, memory; garbage collection; process forking...)
- response times (every node, external)
- [ ]
- system logs are forwarding
- app logs are forwarding
- log groks are configured
- filters built
- [ ]
- system alerts configured
- app alerts configured
- app specific alerts configured
- health check for each node in cluster
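
A minimal sketch of the per-node health check item above, assuming each node exposes an HTTP health endpoint that returns 200 when healthy. The endpoint URLs and the function names `http_status`, `check_cluster`, and `unhealthy` are hypothetical, not part of any existing tooling for this project.

```python
from typing import Callable, Dict, Iterable, List
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for url, or 0 if the node is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # urlopen raises on 4xx/5xx; the code is still useful for alerting.
        return err.code
    except (URLError, OSError):
        # DNS failure, refused connection, or timeout.
        return 0


def check_cluster(
    nodes: Iterable[str],
    probe: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Probe every node individually and map its URL to healthy (True/False)."""
    return {node: probe(node) == 200 for node in nodes}


def unhealthy(results: Dict[str, bool]) -> List[str]:
    """Return the sorted list of nodes that failed their health check."""
    return sorted(node for node, ok in results.items() if not ok)
```

Probing each node directly, rather than only the load-balanced address, is what lets an alert name the failing host. The `probe` parameter is injectable so the logic can be exercised without network access.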
- [ ]
- App is added to Pingdom
- external service checks added
- checks are set to update HipChat
- checks are set to call PagerDuty
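
As an illustration of the PagerDuty item above, the sketch below builds a trigger event body in the shape used by the PagerDuty Events API v2. It assumes that API is the integration in use, which the checklist does not state. The function name `build_trigger_event` and every value passed to it are hypothetical, and the code only constructs the body rather than sending it.

```python
from typing import Any, Dict, Optional

# Severity values accepted by the Events API v2 payload.
VALID_SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_event(
    routing_key: str,
    summary: str,
    source: str,
    severity: str = "critical",
    dedup_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the JSON-serializable body for a 'trigger' event.

    routing_key is the integration key of the target service (placeholder
    values only in this sketch); source identifies the affected host or check.
    """
    if severity not in VALID_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity!r}")
    event: Dict[str, Any] = {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,
        },
    }
    if dedup_key:
        # Reusing a dedup_key groups repeat alerts into one incident.
        event["dedup_key"] = dedup_key
    return event
```

A check would serialize this dictionary as JSON and POST it to the Events API endpoint. Setting `dedup_key` to something stable per check, such as the check name, keeps a flapping check from opening a new incident on every failure.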
- [ ]
- escalation process defined
- groups configured in PagerDuty
- [ ]
- Application is added as a component
- Critical Alerts are set to update Statuspage
- [ ]
- systems added to CMDB
- runbook created in Confluence
- Hosts added to asset mgmt
- troubleshooting steps defined
- [ ]
Redundant Hardware / Cluster Performance/Scalability
- Load balanced?
- load tested
- redundant hosts
- config management
- auto-scale
- time to restore
- [ ]