Production ready checklist

[] Application logs are streamed and available in a different service
[] Logs are rotated & expired appropriately
[] Continuous integration pipeline builds & runs tests for every commit
[] Continuous deployment pipeline makes deploying any tagged version a single click
[] Service has been deployed and tested in multiple environments with minimal configuration changes
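
As one way to picture the log rotation item above, here is a minimal sketch using Python's standard `logging.handlers.RotatingFileHandler`. The helper name `build_rotating_logger` and the default size and file-count limits are assumptions chosen for illustration, not values from this checklist; a real service would more likely stream logs to a separate service and rotate at the host or platform level.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_rotating_logger(path, max_bytes=1_000_000, backup_count=5):
    """Return a logger that rotates `path` by size and expires old files.

    Hypothetical helper: the name and the default limits are illustrative.
    """
    logger = logging.getLogger(f"app.{path}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    # Once the file would exceed max_bytes it is renamed to path.1, path.2, ...
    # and anything beyond backup_count is deleted.
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

With `backup_count=5` at most six files exist at any time: the active log plus five rotated copies.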

Replicable builds

[] Code in VCS
[] Single command to build, e.g. make
[] Build command works in vanilla VM or container (no implicit deps)
[] Build ships as an immutable image (VM or container)

Replicable infrastructure

[] Infrastructure in VCS
[] Configurations in VCS or as env vars
[] Infra can be destroyed & rebuilt automatically
[] Infra can be replicated in multiple environments (prod, staging...) with minimal effort

Code

[] Secrets are not exposed in the code or build but made available securely (env vars, encryption service, etc.)
[] Code is statically analysed for security risks on CI (e.g. using lgtm)
[] Dependencies are scanned and audited for known vulnerabilities as part of the CI process (builds fail on CVSS > 8)
[] The OWASP Top 10 risks have been considered and mitigated in the design and implementation of the service
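
The "builds fail on CVSS > 8" rule above could be enforced by a small gate script in CI. The sketch below assumes the scanner output has already been parsed into simple records; the `Finding` type, its field names, and the function names are hypothetical and do not correspond to any particular scanner's format.

```python
from dataclasses import dataclass

# Threshold taken from the checklist item: fail the build on CVSS > 8.
CVSS_FAIL_THRESHOLD = 8.0


@dataclass(frozen=True)
class Finding:
    """One vulnerability reported by a dependency scanner (hypothetical shape)."""
    package: str
    advisory_id: str
    cvss: float


def blocking_findings(findings, threshold=CVSS_FAIL_THRESHOLD):
    """Return the findings whose score is strictly above the threshold."""
    return [f for f in findings if f.cvss > threshold]


def exit_code(findings, threshold=CVSS_FAIL_THRESHOLD):
    """Return 1 (fail the build) if any finding blocks, otherwise 0."""
    return 1 if blocking_findings(findings, threshold) else 0
```

A CI step could call `sys.exit(exit_code(findings))` after parsing the scanner report. Note that a score of exactly 8.0 passes, matching the strict `>` in the checklist.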

Network

[] Ports are restricted to the bare minimum
[] Network access is restricted to only other services that require it (ACL)
[] Service is deployed to a private network behind a firewall

Data

[] Data is encrypted in transit (HTTPS) for both inbound & outbound
[] Data is encrypted at rest*

Host

[] Host (VM or container) is automatically tested for known vulnerabilities as part of CI/CD pipeline
[] Host OS kernel & required packages (e.g. openssl) are automatically & regularly kept up to date regardless of code changes

[] Disaster recovery scenarios have been considered through testing & runbooks

High availability

[] Auto-recovery enabled via healthchecks (node & container level)
[] Spread deployment over multiple machines (> 2)
[] Spread machines over multiple datacenters (AZs)
[] Load balancing or service discovery set up to evenly distribute traffic
[] Auto-scaling enabled using appropriate metrics, cf. metrics & performance
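
For the healthcheck item above, a service typically exposes an endpoint that an orchestrator or load balancer polls and uses to restart or remove unhealthy instances. Here is a minimal sketch of the aggregation logic behind such an endpoint; the function name `health_status` and the convention of returning 200 or 503 are assumptions for illustration, and the individual check callables are placeholders for real dependency probes.

```python
def health_status(checks):
    """Run each named check and return (http_status, per_check_results).

    `checks` maps a name to a zero-argument callable returning truthy when
    healthy. A check that raises counts as unhealthy instead of crashing
    the health endpoint itself.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    # 200 when every check passes, 503 so the caller can recycle the instance.
    code = 200 if all(results.values()) else 503
    return code, results
```

Keeping the checks cheap matters here, since the endpoint is polled frequently at both node and container level.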

Monitoring

[] Live service level metrics are available (e.g. number of requests, 2XX responses etc...)
[] Live infrastructure level metrics are available (e.g. cpu, memory, disk space etc...)
[] Reasonable alerts have been set up on these metrics and linked to an alerting service
[] Metrics & alerts have been tested via failure scenarios
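
To make the alerting item above concrete, here is a sketch of one possible rule on the service-level metrics: alert when the share of 2XX responses in a window drops below a minimum. The function names and the 0.99 default are assumptions for illustration; in practice this logic usually lives in the monitoring system's own rule language.

```python
def success_ratio(status_counts):
    """Fraction of responses with a 2XX status.

    `status_counts` maps an HTTP status code to how often it was seen in the
    window. An empty window is treated as healthy (ratio 1.0).
    """
    total = sum(status_counts.values())
    if total == 0:
        return 1.0
    ok = sum(n for code, n in status_counts.items() if 200 <= code < 300)
    return ok / total


def should_alert(status_counts, min_ratio=0.99):
    """Return True when the 2XX ratio falls below `min_ratio`."""
    return success_ratio(status_counts) < min_ratio
```

Feeding this function counts from a deliberately broken deployment is one way to exercise the alert as part of a failure scenario.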

Data*

[] Regular automated backups
[] Backup restore is tested frequently
[] Cleanup old backups automatically
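
The cleanup item above can be sketched as a retention policy that selects which backups to delete. The function name `expired_backups`, the 30-day window, and the rule of always keeping the newest few backups are assumptions for illustration, not requirements from this checklist.

```python
from datetime import datetime, timedelta


def expired_backups(backups, now, keep_days=30, keep_min=3):
    """Return names of backups that are safe to delete, newest first.

    `backups` maps a backup name to its creation datetime. A backup expires
    once it is older than `keep_days`, except that the `keep_min` most recent
    backups are always kept, even if the backup job has been failing.
    """
    newest_first = sorted(backups.items(), key=lambda kv: kv[1], reverse=True)
    cutoff = now - timedelta(days=keep_days)
    protected = {name for name, _ in newest_first[:keep_min]}
    return [
        name
        for name, created in newest_first
        if created < cutoff and name not in protected
    ]
```

Separating the selection from the actual deletion keeps the policy easy to test before it is allowed to remove anything.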

[] Load tests are automated & stored as code
[] Load testing of a single deployment has been performed to identify bottlenecks & failure thresholds
[] Load testing of entire service has been performed to identify benchmarks & scaling triggers

[] Cost has been taken into account in the design & implementation of the service
[] Cost metrics or reports have been made available

*where applicable