Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save mikaelvesavuori/1715a45914b51de460d2b36d97d75e51 to your computer and use it in GitHub Desktop.
Save mikaelvesavuori/1715a45914b51de460d2b36d97d75e51 to your computer and use it in GitHub Desktop.
Production-Readiness Checklist by Susan Fowler

As written by Susan Fowler.

A Production-Ready Service Is Stable and Reliable

  • It has a standardized development cycle.
  • Its code is throughly tested through lint, unit, integration, and end-to-end testing.
  • Its test, packaging, build, and release process is completely automated.
  • It has a standardized deployment pipeline, containing staging, canary, and production phases.
  • Its clients are known.
  • Its dependencies are known, and there are backups, alternatives, fallbacks, and caching in place in case of failures.
  • It has stable and reliable routing and discovery in place.

A Production-Ready Service Is Scalable and Performant

  • Its qualitative and quantitative growth scales are known.
  • It uses hardware resources efficiently.
  • Its resource bottlenecks and requirements have been identified.
  • Capacity planning is automated and performed on a scheduled basis.
  • Its dependencies will scale with it.
  • It will scale with its clients.
  • Its traffic patterns are understood.
  • Traffic can be re-routed in case of failures.
  • It is written in a programming language that allows it be scalable and performant.
  • It handles and processes tasks in a performant manner.
  • It handles and stores data in a scalable and performant way.

A Production-Ready Service Is Fault Tolerant and Prepared for Any Catastrophe

  • It has no single point of failure.
  • All failure scenarios and possible catastrophes have been identified.
  • It is tested for resiliency through code testing, load testing, and chaos testing.
  • Failure detection and remediation has been automated.
  • There are standardized incident and outage procedures in place within the microservice development team and across the organization.

A Production-Ready Service Is Properly Monitored

  • Its key metrics are identified and monitored at the host, infrastructure, and microservice levels.
  • It has appropriate logging that accurately reflects the past states of the microservice.
  • Its dashboards are easy to interpret and contain all key metrics.
  • Its alerts are actionable and are defined by signal-providing thresholds.
  • There is a dedicated on-call rotation responsible for monitoring and responding to any incidents and outages.
  • There is a clear, well-defined, and standardized on-call procedure in place for handling incidents and outages.

A Production-Ready Service Is Documented and Understood

  • It has comprehensive documentation.
  • Its documentation is updated regularly.
  • Its documentation contains a description of the microservices; and architecture diagram; contact and on-call information; links to important information; and onboarding and development guide; information about the service's request flow(s), endpoints, and dependencies; an on-call runbook; and answers to frequently asked questions.
  • It is well understood at the developer, team, and organizational levels.
  • It is held to a set of production-readiness standards and meets the associated requirements.
  • Its architecture is reviewed and audited frequently.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment