Digital Web Product: Production Checklist

Going to Production: Server Side

This is a community-led production checklist for server-side applications. Depending on the scale of your product, it may not be feasible or advisable to fulfil every entry on this checklist. Adopting it requires research, common sense, and planning.

This checklist is intended to be unopinionated while remaining useful and clear. It assumes your product is used on the web, although many points still apply outside that context.

Any edit suggestions are encouraged in the comments below!

Complete Checklist

  • Legal

    • Collect the licenses of all third-party dependencies
    • Check license conditions are met
    • Server side does not breach applicable laws on data security and privacy, such as GDPR (UK)
  • Resilience

    • Application can be executed in an isolated environment

    • Application can withstand heavy load. An example is simulating 10,000 users registering new accounts within 15 minutes.

      • Pick suitable metrics to measure, such as input/output rates, throughput, latency, and time to recovery

      • Write tests for actions and operations that could experience high volumes

    • Application can restart after critical failure

      • Create a list of the main threats that would cause critical failure. This may be informed by a Threat Analysis & Risk Assessment, if you have one.

      • Create tests that intentionally trigger the main threats in an isolated environment. Improve the application based on the results.
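
As a rough illustration of the load-testing entries above, here is a minimal sketch that runs an operation many times concurrently and reports throughput and latency percentiles. The names `run_load` and `operation` are hypothetical, and a thread pool on one machine is only an assumption for small experiments; a real stress test would normally use a dedicated tool and an isolated environment.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles


def run_load(operation, total_calls, concurrency):
    """Call `operation` total_calls times on a thread pool and summarise the run.

    `operation` is a hypothetical zero-argument callable, for example a function
    that registers one test account against a staging environment.
    """

    def timed(_):
        # Time a single call and record whether it raised.
        start = time.perf_counter()
        try:
            operation()
            ok = True
        except Exception:
            ok = False
        return time.perf_counter() - start, ok

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed, range(total_calls)))
    wall_seconds = time.perf_counter() - wall_start

    latencies = sorted(duration for duration, _ in results)
    cuts = quantiles(latencies, n=100)  # 99 percentile cut points; needs >= 2 samples
    return {
        "calls": total_calls,
        "errors": sum(1 for _, ok in results if not ok),
        "throughput_per_s": total_calls / wall_seconds,
        "p50_s": cuts[49],
        "p95_s": cuts[94],
    }
```

Comparing the same report before and after a change gives a simple baseline for the metrics you picked.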

  • Load balancing

    • Application can run across multiple nodes behind a load balancer
    • Application has redundancy: if the number of active nodes falls below the target, a new node is spun up
  • Deployment

    • Deploying a new version of the application as a new node does not cause downtime
    • Deployment runs on its own instance so that it does not impact production resources
    • User data in sessions is not lost when nodes go up or down
  • Supervising

    • Application automatically restarts when the host machine is rebooted or power-cycled
    • Application can survive host machine failures
  • Logging

    • Logs are separated by:
      • Level of error,
      • Date / time,
      • Service
    • Logs are saved in multiple strictly separate locations to mitigate the risk of logs being lost in a critical outage
    • Developer(s) are immediately notified of critical-level errors. This can be achieved in a variety of ways; an entry-level option is a Slack webhook.
    • Logs are sent to a log service that analyses and categorises them
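
To make the logging entries above more concrete, the following sketch uses Python's standard `logging` module to tag each record with time, level, and service name, and to forward critical records to a notifier. `NotifyHandler`, `build_logger`, and the `notify` callable are hypothetical names; in practice `notify` might post to a Slack webhook, but that part is left out here as an assumption about your setup.

```python
import logging


class NotifyHandler(logging.Handler):
    """Forward records at or above `level` to a notifier callable."""

    def __init__(self, notify, level=logging.CRITICAL):
        super().__init__(level=level)
        self._notify = notify  # hypothetical callable, e.g. a webhook sender

    def emit(self, record):
        try:
            self._notify(self.format(record))
        except Exception:
            # Never let a failing notifier crash the application.
            self.handleError(record)


def build_logger(service, notify):
    """Return a logger whose lines carry date/time, level, and service name."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")

    stream = logging.StreamHandler()  # regular log output (stderr by default)
    stream.setFormatter(formatter)
    logger.addHandler(stream)

    alert = NotifyHandler(notify)  # only CRITICAL and above reach the notifier
    alert.setFormatter(formatter)
    logger.addHandler(alert)
    return logger
```

With `log = build_logger("billing", send_alert)`, a call to `log.error(...)` is only written to the stream, while `log.critical(...)` is also passed to `send_alert`.
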
  • Monitoring

    • A health-checking service on a separate instance monitors application uptime and load
    • Alerts / notifications for unexpected issues or usage. This could be timeouts, resources nearing limits, or error rates exceeding limits.
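
One possible way to express "error rates exceeding limits" is a sliding window over recent request outcomes. The sketch below is an assumption about how such a rule could look, not a recommendation of specific thresholds; `ErrorRateAlert`, `window`, and `max_error_rate` are hypothetical names.

```python
from collections import deque


class ErrorRateAlert:
    """Track the last `window` request outcomes and flag a high error rate."""

    def __init__(self, window, max_error_rate):
        self._outcomes = deque(maxlen=window)  # oldest outcomes drop off automatically
        self._max_error_rate = max_error_rate

    def error_rate(self):
        if not self._outcomes:
            return 0.0
        return self._outcomes.count(False) / len(self._outcomes)

    def record(self, ok):
        """Record one outcome and return True if an alert should fire."""
        self._outcomes.append(bool(ok))
        return self.error_rate() > self._max_error_rate
```

A monitoring service could call `record()` for every health probe or request and send a notification whenever it returns `True`.
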
  • Metrics

    • Application can observe:
      • Number of requests per endpoint,
      • Duration of requests per endpoint,
      • Duration of business-logic operations
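
The observations listed above can be sketched with a small in-process recorder. This is a simplified illustration under the assumption of a single-threaded process; `Metrics` and `track` are hypothetical names, and a production system would more likely export these numbers to a metrics service.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Metrics:
    """Count calls and accumulate durations per endpoint or operation name."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.total_seconds = defaultdict(float)

    @contextmanager
    def track(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even when the wrapped block raises.
            self.counts[name] += 1
            self.total_seconds[name] += time.perf_counter() - start

    def average_seconds(self, name):
        count = self.counts.get(name, 0)
        return self.total_seconds.get(name, 0.0) / count if count else 0.0
```

Wrapping a request handler in `with metrics.track("GET /users"):` records both the request count and its duration; the same pattern works for a business-logic operation such as `"generate_invoice"`.
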
  • Availability

    • Services are able to run within different data centers
    • All services are available to end users from the data centers nearest to them
  • Testing

    • Application comes with a testing solution
      • Testing solution works for CI
      • Testing solution works for stress testing
  • Backups and Restoration

    • Databases and storage are backed up automatically without developer input
    • Databases and storage can be restored rapidly
  • Security

    • Application has been reviewed against the OWASP Top 10 security risks
    • TLS (Transport Layer Security) is enabled for all production public endpoints
    • Security headers are present for production public endpoints
      • X-Frame-Options
      • X-Content-Type-Options
      • Content-Security-Policy
      • X-XSS-Protection (deprecated; most modern browsers ignore it, so rely on Content-Security-Policy instead)
      • Strict-Transport-Security
      • Public-Key-Pins (deprecated; major browsers have removed support for HTTP Public Key Pinning)
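
A header check like the one above can be automated. The sketch below takes a mapping of response headers and reports which expected ones are missing. The function name and the default list are assumptions: the default covers only the headers from this checklist that current browsers still honour, so adjust it to whatever you decide to enforce.

```python
# Assumed default: the checklist headers that current browsers still act on.
EXPECTED_HEADERS = (
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Content-Security-Policy",
    "Strict-Transport-Security",
)


def missing_security_headers(headers, expected=EXPECTED_HEADERS):
    """Return the expected header names absent from `headers`.

    Header names are compared case-insensitively, as HTTP requires.
    """
    present = {name.lower() for name in headers}
    return [name for name in expected if name.lower() not in present]
```

Running this against the headers of each public endpoint in CI is one way to catch a header that was dropped by a configuration change.
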
  • Second pair of eyes

    • Never assume you're infallible. Humans make mistakes, more so when confident! Another team or developer should also work through this checklist after you, with as little input from you or your team as possible.