Late night ramblings to turn into presentation of what we do

What is the big picture?

We are headed down a well-trodden but ever-changing path toward becoming a cloud-native product company. There have been many distractions and upsets along the way. There will be more to come, and likely a few catastrophes. There is no going back. In some sense this journey is uniquely our own and the barriers are of our own creation. Thankfully, we are not the first to do this. The shape of what it entails and where it will lead is outlined by the creators of the practices and tools we intend to adopt along the way. The Cloud Native Computing Foundation describes it in the CNCF Trail Map.

CNCF Trail Map

The broad strokes of our journey will be:

  1. Containerization, CD, and configuration
  2. Orchestration, observability, and analysis
  3. Networking, service discovery, and security
  4. Distributed databases and storage
  5. Messaging, serverless, and event-driven workflows

We are somewhere between the first and second phases of running this playbook, with slightly varying levels of maturity in the technologies further down the path, but mostly just beginning.

What does this have to do with kaizen again?

Over the life of the platform, the cognitive cost of making cross-team changes has increased. Dramatically. The demand for features has been driving most of the near-term planning, and as a result, collaboration and mobility between teams have fallen. The rising complexity of the infrastructure, the lack of well-understood visibility tools, and inconsistent architectural decisions have pushed complexity into oddly-shaped domains, and isolated silos of tribal knowledge prevent bottom-up changes from propagating to other teams.

What does quality consist of?

How do we quantify and communicate it? How do we keep ourselves honest? How do we know when to stop building and start fixing?

What does good look like? (documentation)

Have a look around GitLab's excellent developer setup. The aspects that set it apart are the readily available "top-level" concerns. These are the things that someone trying to become familiar with, or debug something in, your ecosystem would look for first, explained below.

What makes this good?

  • A getting started checklist targeted at the person consuming the documentation

  • Outline of the real-world process to get a single change accepted, applicable across teams and disciplines (using a great PR Submission Template)

  • Development environment setup options and related activities are isolated, and expectations of what will and won't work are set

  • A narrative of the happy-path request flow, followed by a simplified, high-level architecture diagram (with references to more granular resources)

  • A table of services, in which each service's component parts and associated technologies are enumerated. Each service in the table gives its name, a short description, the environments it works in, and some application-specific, human-readable metadata. Each item in the service metadata is a link to the relevant file or to documentation on how to find it. An annotated example is below; a sketch of how such an entry might be encoded follows this list.

    name of service:

    • Project page: (source repo, build pipelines)
    • Configuration: (environments)
      • Omnibus (the template from which all environment-specific configs are defined)
      • Helm Chart (Helm Chart repo, wrapper documentation)
      • Source (where to go to add or make a change to an environment)
      • GDK (Development environment specific documentation, usually a getting-started)
    • Layer: Core Service (where this service fits in the greater architecture)
    • Process: pobodysnerfict-service (process or container name)
    • Project_name: Service Architecture (links to monitoring, orchestration, observability, and other NFR concerns and how to access them)
  • ...Then it trails off into a link farm of solutions, practices, and very specific hacks and tricks documented in anger, much like most technical wikis or internally-facing documentation.
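
To make the shape of such a table entry concrete, here is a minimal sketch of how a service-catalog record like the annotated example above might be encoded for tooling (for instance, to generate the service table in the docs). The field names, URLs, and the rendering helper are illustrative assumptions, not GitLab's actual schema.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class ServiceEntry:
    """One row of the service table: names, links, and where the service fits.

    Field names mirror the annotated example above; they are assumptions,
    not any real project's schema.
    """
    name: str
    description: str
    environments: list[str]
    project_page: str               # source repo and build pipelines
    configuration: dict[str, str]   # config sources keyed by flavour (Omnibus, Helm Chart, GDK, ...)
    layer: str                      # where this service fits in the greater architecture
    process: str                    # process or container name
    architecture_doc: str           # monitoring, orchestration, observability, other NFR concerns

    def as_markdown_row(self) -> str:
        """Render the entry as one row of a Markdown service table."""
        config_links = ", ".join(f"[{k}]({v})" for k, v in self.configuration.items())
        return (f"| {self.name} | {self.description} | {', '.join(self.environments)} | "
                f"[project]({self.project_page}) | {config_links} | "
                f"{self.layer} | `{self.process}` |")


# Hypothetical entry echoing the annotated example above.
example = ServiceEntry(
    name="pobodysnerfict-service",
    description="Handles the happy-path request flow",
    environments=["dev", "staging", "production"],
    project_page="https://example.invalid/group/pobodysnerfict",
    configuration={
        "Omnibus": "https://example.invalid/omnibus/template",
        "Helm Chart": "https://example.invalid/charts/pobodysnerfict",
        "GDK": "https://example.invalid/gdk/getting-started",
    },
    layer="Core Service",
    process="pobodysnerfict-service",
    architecture_doc="https://example.invalid/docs/service-architecture",
)

print(example.as_markdown_row())
```

The point of the sketch is the shape, not the tooling: every metadata item resolves to a link, so the table stays useful as a jumping-off point even when the prose around it goes stale.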

This documentation has obviously grown and changed... a lot. It is clearly living documentation. Is it complete and accurate? Probably not. However, the important information is easily discoverable and the structure is consistent.

What is lacking?

What does good look like? (build)

Cite Intel Clear Linux

How do we get here?
