@lethain
Last active January 29, 2020 14:23
CFP for "How Stripe invests in infrastructure"

Short description

Learn how Stripe has evolved their approach to prioritizing infrastructure as they grew from two founders to 1,300+ employees and millions of users.

Long description

Deciding what to work on is always difficult, and is especially treacherous for folks working as infrastructure engineers and leaders. Infrastructure teams that solve the right problems subtly shift their company's trajectory upwards. Poor approaches lead toward a morass of firefighting and frustration. With so many opportunities and sometimes fuzzy metrics, planning is threading a needle between the tyranny of choice and the specter of ambiguity.

This talk will unpack the process of picking and prioritizing technical infrastructure work, which is rarely if ever discussed, but is so essential to long-term company success. We'll share Stripe's approach to:

  • evolving your approach to prioritizing infrastructure as your company scales,
  • justifying--and maybe even expanding--your company's spend on technical infrastructure,
  • exploring the whole range of possible areas for infrastructure investment,
  • adapting your approach between periods of fire-fighting and periods of innovation, and
  • balancing investment in supporting existing products and enabling new product development.

You'll come away with a broad set of tools, frameworks and ideas for plotting the future of your technical infrastructure.

Outline

  • [3 min] Intro
    • why does it matter if we're good at infrastructure planning?
      • Because it's how we turn unplanned reliability work into planned reliability work.
      • Not just reliability, of course; it's also true for scalability, efficiency, latency, etc.
      • It’s how we make generational improvements instead of incremental ones
      • It’s also how we hire and retain great people!
    • We’re going to talk about four ways to prioritize infrastructure work
      • (1) “fire-fighting”
      • (2) “fulfill the architect’s vision”
      • (3) “five properties of infrastructure”
      • (4) “users, baselines and timeframes”
  • [2 min] Model 1: “Fire-fighting driven development”
    • This is how most small, product-focused companies plan infrastructure.
    • Extremely focused on product-market fit, do as little infra investment as possible
    • When something goes wrong, swarm to fix that, then go back to product work
  • [2 min] Model 2: “Fulfill the architect’s vision”
    • This is how most small, infrastructure-focused companies plan infrastructure
    • Solving the problems of previous job or replicating previous job’s solutions
    • Very focused on building to support future growth
    • Can easily veer into premature optimization
  • [5 min] Model 3: “Five properties of infrastructure”
    • An approach I’ve developed for infrastructure teams <100 engineers
    • There are 5 infra properties: security, reliability, productivity, efficiency, latency
    • For your company, figure out how much you want to invest into each of those, and use that to guide investment
    • This runs into some problems as teams grow: it doesn’t provide focus or a coherent narrative to explain to other teams or executives
  • [5 min] Model 4: “Users, baselines and timeframes”
    • An approach I’m currently using for an infrastructure team of ~200
    • Two parts - first discover things we can do, then figure out how to prioritize within that universe of possible work
    • Discovering possible work
      • Users - who are our users? Which user cohorts are we focused on supporting this half? What are their needs?
      • Baselines - we set baseline metrics for each of the five infra properties and SLAs with our key customers
      • Timeframes - what do we need to meet our baselines this quarter? This year? 3 years from now? (Which timeframes matter varies a lot depending on your company’s size!)
    • Prioritizing work
      • User asks (40%) - doing what our users need, even when it sometimes doesn’t align perfectly with our long-term goals
      • Platform quality (30%) - keeping the lights on, but a nicer term
      • Key initiatives (30%) - we pick 1-3 projects that the entire infrastructure organization works on together. This is especially important because it helps us explain our focus to other teams and executives
  • [3 min] Ending
    • Different sizes and scenarios need different approaches!
    • Adapt to your needs. But be very intentional.
    • Particularly as you get larger, your ability to plan and explain your plan becomes the core constraint on the infrastructure team’s ability to contribute to the company’s success
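
As a rough illustration of the 40% / 30% / 30% split under "Prioritizing work" in Model 4, here is a minimal sketch of the arithmetic. It assumes the percentages apply to engineer headcount, which the outline does not state; the function name `split_capacity` and the bucket keys are hypothetical, chosen only for this example.

```python
# Hypothetical bucket names; the percentages come from the Model 4 outline.
ALLOCATION = {"user_asks": 40, "platform_quality": 30, "key_initiatives": 30}


def split_capacity(engineers: int, allocation: dict[str, int] = ALLOCATION) -> dict[str, int]:
    """Split a headcount across buckets by percentage (assumed interpretation)."""
    if sum(allocation.values()) != 100:
        raise ValueError("allocation percentages must sum to 100")

    # Work in hundredths of an engineer to avoid floating-point rounding.
    shares = {name: engineers * pct for name, pct in allocation.items()}
    result = {name: share // 100 for name, share in shares.items()}

    # Hand out any leftover engineers to the buckets with the largest
    # fractional remainder, so the totals always add up to `engineers`.
    leftover = engineers - sum(result.values())
    by_remainder = sorted(shares, key=lambda name: shares[name] % 100, reverse=True)
    for name in by_remainder[:leftover]:
        result[name] += 1
    return result


if __name__ == "__main__":
    # An organization of 200, roughly the size mentioned for Model 4.
    print(split_capacity(200))
```

Changing the percentages in `ALLOCATION` is one way to explore how a different balance between user asks, platform quality and key initiatives would play out for a given team size.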