@zachm
Last active November 2, 2017 15:21
Automacon

Automating Kubernetes Cluster Ops at DigitalOcean

Dan Norris @ DigitalOcean

They built DO using DO components, but because they obviously have a decent amount of infrastructure they use Terraform to manage it. Droplets module, then hook it into Chef - combine launch and provision steps.

Vault as a CA for Kubernetes - they have a blog post out on this: http://do.co/vault. Some examples are given of Terraform commands; they don't appear to have much sanity checking around their workflow (e.g. running a bare terraform apply rather than wrapping it in make plan/apply). This might be simplified for the talk - for their sake I hope it is.

terraform taint - using it to mark resources as requiring replacement.
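
A minimal sketch of the kind of guard rail I'd want around that workflow: taint, write a plan to a file, and only apply that exact plan after someone signs off. This is my assumption of a sane wrapper, not what DO actually runs; the plan file name and the resource name "node" are hypothetical. (Newer Terraform releases prefer terraform apply -replace=ADDRESS over taint.)

```python
import subprocess
from typing import Callable, List

PLAN_FILE = "tfplan"  # hypothetical plan file name


def taint_cmd(address: str) -> List[str]:
    # Marks the resource so the next plan proposes destroying and recreating it.
    return ["terraform", "taint", address]


def plan_cmd(plan_file: str = PLAN_FILE) -> List[str]:
    # Saves the proposed changes so that apply runs exactly what was reviewed.
    return ["terraform", "plan", f"-out={plan_file}"]


def apply_cmd(plan_file: str = PLAN_FILE) -> List[str]:
    # Applies the saved plan file rather than re-planning on the fly.
    return ["terraform", "apply", plan_file]


def run_checked(cmd: List[str]) -> None:
    # Default runner: execute the command and raise if it exits non-zero.
    subprocess.run(cmd, check=True)


def replace_resource(
    address: str,
    run: Callable[[List[str]], None] = run_checked,
    approve: Callable[[], bool] = lambda: False,
) -> bool:
    """Taint and plan; apply only if approve() returns True."""
    run(taint_cmd(address))
    run(plan_cmd())
    if not approve():
        return False  # the plan stays on disk for review, nothing is changed
    run(apply_cmd())
    return True
```

Passing a fake run function (for example a list's append method) makes it easy to see the command sequence without touching real infrastructure.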

Open Source, Supply Chains, and You!

Robyn Bergeron @ Ansible/RedHat

This is all about working with vendors, focusing on prepackaged FOSS. The speaker references her time in the Fedora flamewars, which gives her an interesting set of insights.

Why is CUPS in my cloud image? ...good question.

Feedback loop with suppliers of FOSS? The best way to get better software from your partners is to communicate effectively with them...

  • Know your system
  • Know your dependencies
  • Automate the "red flags"
  • Have an actual relationship
  • Have a plan

The Network is Infrastructure Too

Pete Lumbis @ Cumulus

(I really enjoyed Pete's talk. For me, it shed some light on what neteng folks go through every day...)

Who includes the network when you test any of your apps or infra? YEAH NOBODY LOL.

Unfortunate state of the art:

  • Config designed for humans, not machines
  • Software is all custom with weird oneoff interfaces
  • No on-box apps (no Puppet) and no real VMs

How do you test it? Get paged when it breaks... how many people use ping as a smoke test... oh boy.
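
Ping only tells you ICMP gets through. A rough sketch of a slightly stronger smoke test, checking that the TCP ports you actually care about accept connections; the function names are mine and this is not something shown in the talk.

```python
import socket
from typing import Dict, Iterable, Tuple


def tcp_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def smoke_test(targets: Iterable[Tuple[str, int]]) -> Dict[Tuple[str, int], bool]:
    """Check each (host, port) pair and report reachability per target."""
    return {(host, port): tcp_reachable(host, port) for host, port in targets}
```

Something like this can run in CI against a virtual topology before a change ever gets near prod.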

What's borked?

  • Bad comms (email)
  • Hard or impossible to test outside of prod, because dev is so so different from prod.
  • No change validation, haha "roll forward"
  • Siloed people

CI for the network, you can use VMs etc, he hosts his on gitlab...

Behavior Driven Development: because lots of neteng folks aren't programmers. Tool called behave.

  • CI/CD is magic for neteng (apparently)
  • Neteng are policy experts, not coders (yet)

The Psychology of Security Automation

Jason Chan @ Netflix

Usual topics in security talks: differences between devs and infosec, SSL/TLS and the joys/sorrows of it, etc.

Lemur: Tool they built to deal with Heartbleed and things like it. APIs are necessary to do any of this well and to empower devs, etc. Reliability: get reminded when certs are about to expire, etc.
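
To make the expiry-reminder idea concrete, here is a small sketch using only the Python standard library. It is not Lemur's API; the function names and the 30-day threshold are my own assumptions. The notAfter string format is the one Python's ssl module returns from getpeercert().

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between now and a certificate's notAfter timestamp.

    not_after uses the format from ssl.SSLSocket.getpeercert(),
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def needs_reminder(not_after: str, now: datetime, threshold_days: int = 30) -> bool:
    # threshold_days is an arbitrary example value, not a recommendation.
    return days_until_expiry(not_after, now) <= threshold_days
```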

If we used an internal CA heavily, I'd strongly suggest trying Lemur: https://github.com/Netflix/lemur

Collaborate with devs: the old way isn't working anymore.

Rollie Pollie: ChatOps for approve/deny operations on AWS IAMs. Heavily used by their power users internally, works via PRs.

Lots of other API integration with third party sec vendors. One main insight though is that you decouple push schedule from the security team.

Habitat 101, An Introduction to Habitat

Joshua Timberman @ Chef

(Great talk, my notes are woefully inadequate.)

Why do we run infra in the first place? Because it's fun? Because the biz requires it?

Lol nobody actually runs docker in production. Do you run on BEAR METAL?

bear metal

Package, configuration, service, all in one: https://www.habitat.sh/try/

Habitat is app automation that does any app ever made in any language all the time. ...Cue old IBM commercial, "How is that possible?"

Ooh it'll let you defer some configuration until runtime. That's nice. Clean-room build. So, just glibc for bash. Oh! And no CUPS! Yay! The actual start process is system-specific, but they give you a template? So... how does this run everywhere if the start process is a shell script? No way it'll be portable. Do I have to write a new one for each platform?
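
A toy sketch of what "defer configuration until runtime" means in general: ship defaults with the package, then overlay whatever the operator supplies at start time and render the config. This is not Habitat's actual mechanism or file format; the keys, defaults, and template here are made up for illustration.

```python
from string import Template
from typing import Dict

# Hypothetical defaults baked in at package build time.
DEFAULTS: Dict[str, str] = {"port": "8080", "log_level": "info"}

# Hypothetical config template shipped alongside the defaults.
CONFIG_TEMPLATE = Template("listen ${port}\nlog_level ${log_level}\n")


def render_config(defaults: Dict[str, str], overrides: Dict[str, str]) -> str:
    """Overlay runtime overrides on the defaults and render the template."""
    merged = {**defaults, **overrides}  # runtime values win over defaults
    return CONFIG_TEMPLATE.substitute(merged)
```

Calling render_config(DEFAULTS, {"port": "9090"}) changes only the port and leaves every other default alone.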

Ooh there are custom callbacks (export JAVA_HOME, anyone?), there are manifests that basically tar everything up.

Not strong support for "older OSes", I bet this doesn't work on Lucid. Gossip protocol for connecting multiple pieces of a greater system which allows for dynamic runtime auto-configuration.

How does it work with containers? Yes, well from the 'studio' it treats Docker as a first class citizen and will make very thin images as a result.

Achieving continuous deployment on Kubernetes

Dan Bode @ Intel

Building an internal CD thing for Intel peeps. Wanted these things:

  • Push-button deploys
  • Auto-validation
  • On-merge staging/prod updates

KUBERNIZE (oh god) all the applications! This looks much like PaaSTA plus HashiCorp Vault.

I didn't feel any of this was terribly different than some of the PaaSTA talks I've seen. Obviously about a different system though.

Infrastructure as Code with Terraform

Seth Vargo @ HashiCorp

He's selling us on Terraform. This is useful given some of the folks here come from very enterprisey backgrounds, but is much less useful for me/us.

Unikernels: A New Frontier

John Feminella @ Pivotal

Unikernels! I didn't take many notes, but it was an interesting talk on a shiny topic.

  • Security: mixed feels, given mixed attack vectors
  • Introspection: no strace or similar tools, difficult to do
  • New paradigm: like, new instance per request? Since unikernels are omg so fast. But is that even useful?

Run it in production? He says DO NOT! Probably good advice given the immaturity in the space. Worth keeping an eye on though.

State of Infrastructure as Code

Chris Munns @ Amazon

Chris is a good speaker, and he's fun to talk to off stage. He's an ex engineer who now manages biz dev stuff for AWS' developer-focused products (CodeDeploy et al.)

Why IaC: Briefly: scale, ephemeral resources, fast app iteration, cattle not pets.

Auto-promotion between dev, stage, prod. Where is your red line? He acknowledges the idea that dev and ops end up handing over final control at some point in the stack, but he makes it clear that depending on that line to be in the same place at every firm doesn't work. So know where your line is, and respect it!
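
One way to picture the red line, as a sketch under my own assumptions (the stage names come from the talk; the function and its parameters are hypothetical): builds promote automatically until they hit the first stage that needs a human sign-off.

```python
from typing import Callable, Optional

STAGES = ["dev", "stage", "prod"]


def promote(
    build: str,
    red_line: str,
    checks_pass: Callable[[str, str], bool],
    approved: bool = False,
) -> Optional[str]:
    """Return the furthest stage the build reaches, or None if it reaches none.

    red_line is the first stage that requires explicit human approval.
    """
    reached = None
    for stage in STAGES:
        if stage == red_line and not approved:
            break  # stop at the red line until someone signs off
        if not checks_pass(build, stage):
            break  # automated validation failed, stop promoting
        reached = stage
    return reached
```

Where red_line sits is the per-firm decision; the code is the same either way.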

New AWS landing page: aws.amazon.com/devops

Everybody has a plan until... — Automation Evolution While Scaling

Pete Cheslock @ Threat Stack

Pete is super funny - watch his talk when it comes out on YouTube!

Talking about early days at Threat Stack, super greenfield development. Had to move really fast toward launch because they were getting tons of free PR from a thing with re:Invent and Werner Vogels.

"Product launch: Similar to building your own car. Did you remember to add brake fluid?!"

Premature optimization is the root of all evil. Stop stealing my slides, Pete!!! LITERALLY EVERYONE USES THE this is fine dog. Including me. And five other people here. His son was born two days before their big product launch, and so paternity leave started. OH BOY!

Relevant to Tobias Splünke: You don't know what you have to optimize for until you launch it! They had this problem with Cassandra write latency, running a 36 node 8xlarge cluster within a month of launch. Spikes with each new sale, event, feature launch, etc. They went back right after launch and cleaned up after themselves, which was really beneficial years down the road.

SparkleFormation: Ruby-based DSL for AWS CloudFormation. Works well for them because they are all AWS, and when they started Terraform wasn't a thing.

Today, Pete and co. don't even alert on a node failure within an ASG - auto-replacement, etc. Last time he was on call, was paged twice; in the first six months after launch, the average was once every other night.

Their ingest is 10TB raw events per day, about 100,000 events per second; I'll leave comparison to other systems up to the reader ;)
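
For anyone doing that comparison, the back-of-the-envelope numbers, assuming decimal terabytes (10^12 bytes) and a steady rate across the day, neither of which the talk specified:

```python
RAW_BYTES_PER_DAY = 10 * 10**12   # 10 TB/day, assuming 1 TB = 10^12 bytes
EVENTS_PER_SECOND = 100_000       # figure quoted in the talk
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400

# Total events in a day at a constant rate: 8,640,000,000.
events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY

# Sustained ingest bandwidth: roughly 115.7 MB/s.
bytes_per_second = RAW_BYTES_PER_DAY / SECONDS_PER_DAY

# Average raw event size: roughly 1,157 bytes.
avg_event_bytes = RAW_BYTES_PER_DAY / events_per_day

print(f"{events_per_day:,} events/day")
print(f"{bytes_per_second / 1e6:.1f} MB/s")
print(f"{avg_event_bytes:.0f} bytes/event")
```

So the two figures are consistent with each other if the average event is a bit over 1 KB.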

Two types of tools: Ones I don't like, and ones I haven't used yet. Yes! Also, do not be afraid to change tooling if that's something that makes sense for your business and objectives.

Audience Q: What's up with your monitoring? Librato, they love it; no time for Graphite; PaperTrail for logging. Then they got off of Librato in favor of GRAPHITE!!!!!!!!!!!!!!! Big fan of SaaS services since they sell them: "Joe and his team are better at it than you are!"
