configuration management camp 2016 notes

mark shuttleworth, juju

q&a

mitchellh, vault

what is it?

  • secret management
    • “secret” vs “sensitive”
    • secrets: creds, certs, keys, passwords
    • sensitive: phone numbers, mother’s maiden name, emails, dc locations, PII
    • we think correctly about secrets
    • maybe not so much with sensitive data
  • my definition: “anything which makes the news”
    • this includes “secrets” and “sensitive” categories above
  • vault is designed for this case
  • certificates
    • specific type of secret
    • backed by RFCs
    • used almost universally
    • historically a pain to manage

the current state

  • unencrypted secrets in files
    • fs perms
  • encrypted secrets with the key on the same fs
    • eg chef databags
  • also, how does the secret get on to the system?
  • why don’t we use config management for secrets? because config management tools don’t have features we need:
    • no access control
    • no auditing
    • no revocation
    • no key rolling
  • why not (online) databases?
    • rdbms, consul, zookeeper, etc
      • (one of the big motivations to create vault was we didn’t want people using consul to store secrets)
    • not designed for secrets
    • limited access controls
    • typically plaintext storage
      • (even if the filesystem is encrypted, which has its own issues)
    • no auditing or revocation abilities
  • how to handle secret sprawl?
    • secret material is distributed
      • don’t want one big secret with all the access to everything
    • who has access?
    • when were they used?
    • what is the attack surface?
    • in the event of a compromise, what happened? what was leaked?
      • can we design a system that allows us to audit what was compromised?
  • how to handle certs?
    • openssl command line (ugh)
    • where do you store the keys?
    • if you have an internal CA, how do you manage that?
    • how do you manage CRLs?

the glorious future

  • vault goals:
    • single source for secrets, certificates
    • programmatic application access (automated)
    • operator access (manual)
    • practical security
    • modern data center friendly
      • (private or cloud, commodity hardware, highly available, etc)
  • vault features
    • secure secret storage
    • full cert mgmt
    • dynamic secrets
    • leasing, renewal
    • auditing
    • acls
  • secure secret storage
    • encrypt data in transit and at rest
    • at rest: 256-bit AES-GCM
    • TLS 1.2 for clients
    • no HSM required (though you can use one if you want)
  • inspired by unix filesystem
    • mount points, paths
$ vault write secret/foo bar=bacon
success!
$ vault read secret/foo
Key             Value
lease_id        ...
lease_duration  ...
lease_renewable false
bar             bacon
  • dynamic secrets
    • never provide “root” creds to clients
    • secrets made on request per client
$ vault mount postgresql
$ vault write postgresql/config/connection value=...
# written once, can never be read back

$ vault read postgresql/creds/production
-> get back a freshly-created user
-> if you don't come back to vault within an hour, vault will drop the user
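(sketch, not from the talk: the lease_id returned above can be renewed or revoked with the CLI; <lease_id> is a placeholder)
$ vault renew <lease_id>
$ vault revoke <lease_id>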
  • auditing
    • pluggable audit backends
    • request and response logging
    • prioritizes safety over availability
    • secrets hashed in audits (salted HMAC)
      • searchable but not reversible
  • rich ACLs
  • flexible auth
    • pluggable backends
    • machine-oriented vs operator-oriented
    • tokens, github, appid, user/pass, TLS cert
    • separate authentication from authorization
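(sketch, not from the talk: what a couple of those auth backends look like from the CLI; the policy name, policy file and GitHub org are made up)
$ vault policy-write deploy deploy-policy.hcl
$ vault auth-enable github
$ vault write auth/github/config organization=<your-org>
$ vault auth -method=github token=$GITHUB_TOKEN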
  • high availability
    • leader election
    • active/standby
    • automatic failover
    • (depending on backend: consul, etcd, zookeeper)
  • “unsealing the vault”
    • data in vault is encrypted
    • vault requires encryption key, doesn’t store it
    • must be provided to unseal the vault
    • must be entered on every vault restart
    • turtles problem!
      • this secret is sometimes on a piece of paper in a physical safe
    • alternative: shamir’s secret sharing
      • we split the key
      • a number of operators have a share of the key
      • N shares, T required to recompute master
      • default: N:5 T:3
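(sketch, not from the talk: the init/unseal flow with those defaults; vault init prints the 5 key shares and the vault stays sealed until 3 of them are entered)
$ vault init -key-shares=5 -key-threshold=3
-> prints 5 unseal key shares plus an initial root token
$ vault unseal
# repeat vault unseal with different key shares until the threshold (3) is reached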

vault in practice

  • ways to access vault
    • http + TLS API, JSON
    • CLI
    • consul-template (for software you won’t rewrite to talk directly to vault)
  • application integration
    • the best way to access vault
    • vault-aware
    • native client libraries
    • secrets only in memory
    • safest but high-touch
  • consul-template
    • secrets templatized into application configuration
    • vault is transparent
    • lease management is automatic
    • non-secret configuration still via consul
    • (then put your secrets onto a ramdisk and make sure the ramdisk can’t be swapped)
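(sketch, not from the talk: a minimal consul-template run that renders a vault secret onto a ramdisk and reloads the app; template, output path and reload command are made up)
$ cat app.conf.ctmpl
password = "{{ with secret "secret/foo" }}{{ .Data.bar }}{{ end }}"
$ consul-template -template "app.conf.ctmpl:/dev/shm/app.conf:service app reload"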
  • PII
    • it’s everywhere
    • “transit” backend for vault
    • encrypt/decrypt data in transit
    • avoid secret management in client apps
    • builds on vault foundation
    • web server has no encryption keys
    • requires two-factor compromise (vault + datastore)
    • decouples storage from encryption and access control
$ vault write -f transit/keys/foo
$ vault read transit/keys/foo
-> don't get the key back, just metadata about the key
to send:
$ echo msg | vault write transit/encrypt/foo plaintext=-

to recv:
$ vault write transit/decrypt/foo ciphertext=-
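(sketch, not from the talk: in practice the transit backend takes and returns base64-encoded plaintext, roughly like this)
$ vault write transit/encrypt/foo plaintext=$(echo -n "my message" | base64)
-> returns a ciphertext of the form vault:v1:...
$ vault write transit/decrypt/foo ciphertext=vault:v1:...
-> returns base64 plaintext; decode it with base64 -d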
  • rich libraries to do this automatically: vault-rails
  • disadvantage: everything round-trips through vault
    • can increase performance with another trick (…?)
  • CA
    • vault acts as internal CA
    • vault stores root CA keys
    • dynamic secrets - generates signed TLS keys
    • no more tears
  • mutual TLS
$ vault mount pki
$ vault write pki/root/generate/internal common_name=myvault.com ttl=87600h
...
...

Q&A

  • do you support letsencrypt certs?
    • just started planning this morning
  • if I use vault with consul, can I use that consul for something else?
    • in theory yes
    • vault uses a subpath within consul, and encrypts everything
    • we recommend you use an ACL to prevent access to other apps (it’s all encrypted garbage but if someone flips a bit you lose everything)
  • what if there’s a rogue operator who’s keylogged?
    • if you’re running it the right way it should be okay
    • if the operator doesn’t have root on the machine running vault it should be okay
    • if the operator has root then they can just coredump the vault process and get the key that way
    • a rogue root user is not in our threat model

ignites

centos config mgmt sig, julien pivotto

  • puppet chef ansible salt juju
  • deeply linked with the OS, from the start, until EOL

where do you get tools?

  • vendors, epel, make install, gems etc

YAG

  • regular/commodity users -> EPEL
  • (gap)
  • advanced users

vendor packages

  • where is the srpm?
  • where is the buildchain?

we depend on those tools

  • they have bugs

CentOS everything

  • public build system
  • everything needed to build software

SIG

  • special interest group
  • topic-focussed
  • release RPMs
  • can we make a CFGMGMT SIG?
  • objectives:
    • recent versions of cfg mgmt tools

a post-CM infrastructure delivery pipeline: @beddari

  • we were using CM but not winning
  • what we had built with love
    • automated tests
    • monitoring
  • but it was a total failure
    • unmanageable rebuild times
    • envs were starting to leak
  • our systems are “eventually repeatable”
    • darn it, test that change in prod
  • docker docker docker docker
  • solution: stop doing configuration management!
  • artifacts and pipelines
  • inputs are typically managed artifacts
  • change
  • feed input to packer, which runs a builder that applies the change, producing output (see the packer sketch after this list)
  • output: a versioned artifact (rpm)
    • repos, packages, images, containers
  • abstraction is key to doing changes
  • defns:
    • an input-change-output chain is a project
    • a project is versioned in git
    • all artifacts are testable
  • your new job is: describing state to produce artifacts and keeping that state from drifting
  • http://nubis-docs.readthedocs.org/en/latest/MANIFESTO/
  • change from stateful VMs to managing artifacts
    • this worked really well
    • packer with masterless puppet
    • terraform and ansible
    • masterless puppet to audit and correct drift
    • yum upgrade considered harmful
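(sketch, not from the talk: the input → builder → versioned-artifact step as a plain packer invocation; template name and variable are made up)
$ packer validate app-image.json
$ packer build -var "version=1.2.3" app-image.json
-> output: a versioned image/rpm artifact that the pipeline can test and promote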

SSHave your puppets! - Thomas Gelf

  • own the tools you run
  • replace a bunch of shell scripts
  • puppet infrastructure setup sucks
    • tried to scale PuppetDB or PE?
  • puppet kick (deprecated)
  • mcollective is moribund
  • SSH - why not?
    • centralized secrets
ssh <node> 'facter -j'
puppet master --compile <node> > catalog
scp catalog <node>:catalog
ssh <node> 'puppet apply --catalog catalog'
  • problem: file server
  • shipping files with ssh?
    • fix the catalog on the fly – change puppet:// urls
  • problem: pluginsync
    • custom facts
  • reports, exported resources
    • they just work
  • one node per catalog - why?
    • I want partial catalogs, different schedules, different user accounts
  • tasks on demand
  • testing
  • docker docker

stop sharing “war stories” @felis_rex

  • “war stories”
  • “in the trenches”
  • how is this so much of a thing?
  • pop culture depictions of war are very common
  • mainstream media depictions of war
  • yes, our jobs can be stressful
    • we can be under pressure
    • but it’s still not an actual war situation
  • this is about culture
    • we’re a corner of OSS community
    • it’s not about making great code
    • the way we talk to each other matters
    • we should be inclusive
  • alternatives to “war story”?
    • anecdote
    • story
    • experience report

a pythonic approach to continuous delivery

  • I have working python code, how do I start now?
  • a proper deployment artifact:
    • python package
      • (debian package? docker?)
  • it should be uniquely versioned
  • it should manage dependencies
  • https://github.com/blue-yonder/pyscaffold
  • CI:
    • run tests, build package
  • push to artifact repository (see the devpi sketch after this list)
  • http://doc.devpi.net
  • automated deploy: ansible
    • we use virtualenvs to isolate dependencies
  • pip doesn’t do true dependency resolution
  • maintain and refactor your deployment
  • pypa/pip issue #988 (pip needs a dependency resolver)
  • OS package managers v pip: the two worlds should unite
  • pip is still optimized for a manual workflow (eg no --yes option)
  • you can build your own CD pipeline!
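(sketch, not from the talk: the build-and-publish leg of such a pipeline with devpi; index URL, user and package name are made up)
$ python setup.py test bdist_wheel
$ devpi use https://devpi.example.com/myteam/staging
$ devpi login myteam
$ devpi upload
# on the target host (eg via ansible):
$ virtualenv /opt/myapp
$ /opt/myapp/bin/pip install myapp==1.2.3 -i https://devpi.example.com/myteam/staging/+simple/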

felt - samuel vandamme

  • front-end load testing
  • browser-based load testing tool with scenarios
  • history:
    • looked at Squish - selenium alternative
      • uses chrome
    • looked at phantomjs
      • headless so it’s good
      • missed some APIs
    • looked at SlimerJS
      • not headless but still performant
  • use cases for felt
    • quick load test
    • FE apps (eg angularJS)
    • simulate user

essential application management with tiny puppet @alvagante

  • cfgmgmt not just about files
  • but also files
  • puppet module: example42/tp
  • http://tiny-puppet.com/
  • tp::conf { 'nginx': … }
  • tp::conf { 'nginx::example.com.conf': … }
  • tp::dir { 'nginx::www.example42.com': … }
  • tp::test { 'redis': }
  • tp::install { 'redis': … }

moving to puppet 4 at spotify

history:

  • puppet 2.7
    • dark, hacky features (eg dynamic scoping)
  • puppet 3
    • functional insanity with some pretty cool new tools and toys
    • rspec-puppet, librarian-puppet etc came along
    • we upped our game
  • puppet 4
    • language spec!
    • type system!
    • lambdas
    • iterations
    • all the things
    • sanity
    • ponies!
  • as a module maintainer, it’s painful
    • maintaining compatibility with 3 and 4 is frustrating

how we did it

  • step 1: breathe
    • talk it through on your team
  • step 2: get to puppet 3.8 first
    • the last 3.x release
    • it starts throwing deprecation warnings at you
      • fix these
      • scoping, templates, etc
    • upgrade your modules
      • vox pupuli and puppetlabs modules Just Work™
  • step 3: enable the future parser (see the config sketch after this list)
    • (don’t do this on puppet < 3.7.4)
    • types of defaults will matter
      • eg where default is empty string but you can pass in an array
      • that won’t work any more
    • clustered node data
      • $facts hash
        • unshadowable, unmodifiable
        • unlike the old $::operatingsystem top-scope fact lookup style
  • step 4: upgrade to puppet 4
    • two options:
      • distro packages
      • move to puppetlabs’s omnibus packages
    • recommend using omnibus, but changes some things:
      • /var/lib/puppet moves to /opt/puppetlabs/puppet
  • step 5: caek
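(sketch, not from the talk: enabling the future parser in step 3 is a one-line puppet.conf change on 3.8; the path assumes distro packages)
$ cat >> /etc/puppet/puppet.conf <<'EOF'
[main]
parser = future
EOF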

the actual upgrade

results

  • we did it in:
    • 1-2 weeks of prep
    • 1 week of rollout
    • 2-3 days of cleanup
    • 0 production incidents
    • (over 10k nodes)
  • but.. we cheated
    • we migrated to the future parser over a year ago :)

vox pupuli

  • 60 modules & tooling
  • 50 contributors
    • basically everyone has commit access to everything
  • join the revolution!

Q&A

  • bob: how many things broke when you enabled the future parser?
    • weird scoping with templates

etcd, Jonathan Boulle @baronboulle

what is etcd?

  • name: /etc distributed
  • a clustered key-value store
    • GET and SET ops
  • a building block for higher order systems
    • primitives for distributed systems
      • distributed locks
      • distributed scheduling

history of etcd

  • 2013.8: alpha (v0.x)
  • 2015.2: stable (v2.0+)
    • stable replication engine (new Raft impl)
    • stable v2 API
  • 2016.? (v3.0+)
    • efficient, powerful API
      • some operations we wanted to support couldn’t be done in the existing API
    • highly scalable backend
    • (ed: what does this mean?)

etcd today

  • production ready

why did we build etcd?

  • coreos mission: “secure the internet”
  • updating servers = rebooting servers
  • move towards app container paradigm
  • need a:
    • shared config store (for service discovery)
    • distributed lock manager (to coordinate reboots)
  • existing solutions were inflexible
    • (zookeeper undocumented binary API – expected to use C bindings)
    • difficult to configure

why use etcd?

  • highly available
  • highly reliable
  • strong consistency
  • simple, fast http API

how does it work?

  • raft
    • using a replicated log to model a state machine
    • “In Search of an Understandable Consensus Algorithm” (Ongaro, 2014)
      • response to paxos
      • (zookeeper had its own consensus algorithm)
      • raft is meant to be easier to understand and test
  • three key concepts:
    • leaders
    • elections
    • terms
  • the cluster elects a leader for every term
  • all log appends (…)
  • implementation
    • written in go, statically linked
    • /bin/etcd
      • daemon
      • 2379 (client requests/HTTP + JSON api)
      • 2380 (p2p/HTTP + protobuf)
    • /bin/etcdctl
      • CLI
    • net/http, encoding/json

etcd cluster basics

  • eg: have 5 nodes
    • can lose 2
    • lose 3, lose quorum -> cluster unavailable
  • prefer odd-numbers for cluster sizes
  • the more nodes you have, the more failures you can tolerate
    • but throughput drops, because every operation needs to hit a majority of nodes

simple HTTP API (v2)

  • GET /v2/keys/foo
  • GET /v2/keys/foo?wait=true
    • poll for changes, receive notifications
  • PUT /v2/keys/foo -d value=bar
  • DELETE /v2/keys/foo
  • PUT /v2/keys/foo?prevValue=bar -d value=ok
    • atomic compare-and-swap
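(sketch, not from the talk: the same calls as curl one-liners against a local member)
$ curl -XPUT http://127.0.0.1:2379/v2/keys/foo -d value=bar
$ curl http://127.0.0.1:2379/v2/keys/foo
$ curl "http://127.0.0.1:2379/v2/keys/foo?wait=true"
$ curl -XPUT "http://127.0.0.1:2379/v2/keys/foo?prevValue=bar" -d value=ok
$ curl -XDELETE http://127.0.0.1:2379/v2/keys/foo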

etcd applications

  • locksmith
    • cluster wide reboot lock - “semaphore for reboots”
    • CoreOS updates happen automatically
      • prevent all machines restarting at once
    • set key: Sem=1
      • take a ticket by CASing and decrementing the number
      • release by CASing and incrementing (see the curl sketch after this list)
  • flannel
    • virtual overlay network
      • provide a subnet to each host
      • handle all routing
    • uses etcd to store network configuration, allocated subnets, etc
  • skydns
    • service discovery and DNS server
    • backed by etcd for all configuration and records
  • vulcand
    • “programmatic, extendable proxy for microservices”
    • HTTP load balancer
    • config in etcd
    • (though actual proxied requests don’t touch etcd)
  • confd
    • simple config templating
    • for “dumb” applications
    • watch etcd for changes, render templates with new values, reload
    • (sounds like consul-template mentioned in the vault talk?)
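(sketch, not from the talk: the locksmith-style take-a-ticket flow expressed as raw CAS calls; key name and values are made up)
$ curl -XPUT http://127.0.0.1:2379/v2/keys/sem -d value=1
# take a ticket: succeeds only if nobody raced us to it
$ curl -XPUT "http://127.0.0.1:2379/v2/keys/sem?prevValue=1" -d value=0
# ... reboot ...
# release the ticket
$ curl -XPUT "http://127.0.0.1:2379/v2/keys/sem?prevValue=0" -d value=1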

scaling etcd

  • recent improvements (v2)
    • asynchronous snapshotting
      • append-only log-based system
        • grows indefinitely
      • snapshot, purge log
      • safest: stop-the-world while you do this
        • this is problematic because it blocks all writes
      • now: in-memory copy, write copy to disk
        • can continue serving while you purge the copy
    • raft pipelining
      • raft is based around a series of RPCs (eg AppendEntries)
      • etcd previously used synchronous RPCs
      • send next message only after receiving previous response
      • now: optimistically send series of messages without waiting for replies
      • (can these messages be reordered?)
  • future improvements (v3)
    • “scaling etcd to thousands of nodes”
    • efficient and powerful API
      • flat binary key space
      • multi-object transaction
        • extends CAS to allow conditions on multiple keys
      • native leasing API
      • native locking API
      • gRPC (HTTP2 + protobuf)
        • multiple streams sharing a single tcp connection
        • compacted encoding format
    • disk-backed storage
      • historically: everything had to fit in memory
      • keep cold historical data on disk
      • keep hot data in memory
      • support “entire history” watches
      • user-facing compaction API
    • incremental snapshots
      • only save the delta instead of the full data set
      • less I/O and CPU cost per snapshot
      • no bursty resource usage, more stable performance
    • upstream recipes for common usage patterns
      • leases: attaching ownership to keys
      • leader election
      • locking resources
      • client library to support these higher level use cases

War Games: flight training for devops, Jorge Salamero Sanz @bencerillo

  • server density: monitoring
  • the cost of uptime
  • expect downtime
    • prepare
    • respond
    • postmortem
  • incident example:
    • power failure to half our servers
      • primary generator failed
      • backup generator failed
      • UPS failed
    • automated failover unavailable
      • (known failure condition)
    • manual DNS switch required
    • expected impact: 20 minutes
    • actual impact: 43 minutes
  • human factor
    • unfamiliarity with process
    • pressure of time sensitive event (panic)
    • escalation introduces delays
  • documented procedures
    • checklists! ✓
    • not to follow blindly – knowledge and experience still valuable
    • independent system
    • searchable
    • list of known issues and documented workarounds/fixes
  • checklists – why?
    • humans have limitations
      • memory and attention
      • complexity
      • stress and fatigue
      • ego
    • pilots, doctors, divers:
      • Bruce Willis Ruins All Films
    • checklists help humans
      • increase confidence
      • reduce panic
  • realistic scenarios for your game day
    • replica environment
    • or mock command line
    • record actions and timing
    • multiple failures
    • unexpected results
  • simulation goals
    • team and individual test of response
    • run real commands
    • training the people
    • training the procedures
    • training the tools
  • postmortem
    • failure sucks
      • but it happens, and we should recognize this
    • fearless, blameless
    • significant learning
    • restores confidence
    • increases credibility
    • timing
      • short regular updates
      • even “we’re still looking in to it”
      • ~1 week to publish full version
        • follow-up incidents
        • check with 3rd party providers
        • timeline for required changes
    • content
      • root cause
      • turn of events which led to failure
      • steps to identify & isolate the cause

Empowering developers to deploy their own data stores with Terraform and puppet, @bobtfish

  • http://www.slideshare.net/bobtfish/empowering-developers-to-deploy-their-own-data-stores
  • https://github.com/bobtfish/AWSnycast
  • puppet data in modules
    • this is amazing. it changed our lives
    • apply regex to hostname search1-reviews-uswest1aprod to parse out cluster name
    • elasticsearch_cluster { 'reviews': }
    • developers can create a new cluster by writing a yaml file
    • pull the data out of the puppet hierarchy
    • reuse the same YAML for service discovery and provisioning
  • puppet ENC - external node classifier
    • a script called by puppetmaster to generate node definition
    • our ENC looks at AWS tags
      • cluster name, role, etc
    • puppet::role::elasticsearch_cluster => cluster_name = reviews
    • stop needing individual hostnames!
      • host naming schemes are evil!
        • silly naming schemes (themed on planets)
        • “sensible” naming schemes (based on descriptive role)
          • do you identify mysql master in hostname?
          • what happens when you failover?
    • customize your monitoring system to actually tell you what’s wrong
      • “the master db has crashed” v “a db has crashed”
  • terraform has most of the pieces
    • it’s awesome
      • as long as you don’t use it like puppet
      • roles/profiles => sadness
    • treat it as a low level abstraction
    • keep things in composable units
    • add enough workflow to not run with scissors
    • don’t put logic in your terraform code
    • it’s a sharp tool
      • can easily trash everything
    • it’s the most generic abstraction possible
    • map JSON (HCL) DSL => CRUD APIs
      • it will do anything
      • as a joke I wrote a softlayer terraform provider which used twilio to phone a man and request a server to be provisioned
    • cannot do implicit mapping
      • but puppet/ansible/whatever can?
      • “Name” tag => namevar
      • Only works in some cases - not everything has tags!
    • implicit mapping is evil
      • eg: puppet AWS
      • in March 2014, I wanted to automate EC2 provisioning
        • I could write a type and provider in puppet to generate VPCs
        • @garethr stole it and it’s now puppet AWS
      • BUG - prefetch method eats exceptions (fixed now)
        • you ask AWS for all VPCs up front (in prefetch)
        • if you throw an exception while prefetching, it was silently swallowed
        • so it looks like there are no VPCs
        • now you generate a whole bunch of duplicates
        • workaround: an exception class with an overridden to_s method which would kill -9 itself
          • works, but not pretty
        • I wouldn’t recommend puppet-aws unless you’re on puppet 4 which fixed this bug
  • terraform modules
    • reusable abstraction (in theory)
    • sharp edges abound if you have deep modules or complicated modules
      • these are bugs and will be fixed
      • you can’t treat terraform like puppet
    • use modules, but don’t nest modules
      • use version tags
      • use other git repos
        • split modules into git repos
  • state
    • why even is state?
    • how do you cope with state?
      • use hashicorp/atlas
        • it will run terraform for you
        • it solves these problems
      • we.. reinvented atlas
        • workflow (locking!) is your problem
        • if two people run terraform concurrently, you’ll have a bad time
        • state will diverge
        • merging is not fun
    • split state up by team
      • search team owns search statefile
    • S3 store
      • many read, few write
    • wrap it yourself (make, jenkins, etc)
      • don’t install terraform in $PATH
        • you don’t want people running terraform willy-nilly
  • jenkins to own the workflow
    • force people to generate a plan and okay it
      • people aren’t evil, but they will take shortcuts
      • if they can just run terraform apply without planning first, they will do so
      • protecting me from myself
    • “awsadmin” machine + IAM Role as slave
    • Makefile based workflow
    • jenkins job builder to template things
      • you shouldn’t have shell scripts typed in to the jenkins text boxes
    • split up the steps (see the terraform sketch after this list)
      • refresh state, and upload the refreshed state to S3
      • plan + save as an artefact
      • filter plan!
        • things in AWS that terraform doesn’t know about
        • lambda functions which tag instances based on who created them
        • terraform doesn’t know these tags, so will remove them
        • we filter this stuff out
      • approve plan
      • apply plan, save state
  • nirvana
    • self service cluster provisioning
      • devs define their own clusters
      • 1 click from ops to approve
    • owning team gets accounted
      • aws metadata added as needed
      • all metadata validated
    • clusters built around best practices
      • and when we update best practices, clusters get updated to match
    • can abstract further in future
    • opportunities to do clever things around accounting
      • dev requested m4.xlarges, but we have m4.2xlarges as reserved instances
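(sketch, not from the talk: the refresh / plan-as-artefact / apply split from the jenkins jobs as plain terraform commands; bucket and file names are made up)
$ terraform refresh
$ terraform plan -out=search-team.tfplan
# the plan file is archived as a jenkins artefact, filtered and approved, then:
$ terraform apply search-team.tfplan
$ aws s3 cp terraform.tfstate s3://my-tfstate-bucket/search/terraform.tfstate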

inspec: skynet testing, @arlimus

  • slides: http://ow.ly/XPkvT
  • InSpec: Infrastructure Specification
    • v similar to server-spec
    • started on top of the server-spec project
  • code breaks
    • normal accident theory
  • why?
    • reduce number of defects
    • security and compliance testing
  • test any target
    • bare metal / VMs / containers
    • linux / windows / other / legacy
  • tiny howto
    • install from rubygems, or clone git repo
    • (see slides)
    • test local node
    • test remote via ssh
      • (no ruby / agent on node)
    • test remote via winRM (still no agent)
    • test docker container
  • example test
describe package('wget') do
  it { should be_installed }
end

describe file('/fetch-all.sh') do
  it { should be_file }
  its('owner') { should eq 'root' }
  its('mode') { should eq 0640 }
end
inspec exec dtest.rb -t docker://f02e
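(sketch, not from the slides: the ssh/winrm targets mentioned above look like this; hosts and credentials are made up)
inspec exec dtest.rb -t ssh://root@web01 --key-files ~/.ssh/id_rsa
inspec exec dtest.rb -t winrm://Administrator@win01 --password 's3cret'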
  • run via test-kitchen
    • kitchen-inspec
  • demo: solaris box running within test kitchen (on vagrant)
    • test-kitchen verify normally takes a long time because it installs a bunch of stuff on the box
    • much faster to verify with inspec
    • solaris test:
describe os do
  it { should be_solaris }
end

describe package('network/ssh') do
  it { should be_installed }
end

describe service('/network/ssh') do
  it { should be_running }
end

being expressive

describe file('/etc/ssh/sshd_config') do
  its(:content) { should match /Protocol 2/ }
end

this regex is brittle. comment? prefix/suffix?

Better:

describe sshd_config do
  its('Protocol') { should cmp 2 }
end

custom resources help with this.

Configuration Management vs. Container Automation, @johscheuer and Arnold Bechtoldt

  • “containers suck too”
    • “docker security is a mess!”
      • physical separation?
    • “images on docker hub are insecure!”
      • just community contributions
        • lots of docker images contain bash with shellshock vulnerability
      • docker images are artefacts, treat them like vmdk/vhd/vdi/deb/rpm
      • build your own lightweight base images
      • use base images without lots of userland tools if possible (eg alpine linux)
    • dockerfile is “return of the bash”
      • over-engineered dockerfiles
      • replace large shell scripts with CM running outside the container
      • what we want is configuration management with a smaller footprint
      • avoid requiring ruby/python/etc inside the container just to get your CM tool running
    • scheduling/orchestration is a whole new area
  • http://rexify.org - a perl-based CM tool
  • “it doesn’t matter how many resources you have, if you don’t know how to use them, it will never be enough”
  • use cases
    • continuous integration & delivery

automating AIX with chef, Julian Dunn

  • AIX was first released in 1987 (?)
  • I first came in as an engineering manager, but I knew nothing about this platform
  • some quirks, some pains
  • Test Kitchen support – rough and unreleased
  • traditional management of AIX:
    • manually
    • SMIT - menu-driven config tool
    • transforming old-school shops has two routes:
      • migrate AIX to linux, then automate with chef
      • manage AIX with chef, then migrate to linux
        • second route is easier as it abstracts away the OS so there’s less to learn at each step
  • challenges
    • lack of familiarity with platform, hardware, setup
      • hypervisor-based but all in hardware
    • XLC - proprietary compiler
      • if you use XLC, output is guaranteed forward-compatible forever
      • binaries from 1989 still run on AIX today
    • can’t use GNU-isms
      • no bash, it’s korn shell
      • no less!
    • no real package manager
      • bff
    • you can use rpm
      • but no yum or anything
    • two init systems (init and SRC)
      • key features which are missing!
    • virtualization features
      • sometimes cool, sometimes not
  • platform quirks & features
    • all core chef resources work out of the box on AIX
    • special resources in core
      • bff_package
      • service - need to specify init or SRC, some actions don’t work
    • more specific AIX resources in aix library cookbook
      • manage inittab etc
  • chef’s installer is sh-compatible which is necessary for AIX
# WARNING: REQUIRES /bin/sh
#
# - must run on /bin/sh on solaris 9
# - must run on /bin/sh on AIX 6.x
#

rkt and Kubernetes: What’s new with Container Runtimes and Orchestration, Jonathan Boulle @baronboulle

  • appc pods ≅ kubernetes pods
  • rkt
    • simple cli tool
    • no (mandatory) daemon
      • a big daemon running as root doesn’t feel like the best default setup
    • no (mandatory) API
    • bash/systemd/kubelet -> rkt run -> application(s)
  • stage0 (rkt binary)
    • primary interface to rkt
    • discover, fetch, manage app images
    • set up pod filesystems
    • manage pod lifecycle
      • rkt run
      • rkt image list
  • stage1 (swappable execution engines)
    • default impl
      • systemd-nspawn+systemd
      • linux namespaces + cgroups
    • kvm impl
      • based on lkvm+systemd
      • hw virtualization for isolation
    • others?
  • rkt TPM measurement
    • used to “measure” system state
    • historically just use to verify bootloader/OS
    • CoreOS added support to GRUB
    • rkt can now record information about running pods in the TPM
    • tamper-proof audit log
  • rkt API service
    • optional gRPC-based API daemon
    • exposes information on pods and image
    • runs as unprivileged user
    • read-only
    • easier integration
  • recap: why rkt?
    • secure, standards, composable
  • rktnetes
    • using rkt as the kubelet’s container runtime
    • a pod-native runtime
    • first-class integration with systemd hosts
    • self-contained pods process model -> no SPOF
    • multiple-image compatibility (eg docker2aci)
    • transparently swappable container engines
  • possible topologies
    • kubelet -> systemd -> rkt pod
    • could remove systemd and run pod directly on kubelet (kubelet -> rkt pod)
  • using rkt to run kubernetes
    • kubernetes components are largely self-hosting, but not entirely
    • need a way to bootstrap kubelet on the host
    • on coreos, this means in a container (because that’s the only way to run things on coreos)..
    • ..but kubelet has some unique requirements
      • like mounting volumes on the host
    • rkt “fly” feature (new in rkt 0.15.0)
      • unlike rkt run, doesn’t run in pod; uncontained
      • has full access to host mount (and pid..) namespace
  • rkt networking
  • future
  • summary
    • use rkt
    • use kubernetes
    • get involved and help define future of app containers

Q&A

  • does use of KVM mean non-linux hosts can be run inside?
    • currently no
  • image format for a registry?
    • we don’t have a centralized registry
    • we want to get away from that model
  • can rkt run docker images
    • yes
    • the current kubernetes api only accepts docker images so it’s the only thing it can run
  • what do I have to actually do to use rkt?
    • there’s a using rkt with kubernetes guide