#Disposable Infrastructure
Note: Comments welcome, especially opposing views or pointers to faulty logic.
##Overview
This document discusses "disposable infrastructure."
##What is Disposable Infrastructure?
Disposable infrastructure consists of infrastructure components that are deleted from the configuration when they fail or degrade past a boundary condition. The state of the resource may not be retained, even when it would be necessary or helpful.
One observation is that adopting immutable infrastructure practices moves the boundary between the versioned and unversioned parts of the system lower in the stack.
Disposable infrastructure can be compared and contrasted with an application's disposability, one of the characteristics of the 12 factor app. Disposability for an application means it shuts down gracefully when stopped deliberately and tolerates unplanned, sudden death.
###Can All Infrastructure Components Be Immutable?
Which infrastructure components can, in principle, be immutable? Is there such a thing as stateless infrastructure?
As a practical matter, which infrastructure components can be immutable?
Which of these should be immutable? Are there overriding dollar or wall-clock costs, or operational considerations, associated with making some components immutable? Which components, by class, are at the margin?
###Scope
What about components that are controlled by others?
What about components at the boundary, where control belongs to others?
##Why is Disposable Infrastructure Important?
What does it enable? What does it mitigate?
##Who Creates, Modifies, Uses, Terminates It?
Disposable infrastructure is:
- created by operations, sometimes by development, in response to design requirements
- modified by operations, rarely (?) by development, in response to a work order
- used by operations on behalf of end users
- terminated by operations in response to policy
##How is it Created, Modified, Used, Terminated?
Disposable infrastructure methods include the following (a minimal sketch of the create/terminate cycle follows the list):
- for creation: scripts or more formal automation, such as configuration management tools, following design or policy guidelines and applying the current versioning
- for updating: scripts or more formal automation, such as configuration management tools, following design or policy guidelines and applying the new versioning. (Note: the end infrastructure products aren't updated; the tools that produce them are.)
- for use: components are typically used in QA and production systems. They can also be used in development systems.
- for termination: obsolete resources are terminated by policy or, if orphaned, manually when identified
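A minimal sketch of the create/terminate cycle, assuming an AWS-style platform driven through boto3; the AMI ID, instance type, and degradation threshold are hypothetical placeholders:

```python
# Minimal sketch of create/terminate automation for disposable instances.
# Assumes boto3 and AWS EC2; the AMI ID, instance type, and error-rate
# threshold below are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2")

CURRENT_AMI = "ami-0123456789abcdef0"   # hypothetical versioned image
DEGRADATION_THRESHOLD = 0.05            # hypothetical boundary condition

def create_instance():
    """Create a replacement instance from the current versioned image."""
    resp = ec2.run_instances(
        ImageId=CURRENT_AMI,
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
    )
    return resp["Instances"][0]["InstanceId"]

def dispose_if_degraded(instance_id, error_rate):
    """Terminate (never repair) an instance past the boundary condition."""
    if error_rate > DEGRADATION_THRESHOLD:
        ec2.terminate_instances(InstanceIds=[instance_id])
        return create_instance()   # replace, applying the current versioning
    return instance_id
```

Note that the degraded instance is never logged into or repaired; it is simply replaced from the current versioned image.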
##Issues
- How important is state?
- What do you know about infra that is outside your control? (See other remarks about scope.)
- Is it an unattainable myth?
- Generally, immutable infrastructure refers to server-type instances; what's the applicability, if any, to data stores, that is, components handling persistent data?
- Do distributed, "self-healing" storage constructs such as Riak or Cassandra mitigate this issue? Mike Fiedler has a demo of killing a MongoDB primary with no change to the application's business metrics (around 16:30 of his talk)
- Special case of queues: in a distributed environment, a queue service may deliver duplicate or reordered messages when network partitions/outages occur. In some cases, messages may be lost. Example: RabbitMQ. (See the idempotent-consumer sketch under "Is State Siloed?")
- Do patterns for development of versions, automation, and deployment need to be developed and settled on before production use?
- Are automated acceptance tests available to run on the results of the automation process, especially for new versions? (A minimal sketch follows this list.)
- Facilitates self service, but how to enforce standards?
- What is the garbage collection process? How do you get rid of technical debt baked into an image?
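On the acceptance-test question above, a minimal sketch using only Python's standard library; the host, ports, and health endpoint are hypothetical:

```python
# Minimal acceptance-test sketch to run against a freshly built instance
# before it enters service. The host, SSH port, and health endpoint are
# hypothetical placeholders.
import socket
import urllib.request

HOST = "10.0.0.12"           # hypothetical address of the new instance

def test_ssh_port_open():
    """The instance should accept TCP connections on its SSH port."""
    with socket.create_connection((HOST, 22), timeout=5):
        pass                 # connection succeeded

def test_health_endpoint():
    """The application health check should answer 200."""
    with urllib.request.urlopen(f"http://{HOST}:8080/health", timeout=5) as r:
        assert r.status == 200

if __name__ == "__main__":
    test_ssh_port_open()
    test_health_endpoint()
    print("acceptance checks passed")
```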
###Is State Siloed?
Florian Motlik has written:
The main criticism against immutable infrastructure – as stated in the Chef blog post – is that there is always state somewhere in the system and, therefore, the whole system isn’t immutable. That misses the point of immutable components. The main advantage when it comes to state in immutable infrastructure is that it is siloed. The boundaries between layers storing state and the layers that are ephemeral are clearly drawn and no leakage can possibly happen between those layers. There simply is no way to mix state into different components when you can’t expect them to be up and running the next minute.
Is this true for components that handle persistent storage? How about transactions in flight from ephemeral instances to persistent storage? Update: Many discussions carve out data store handling separately or recommend the use of distributed, resilient databases. See elsewhere in this document.
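One common mitigation for transactions in flight and for duplicate queue deliveries (see the RabbitMQ issue above) is to make writes idempotent, so a replayed message is harmless. A minimal sketch, with a hypothetical message shape and an in-memory stand-in for the durable store:

```python
# Minimal idempotent-consumer sketch: duplicate or replayed messages
# (e.g., after an ephemeral instance dies mid-transaction) are harmless
# because each message carries a unique ID and is applied at most once.
# The message shape and in-memory "store" are hypothetical placeholders.

processed_ids = set()        # in production this would live in the durable store
store = {}

def handle(message):
    """Apply a message exactly once, keyed on its unique ID."""
    if message["id"] in processed_ids:
        return               # duplicate delivery: drop it
    store[message["key"]] = message["value"]
    processed_ids.add(message["id"])

handle({"id": "m1", "key": "balance", "value": 100})
handle({"id": "m1", "key": "balance", "value": 100})   # replay is a no-op
assert store["balance"] == 100
```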
##Advantages
Advantages include:
- reduces dependence on instance stability
- reduces technical debt associated with undocumented changes
- may mitigate the "it worked on my laptop" issue
- may help with "DevOops" (Kief Morris)
- with a "narrower interface" better guarantees may be possible
##Disadvantages
Disadvantages include:
- unclear what happens to required or useful information in the state of stateful resources marked for disposal
- almost requires a modern source control/versioning system
- need a general automation policy, including tool selection, training, and transition
- what happens if a resource degrades, but not past the boundary level? Then another, then several/many, so the aggregate is not resource failure but unacceptable degradation (see the sketch after this list)
- easy to cause a problem with a wide dispersion quickly
- "introduces dependencies to third parties during deployment. If you install packages in the system and your app repository is slow or down this can fail the deployment." Florian Motlik
##Barriers to Adoption
Barriers to adoption include:
- how many and what versions to adopt?
- how to perform configuration?
- what co-existence of versioning can be tolerated?
- is there an undocumented security or compliance patch in a running version that won't be carried forward to the replacement version?
- goes against human nature since it's a lot easier to make a small, probably poorly documented change on a single server
- may be viewed as more risky by management since the transition makes it seem like "Version 1 of Everything"
##Contrary Views
- Eric Windisch, Things will Change - Usenix Keynote UCMS'14, slides 20 and following
- "It is naive to think we can simply throw away VMs or containers — we want to preserve their state for archival and analysis."
- "The biggest problem with blind adherence to immutable infrastructure & 12-factor… is ignorance of the importance of the implicit state of a system which should not be deemed disposable."
See also video in References.
##References
- Lessons from the Cloud Bunker, Subbu Allamaraju
- Cloudcast podcast 213 - What is Immutable Infrastructure?
- A Brief Look at Immutable Infrastructure and Why it is Such a Quest, Adron Hall
- The 12 Factor App - Disposability
- Disposable Components: Chad Fowler @ Rails Israel 2013 - video
- Chad Fowler
- Standish Chaos Report - PDF
- contrary view The Non-Existent Software Crisis: Debunking the Chaos Report
- (comment: redefines success using fewer criteria than usual, so seems suspect)
- software systems are hard to change
- tightly coupled systems are the default
- deployment is scary
- your own application deployment
- deploying new versions, releases of supporting software such as language, framework, or database versions
- if you don't upgrade supporting software because of fear, you can't use new functions
- sometimes afraid to change because you can't find where all the components are
- 8:00 logic is buried in abstractions or layers of abstractions
- 9:20 homeostasis: body regulation mechanism
- 11:20 proposed solution: mimic living organism by replacing small sized components like body cells
- Fred George on microservices
- services like body cells
- reduce coupling
- remove fear of deployment
- reduce entropy
- make code change easier
- facilitate "go fast"
- comments are a design smell
- unit tests are a design smell, if services are sufficiently small
- (comment: but what about refactoring?)
- systems are heterogeneous by default
- java + python + ruby
- code should be "this big," meaning bite sized
- can be rewritten in another language
- can be understood and modified while the author is on holiday
- nodes/instances are disposable
- instance failure can lead to higher load. How does the highest-level service hold up in a degraded state?
- Can a long running node be recreated?
- often not, since it has been patched
- possible practice: don't update once booted
- defeat login?
- provisioning must be trivial
- "Always Be Deploying"
- assume failure
- monitor everything
- favor measurement over testing
- between MTBF vs MTTR, prefer MTTR
- practice fixing, since failure is inevitable
- monitor business metrics
- example: new signups dipping; though nothing crashed, it's an important business metric showing a problem exists that needs to be fixed
- practice worst case scenarios in advance
- canary deployments
- slowly deploy new instances, monitor, possibly rollback
- service resolution using Zookeeper (?)
- pub/sub (?)
- favor synchronous action; async as fallback (?)
- bind by contract - call a web service and use it if it matches requirements
- what if route files read like strongly typed functional pattern matching, with destructuring, over federated, heterogeneous services (?)
- intelligently route to multiple backend services
- services own and encapsulate data - small databases
- Problems
- possibly too slow
- each dev needs an environment (?)
- no referential integrity, but you may not have it anyway
- no unified view of the system
- Trash Your Servers & Burn Your Code: Immutable Infrastructure & Disposable Components - video
- Chad Fowler
- 34:00 Pinterest uses AWS spot instances to run production
- Service resolution
- started with Zookeeper, now Consul
- possibly use a Pub/Sub concept?
- use synchronous, since it's simpler; asynch as a fallback
- 36:15 Service By Contract: call a method not by name, but by what it does
- 37:00 document interfaces: example, JSON Schema (see the sketch at the end of this document)
- 38:30 global asynch validation middleware
- every message is asynch validated, then pushed into monitoring
- get alerts for oddball constructs
- smaller databases: no foreign key constraints
- 42:00 it's ok to run more than one version of a service at a time
- you can only deploy one service at a time
- 43:00 XML & "must ignore"
- 44:45 small systems can lead to small requests, so things aren't necessarily slow
- easy to figure out where bottlenecks are
- Trash Your Servers and Burn Your Code: Immutable Infrastructure and Disposable Components - 2013
- Things Will Change - Eric Windisch UCMS'14
- InfoQ - Managing Change with Immutable Servers
- Food Fight video 2013 - Immutable Infrastructure
- Chad Fowler, started with Chef, went to Capistrano, now home brew using Ruby classes
- approx 10min to approx 2min to deploy
- 10-20 deploys/day
- approx 14:45: for databases, uses Redis and other tech; not easily disposable. For DBs, plans to use RDS, managed by others, to get around the state issue
- around 18:00 Ranjib Dey from Pagerduty talked about using Cassandra/Riak, though needs coordination
- 25:30 what if there's an outage?
- 35:00 Craig Tracey then at Hubspot, 1,700 instances
- manual coordination of master/slave
- uses Puppet, 15min to provision
- Why You Should Build an Immutable Infrastructure
- Immutable Infrastructure: Practical or Not?
- Immutable Infrastructure Hangout on Air
- around 4:05, what is it (immutable infra)?
- can everything be immutable?
- moving the boundary between versioned and unversioned to the lower end of the system (stack)
- 7:35 what do you know about any infra outside your scope?
- 10:00 Is it a myth?
- 11:00 immutable as a characteristic generally applied to server like objects, but what about data?
- 22:50 need pattern(s) for deployment
- 24:50 deals with the "it worked on my laptop" issue
- 25:40 partial solution for "DevOops"
- easy to screw up a lot, quickly
- 27:30 orders of magnitude faster to make a small change manually in a VM (presumably lightly documented)
- 28:00 humans generally choose the easier path. Discussion that containers may help with this.
- 32:45 Is immutable infra more risky? May make harder to sell to management.
- 36:00 Is this Version 1 of Everything?
- 36:30 need auto acceptance test of product of the process
- 37:30 self service + maintain standards
- serverspec http://serverspec.org/
- 42:15 with a narrower interface, can you provide better guarantees?
- Treating Your Infrastructure Like Garbage – Mike Fiedler
- ImmutableServer - Kief Morris
- Netflix aminator tool for creating EBS AMIs
- Immutable Servers With Packer and Puppet - James Carr
- Using Packer and Puppet
- Packer - tool for creating machine and container images for multiple platforms from a single source configuration
- Distributed Logging
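The "service by contract" and schema-validation notes from Chad Fowler's talk above can be sketched as follows, assuming the third-party jsonschema package; the schema and alert hook are hypothetical:

```python
# Minimal sketch of the message-validation idea from Fowler's talk notes
# above (documented interfaces via JSON Schema, with validation feeding
# monitoring rather than blocking traffic). Assumes the third-party
# jsonschema package; the schema and alert function are hypothetical.
from jsonschema import ValidationError, validate

SIGNUP_SCHEMA = {
    "type": "object",
    "properties": {
        "user_id": {"type": "string"},
        "plan": {"type": "string", "enum": ["free", "pro"]},
    },
    "required": ["user_id", "plan"],
}

def alert(msg, err):
    """Stand-in for pushing an oddball-message alert into monitoring."""
    print(f"oddball message: {msg!r} ({err.message})")

def validate_message(msg):
    """Validate a message against the contract; alert instead of failing."""
    try:
        validate(instance=msg, schema=SIGNUP_SCHEMA)
    except ValidationError as err:
        alert(msg, err)

validate_message({"user_id": "u1", "plan": "pro"})       # conforms
validate_message({"user_id": "u1", "plan": "deluxe"})    # triggers an alert
```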