#Disposable Infrastructure
Note: Comments welcome, especially opposing views or pointers to faulty logic.
##Overview
This document discusses "disposable infrastructure."
##What is Disposable Infrastructure?
Disposable infrastructure consists of infrastructure components that are deleted from the configuration when they fail or degrade past a boundary condition. The state of the resource may not be retained, even when it would be necessary or helpful.
One observation is that adopting immutable infrastructure practices moves the boundary between the versioned and unversioned parts of the system lower in the stack.
Disposable infrastructure can be compared and contrasted with an application's disposability, one of the characteristics of the 12 factor app. Disposability for an application means it shuts down gracefully when stopped deliberately and tolerates unplanned, sudden death.
###Can All Infrastructure Components Be Immutable?
Which infrastructure components can, in principle, be immutable? Is there such a thing as stateless infrastructure?
As a practical matter, which infrastructure components can be immutable?
Which of these should be immutable? Are there overriding dollar or wall-clock costs, or operational considerations, associated with making some components immutable? Which components, by class, are at the margin?
###Scope
What about components that are controlled by others?
What about components at the boundary, where control belongs to others?
##Why is Disposable Infrastructure Important?
What does it enable? What does it mitigate?
##Who Creates, Modifies, Uses, Terminates It?
Disposable infrastructure is:
- created by operations, sometimes by development, in response to design requirements
- modified by operations, rarely (?) by development, in response to a work order
- used by operations on behalf of end users
- terminated by operations in response to policy
##How is it Created, Modified, Used, Terminated?
Disposable infrastructure methods include the following (a minimal sketch of the create/terminate cycle follows the list):
- for creation: scripts or more formal automation, such as configuration management tools, following design or policy guidelines and applying the current versioning
- for updating: scripts or more formal automation, such as configuration management tools, following design or policy guidelines and applying the new versioning. (Note: the end infrastructure products aren't updated; the tools that produce them are.)
- for use: components are typically used in QA and production systems. They can also be used in development systems.
- for termination: obsolete resources are terminated by policy or, if orphaned, manually when identified
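A minimal sketch of the create/terminate cycle, assuming an AWS-style platform driven through boto3; the AMI ID, instance type, and degradation threshold are hypothetical placeholders:

```python
# Minimal sketch of create/terminate automation for disposable instances.
# Assumes boto3 and AWS EC2; the AMI ID, instance type, and error-rate
# threshold below are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2")

CURRENT_AMI = "ami-0123456789abcdef0"   # hypothetical versioned image
DEGRADATION_THRESHOLD = 0.05            # hypothetical boundary condition

def create_instance():
    """Create a replacement instance from the current versioned image."""
    resp = ec2.run_instances(
        ImageId=CURRENT_AMI,
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
    )
    return resp["Instances"][0]["InstanceId"]

def dispose_if_degraded(instance_id, error_rate):
    """Terminate (never repair) an instance past the boundary condition."""
    if error_rate > DEGRADATION_THRESHOLD:
        ec2.terminate_instances(InstanceIds=[instance_id])
        return create_instance()   # replace, applying the current versioning
    return instance_id
```

Note that the degraded instance is never logged into or repaired; it is simply replaced from the current versioned image.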
##Issues
- How important is state?
- What do you know about infra that is outside your control? (See other remarks about scope.)
- Is it an unattainable myth?
- Generally, immutable infrastructure refers to server-type instances; what's the applicability, if any, to data stores, that is, components handling persistent data?
- Do distributed, "self-healing" storage constructs such as Riak or Cassandra mitigate this issue? Mike Fiedler has a demo of killing a MongoDB primary with no change to the application's business metrics (around 16:30 of his talk)
- Special case of queues: in a distributed environment, a queue service may deliver duplicate or reordered messages when network partitions/outages occur. In some cases, messages may be lost. Example: RabbitMQ. (See the idempotent-consumer sketch under "Is State Siloed?")
- Do patterns for development of versions, automation, and deployment need to be developed and settled on before production use?
- Are automated acceptance tests available to run on the results of the automation process, especially for new versions? (A minimal sketch follows this list.)
- Facilitates self service, but how to enforce standards?
- What is the garbage collection process? How do you get rid of technical debt baked into an image?
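On the acceptance-test question above, a minimal sketch using only Python's standard library; the host, ports, and health endpoint are hypothetical:

```python
# Minimal acceptance-test sketch to run against a freshly built instance
# before it enters service. The host, SSH port, and health endpoint are
# hypothetical placeholders.
import socket
import urllib.request

HOST = "10.0.0.12"           # hypothetical address of the new instance

def test_ssh_port_open():
    """The instance should accept TCP connections on its SSH port."""
    with socket.create_connection((HOST, 22), timeout=5):
        pass                 # connection succeeded

def test_health_endpoint():
    """The application health check should answer 200."""
    with urllib.request.urlopen(f"http://{HOST}:8080/health", timeout=5) as r:
        assert r.status == 200

if __name__ == "__main__":
    test_ssh_port_open()
    test_health_endpoint()
    print("acceptance checks passed")
```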
###Is State Siloed?
Florian Motlik has written:
The main criticism against immutable infrastructure – as stated in the Chef blog post – is that there is always state somewhere in the system and, therefore, the whole system isn’t immutable. That misses the point of immutable components. The main advantage when it comes to state in immutable infrastructure is that it is siloed. The boundaries between layers storing state and the layers that are ephemeral are clearly drawn and no leakage can possibly happen between those layers. There simply is no way to mix state into different components when you can’t expect them to be up and running the next minute.
Is this true for components that handle persistent storage? How about transactions in flight from ephemeral instances to persistent storage? Update: Many discussions carve out data store handling separately or recommend the use of distributed, resilient databases. See elsewhere in this document.
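One common mitigation for transactions in flight and for duplicate queue deliveries (see the RabbitMQ issue above) is to make writes idempotent, so a replayed message is harmless. A minimal sketch, with a hypothetical message shape and an in-memory stand-in for the durable store:

```python
# Minimal idempotent-consumer sketch: duplicate or replayed messages
# (e.g., after an ephemeral instance dies mid-transaction) are harmless
# because each message carries a unique ID and is applied at most once.
# The message shape and in-memory "store" are hypothetical placeholders.

processed_ids = set()        # in production this would live in the durable store
store = {}

def handle(message):
    """Apply a message exactly once, keyed on its unique ID."""
    if message["id"] in processed_ids:
        return               # duplicate delivery: drop it
    store[message["key"]] = message["value"]
    processed_ids.add(message["id"])

handle({"id": "m1", "key": "balance", "value": 100})
handle({"id": "m1", "key": "balance", "value": 100})   # replay is a no-op
assert store["balance"] == 100
```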
##Advantages
Advantages include:
- reduces dependence on instance stability
- reduces technical debt associated with undocumented changes
- may mitigate the "it worked on my laptop" issue
- may help with "DevOops" (Kief Morris)
- with a "narrower interface" better guarantees may be possible
##Disadvantages
Disadvantages include:
- unclear what happens to required or useful information in the state of stateful resources marked for disposal
- almost requires a modern source control/versioning system
- need a general automation policy, including tool selection, training, and transition
- what happens if a resource degrades, but not past the boundary level? Then another, then several/many, so the aggregate is not resource failure but unacceptable degradation (see the sketch after this list)
- easy to cause a problem with a wide dispersion quickly
- "introduces dependencies to third parties during deployment. If you install packages in the system and your app repository is slow or down this can fail the deployment." Florian Motlik
##Barriers to Adoption
Barriers to adoption include:
- how many and what versions to adopt?
- how to perform configuration?
- what co-existence of versioning can be tolerated?
- is there an undocumented security or compliance patch in a running version that won't be carried forward to the replacement version?
- goes against human nature since it's a lot easier to make a small, probably poorly documented change on a single server
- may be viewed as more risky by management since the transition makes it seem like "Version 1 of Everything"
##Contrary Views
- Eric Windisch, Things will Change - Usenix Keynote UCMS'14, slides 20 and following
- "It is naive to think we can simply throw away VMs or containers — we want to preserve their state for archival and analysis."
- "The biggest problem with blind adherence to immutable infrastructure & 12-factor… is ignorance of the importance of the implicit state of a system which should not be deemed disposable."
See also video in References.
##References
- Lessons from the Cloud Bunker, Subbu Allamaraju
- Cloudcast podcast 213 - What is Immutable Infrastructure?
- A Brief Look at Immutable Infrastructure and Why it is Such a Quest, Adron Hall
- The 12 Factor App - Disposability
- Disposable Components: Chad Fowler @ Rails Israel 2013 - video
- Chad Fowler
- Standish Chaos Report - PDF
- contrary view The Non-Existent Software Crisis: Debunking the Chaos Report
- (comment: redefines success using fewer criteria than usual, so seems suspect)
- software systems are hard to change
- tightly coupled systems are the default
- deployment is scary
- your own application deployment
- deploying new versions, releases of supporting software such as language, framework, or database versions
- if you don't upgrade supporting software because of fear, you can't use new functions
- sometimes afraid to change because you can't find where all the components are
- 8:00 logic is buried in abstractions or layers of abstractions
- 9:20 homeostasis: body regulation mechanism
- 11:20 proposed solution: mimic living organism by replacing small sized components like body cells
- Fred George on microservices
- services like body cells
- reduce coupling
- remove fear of deployment
- reduce entropy
- make code change easier
- facilitate "go fast"
- comments are a design smell
- unit tests are a design smell, if services are sufficiently small
- (comment: but what about refactoring?)
- systems are heterogeneous by default
- java + python + ruby
- code should be "this big," meaning bite sized
- can be rewritten in another language
- can be understood and modified while the author is on holiday
- nodes/instances are disposable
- instance failure can lead to higher load. How does the highest-level service hold up in a degraded state?
- Can a long running node be recreated?
- often not, since it has been patched
- possible practice: don't update once booted
- defeat login?
- provisioning must be trivial
- "Always Be Deploying"
- assume failure
- monitor everything
- favor measurement over testing
- between MTBF vs MTTR, prefer MTTR
- practice fixing, since failure is inevitable
- monitor business metrics
- example: new signups dipping; though nothing crashed, it's an important business metric showing a problem exists that needs to be fixed
- practice worst case scenarios in advance
- canary deployments
- slowly deploy new instances, monitor, possibly rollback
- service resolution using Zookeeper (?)
- pub/sub (?)
- favor synchronous action; async as fallback (?)
- bind by contract - call a web service and use it if it matches requirements
- what if route files read like strongly typed functional pattern matching, with destructuring, over federated, heterogeneous services (?)
- intelligently route to multiple backend services
- services own and encapsulate data - small databases
- Problems
- possibly too slow
- each dev needs an environment (?)
- no referential integrity, but you may not have it anyway
- no unified view of the system
- Trash Your Servers & Burn Your Code: Immutable Infrastructure & Disposable Components - video
- Chad Fowler
- 34:00 Pinterest uses AWS spot instances to run production
- Service resolution
- started with Zookeeper, now Consul
- possibly use a Pub/Sub concept?
- use synchronous, since it's simpler; asynch as a fallback
- 36:15 Service By Contract: call a method not by name, but by what it does
- 37:00 document interfaces: example, JSON Schema (see the sketch at the end of this document)
- 38:30 global asynch validation middleware
- every message is asynch validated, then pushed into monitoring
- get alerts for oddball constructs
- smaller databases: no foreign key constraints
- 42:00 it's ok to run more than one version of a service at a time
- you can only deploy one service at a time
- 43:00 XML & "must ignore"
- 44:45 small systems can lead to small requests, so things aren't necessarily slow
- easy to figure out where bottlenecks are
- Trash Your Servers and Burn Your Code: Immutable Infrastructure and Disposable Components - 2013
- Things Will Change - Eric Windisch UCMS'14
- InfoQ - Managing Change with Immutable Servers
- Food Fight video 2013 - Immutable Infrastructure
- Chad Fowler, started with Chef, went to Capistrano, now home brew using Ruby classes
- approx 10min to approx 2min to deploy
- 10-20 deploys/day
- approx 14:45: for databases, uses Redis and other tech; not easily disposable. For DBs, plans to use RDS, managed by others, to get around the state issue
- around 18:00 Ranjib Dey from Pagerduty talked about using Cassandra/Riak, though needs coordination
- 25:30 what if there's an outage?
- 35:00 Craig Tracey then at Hubspot, 1,700 instances
- manual coordination of master/slave
- uses Puppet, 15min to provision
- Why You Should Build an Immutable Infrastructure
- Immutable Infrastructure: Practical or Not?
- Immutable Infrastructure Hangout on Air
- around 4:05, what is it (immutable infra)?
- can everything be immutable?
- moving the boundary between versioned and unversioned to the lower end of the system (stack)
- 7:35 what do you know about any infra outside your scope?
- 10:00 Is it a myth?
- 11:00 immutable as a characteristic generally applied to server like objects, but what about data?
- 22:50 need pattern(s) for deployment
- 24:50 deals with the "it worked on my laptop" issue
- 25:40 partial solution for "DevOops"
- easy to screw up a lot, quickly
- 27:30 orders of magnitude faster to make a small change manually in a VM (presumably lightly documented)
- 28:00 humans generally choose the easier path. Discussion that containers may help with this.
- 32:45 Is immutable infra more risky? May make harder to sell to management.
- 36:00 Is this Version 1 of Everything?
- 36:30 need auto acceptance test of product of the process
- 37:30 self service + maintain standards
- serverspec http://serverspec.org/
- 42:15 with a narrower interface, can you provide better guarantees?
- Treating Your Infrastructure Like Garbage – Mike Fiedler
- ImmutableServer - Kief Morris
- Netflix aminator tool for creating EBS AMIs
- Immutable Servers With Packer and Puppet - James Carr
- Using Packer and Puppet
- Packer - tool for creating machine and container images for multiple platforms from a single source configuration
- Distributed Logging
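The "service by contract" and schema-validation notes from Chad Fowler's talk above can be sketched as follows, assuming the third-party jsonschema package; the schema and alert hook are hypothetical:

```python
# Minimal sketch of the message-validation idea from Fowler's talk notes
# above (documented interfaces via JSON Schema, with validation feeding
# monitoring rather than blocking traffic). Assumes the third-party
# jsonschema package; the schema and alert function are hypothetical.
from jsonschema import ValidationError, validate

SIGNUP_SCHEMA = {
    "type": "object",
    "properties": {
        "user_id": {"type": "string"},
        "plan": {"type": "string", "enum": ["free", "pro"]},
    },
    "required": ["user_id", "plan"],
}

def alert(msg, err):
    """Stand-in for pushing an oddball-message alert into monitoring."""
    print(f"oddball message: {msg!r} ({err.message})")

def validate_message(msg):
    """Validate a message against the contract; alert instead of failing."""
    try:
        validate(instance=msg, schema=SIGNUP_SCHEMA)
    except ValidationError as err:
        alert(msg, err)

validate_message({"user_id": "u1", "plan": "pro"})       # conforms
validate_message({"user_id": "u1", "plan": "deluxe"})    # triggers an alert
```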