@mwatts15
Created November 21, 2017 03:08
ADDO '17 notes
Mark Watts' notes on All Day DevOps.
====================================
The talks I viewed and the notes I took reflect my personal interests and
issues which I thought would be worthwhile to my employer, but there were many
other useful talks.
One of the business-side advantages of DevOps is that it's supposed to *save
time and money* by reducing overhead due, principally, to dysfunctional
patterns of behavior within the organization. Many talks at ADDO describe tools
that are intended to solve common problems in software development and
deployment. In addition, many presenters describe processes of organizational
transformation which give insight into possible difficulties and ways to get
through or avoid them.
As I see it, continuous integration (CI) and continuous deployment (CD) are
*consequences* of an organization that has built up the habits of communication
between groups, which habits we label DevOps. Without communication that
happens early and often, we end up in situations where software,
infrastructure, or IT groups use 'DevOps tools' without realizing the full
potential of those tools to add value and deliver more rapidly. There are notes
for several tools- and techniques-focused presentations below, but what matters
isn't the tools themselves but the models of product development,
deployment, and maintenance that using them enables.
---
Thanks to the ADDO organizers, presenters, and sponsors!
NOTE: ADDO has clipped the talks so you don't have to scrub through looking for
a talk like I had to, but the links for the originally recorded blocks are used
below since I didn't have the clips at the time. A patch with the updated links
is welcome, but I likely will not make the effort since you can find each talk
by its title or presenter name pretty easily.
Keynote, Jaya Baloo - 3:00 AM, EST
Derek Weeks, Mark Miller, Co-Founders, All Day DevOps
Jaya Baloo, Chief Information Security Officer, KPN Telecom
- This one was mostly about quantum computing
- The DevSecOps tie-in came mostly at the end: anticipating upcoming changes
in computing, putting defensive security infrastructure in place, and
thinking about the secrecy of ciphertexts that have already been collected.
That's a cross-functional concern, so it includes Dev, Ops, and Sec
https://www.youtube.com/watch?v=MnevoY_ACD4
https://www.youtube.com/watch?v=-JosVWcYUsI
DevSecOps and the DevOps Superpattern - Helen Beal
- Beal @ Ranger4 - a DevOps consulting company
- History of DevOps
- Patrick Debois (first DevOpsDays, Ghent)
- Paul Hammond & John Allspaw @ Flickr
- Getting Ops / Management more Agile
- Newer definition: value stream development
- CAMS, an acronym describing what DevOps is: Culture, Automation,
Measurement, Sharing
- "Is DevSecOps a Good Thing?" talk (answer: "yes")
- slide: "Is security an afterthought?" (answer: "also yes")
- Parts Unlimited Team - simulated software development org
Based on "The Phoenix Project" book by Gene Kim, Kevin Behr, and George
Spafford
(https://www.amazon.com/Phoenix-Project-DevOps-Helping-Business/dp/0988262509?SubscriptionId=AKIAILSHYYTFIVPWUY6Q&tag=duckduckgo-ffsb-20&linkCode=xm2&camp=2025&creative=165953&creativeASIN=0988262509)
- Presents some anti-patterns in communications with security team
ME: seems pretty heavy on things that sec needs to do to 'catch up' vs
things other groups could do to work better with sec (e.g., designing
pipelines in such a way that sec is empowered to not just perform scans,
but also make and deploy patches)
ME: Worth pointing out that the slide uses the same phrasing as the Agile
Manifesto's values statement (e.g., "Working software over comprehensive
documentation").
- Everyone has responsibility for security
- The DevOps "Super-pattern" (slide 12): https://youtu.be/MnevoY_ACD4?t=701
- DevOps draws on these concepts:
- Holacracy (portmanteau of "holistic" and "democracy")
- ITSM
- Agile
- Lean
- Three-legged stool (theory of constraints)
- Learning organization
- Safety culture
- Harmonious Polygamist Marriage
- Table looking at some of the concepts above through CAMS "lenses"
- Agile: daily collaboration, *including with sec*
- Holacracy - A flat organizational structure
- Don't shoot, but reward the messenger
- Don't isolate sec
- Agile Service Management (ASM)
- ITIL (IT Infrastructure Library) processes and procedures
- Delivering IT through Agile methods
- Just enough governance to deliver value
- Sharing vocabularies and intelligence
- Lean
- Automation can help resolve the security skills gap
ME: This slide could be greatly condensed, at least to the points which
are highlighted as Beal moves through it. Splitting it out into CAMS
aspects doesn't help me understand how security can work with other
groups. A general description of each discipline would suffice as an
orientation, and coupling that with the non-obvious CAMS alignments
would be enough to lay out the 'superpattern'.
https://www.youtube.com/watch?v=2812-6gdbyA
Understand Immutable Infrastructure: What? Why? How? - Quentin Adam
- This one was hard to follow at times because of the presentation style,
but it seems that Adam's company looked at the problems that
typically cause failures in real architectures and determined that the
biggest ones come from human interaction with the system, and in
particular, reconfiguration at runtime.
- At Clever Cloud, apparently, an SSH login to a server is a 'red alert'
- Their methodology involves deploying new instances to change configuration
rather than changing the configuration on a running server. According to
Adam, this means that the state space for a server is reduced from N
variables (e.g., the space of versions of all pieces of software on a
system) to 2 (working / not working). I suppose that using this
methodology would also reduce the incidence of 'fat-fingering' a command so
that you accidentally take a whole cluster offline.
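A minimal sketch of the "replace, don't reconfigure" idea, assuming a fleet is just a list of instances and a health check says working / not working. `Fleet`, `Instance`, and the health check are hypothetical names for illustration, not Clever Cloud's tooling:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical sketch of immutable infrastructure; not Clever Cloud's actual tooling.
Config = Tuple[Tuple[str, str], ...]  # e.g. (("nginx", "1.12"), ("app", "2.3"))


@dataclass(frozen=True)
class Instance:
    """A server whose configuration is fixed at launch; assigning to a field raises an error."""
    instance_id: int
    config: Config


class Fleet:
    def __init__(self, health_check: Callable[[Instance], bool]):
        self._next_id = 0
        self._health_check = health_check  # hypothetical: True = working, False = not working
        self.instances: List[Instance] = []

    def _launch(self, config: Config) -> Instance:
        self._next_id += 1
        return Instance(self._next_id, config)

    def deploy(self, config: Config, count: int) -> bool:
        """Change configuration by launching fresh instances, never by editing running ones."""
        candidates = [self._launch(config) for _ in range(count)]
        # Each candidate is either working or not; there is no half-patched state to debug.
        if all(self._health_check(inst) for inst in candidates):
            self.instances = candidates  # swap in; the old instances are simply discarded
            return True
        return False  # keep serving from the old, known-good instances


fleet = Fleet(health_check=lambda inst: dict(inst.config).get("app") != "broken")
print(fleet.deploy((("nginx", "1.12"), ("app", "2.3")), count=2))    # True
print(fleet.deploy((("nginx", "1.12"), ("app", "broken")), count=2))  # False: old fleet untouched
```

A failed rollout leaves nothing to clean up on the running servers, which is where the reduced state space comes from.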
https://www.youtube.com/watch?v=6Rf5ChCSoBs
But We Can't Do That Here! - Liz Keogh
- I liked this talk a lot.
- Main things I took from it:
- Cynefin: a framework for thinking about domains of problem-solving /
decision making
- Chaotic (act, sense, respond)
- Complex (probe, sense, respond)
- Complicated (sense, analyze, respond)
- Obvious (sense, categorize, respond)
Knowing which domain you're in is key to not making expensive mistakes.
- Utility of value-stream mapping, even for teams whose primary
'customer' is within the organization
- Related a story about how an infrastructure team identified 50
steps to acquiring a server, found the points where the
process was disrupted, and then focused on those steps.
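The server-acquisition story is essentially a value-stream map. A minimal sketch of the arithmetic, with made-up steps and hours (the real 50 steps aren't in these notes): flow efficiency is work time divided by total lead time, and the longest waits point at where the process is disrupted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    name: str
    work_hours: float  # time someone is actively working on the request
    wait_hours: float  # time the request sits in a queue before this step starts


def flow_efficiency(steps: List[Step]) -> float:
    """Fraction of total lead time spent on actual work."""
    work = sum(s.work_hours for s in steps)
    lead_time = work + sum(s.wait_hours for s in steps)
    return work / lead_time if lead_time else 1.0


def worst_disruptions(steps: List[Step], top_n: int = 3) -> List[Step]:
    """Steps with the longest waits: the first candidates for improvement."""
    return sorted(steps, key=lambda s: s.wait_hours, reverse=True)[:top_n]


# Hypothetical numbers for illustration only; not from the talk.
server_request = [
    Step("submit ticket", 0.5, 0.0),
    Step("manager approval", 0.25, 48.0),
    Step("security review", 2.0, 120.0),
    Step("provision VM", 1.0, 24.0),
    Step("hand over credentials", 0.25, 8.0),
]

print(f"flow efficiency: {flow_efficiency(server_request):.1%}")  # flow efficiency: 2.0%
for step in worst_disruptions(server_request, top_n=2):
    print(f"{step.name}: waited {step.wait_hours} h")
```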
https://www.youtube.com/watch?v=IAcUalc5_d0
Increasing the Dependability of DevOps Processes - Ingo Weber
- Describes a framework for alerting on anomalous occurrences in a cloud
deployment environment
- Not gov't-focused, but Weber is part of an org that is "technically"
governmental
- Like Quentin Adam's presentation, indicates that most failures are the
result of human error, but, unlike Adam, actually cites a study to back that up.
- Framework involves creating models for various actions performed on the
system. Main example is a rolling deployment of a new release.
- Based on log events and polling of cloud APIs
- Overall approach is called POD or Process-Oriented Dependability
- Alerts based on unexpected state transitions indicated by cloud API
results and observed log messages (e.g., restarting some group of servers
and not all servers in the group have a 'shutdown' and a 'startup'
message before a 'complete' message for the restart command)
- Offline training for the models. ME: Why not online? In general, the
behavior observed in production can differ in terms of response
times and nominal log volumes. Although the log events described are
probably conserved between environments, offline-only training leaves out
a whole class of other events.
- Describes some timing-dependent alerts. ME: The example shows what looks
like 4 different modes, which should be broken up into distinct models,
but it's still possibly a good approach.
- Near the end, also discusses how corrective actions could be triggered on
alerts.
- ME: It doesn't seem like this approach would serve well for "unknown
unknowns", or for undesirable emergent behavior that still fits the
trained models. Not to mention the burden of creating log events for a
bunch of things before you even realize that they're predictive of
unexpected failures.
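A minimal sketch of the kind of check described for the rolling-restart example, assuming log events arrive as `(server, event)` pairs. It uses a hand-written state machine in place of POD's trained models, so it illustrates the idea rather than Weber's framework; the event names are hypothetical:

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical log event: (server_id, event). "complete" belongs to the whole
# restart command, so its server_id is None.
Event = Tuple[Optional[str], str]

# Expected per-server process for a restart: not_started -> down -> up.
TRANSITIONS = {("not_started", "shutdown"): "down", ("down", "startup"): "up"}


def check_restart(group: Set[str], events: Iterable[Event]) -> List[str]:
    """Return alerts for observed transitions that the restart model doesn't allow."""
    state: Dict[str, str] = {server: "not_started" for server in group}
    alerts: List[str] = []
    for server, event in events:
        if event == "complete":
            # The command claims success, so every server must have gone down and come back up.
            lagging = sorted(s for s, st in state.items() if st != "up")
            if lagging:
                alerts.append(f"'complete' before restart finished on: {', '.join(lagging)}")
            continue
        if server not in state:
            alerts.append(f"{event!r} from {server}, which is outside the restart group")
            continue
        nxt = TRANSITIONS.get((state[server], event))
        if nxt is None:
            alerts.append(f"unexpected {event!r} on {server} in state {state[server]!r}")
        else:
            state[server] = nxt
    return alerts


log = [("web1", "shutdown"), ("web1", "startup"), ("web2", "shutdown"), (None, "complete")]
for alert in check_restart({"web1", "web2"}, log):
    print("ALERT:", alert)  # ALERT: 'complete' before restart finished on: web2
```

Note that this also shows the limitation above: anything not encoded in `TRANSITIONS` is invisible to it.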
Building Technical and Organizational Confidence Through Automated Deployments - Mieke Deenen
- A very cool "lessons-learned" talk about getting an organization on board
with a DevOps cultural shift
- About the social security collection and benefits site for the
Netherlands, werk.nl
- Started relatively small with one success, then, with the confidence gained,
moved on to the customer service division
ME: Not explicitly mentioned, but customer service seems like an excellent
place to start from a value-stream perspective, considering this is a gov't org
- Automated deployments were used for flexibility rather than just speed
ME: This is something I've often thought about: not just "we can deploy
faster", but, now that we don't have to wait long for deployments, "what
freedom to experiment does a quick deployment give us?"
https://www.youtube.com/watch?v=Ulp91L2zXPE
How We Went From 40 Days to 3 Building Crystal Clear Test Cases While Improving Test Coverage! - Stephen Tyler
- A model-based testing workflow / toolchain talk
- Characterized as experimental
- Primary concern is reducing defects that 'escape' to prod
- A couple of anecdotes and a few case studies, but no comprehensive data
showing reduction in defect escape across a variety of programs
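The notes don't capture Tyler's toolchain, so this is only a generic illustration of model-based testing: describe the system as a state machine, then derive test cases from the model instead of writing them by hand. This sketch generates one test per transition (all-transitions coverage) for a made-up login model:

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical model of a login page: state -> {action: next_state}
Model = Dict[str, Dict[str, str]]

LOGIN_MODEL: Model = {
    "logged_out": {"enter_valid_credentials": "logged_in", "enter_bad_credentials": "error"},
    "error": {"retry": "logged_out"},
    "logged_in": {"log_out": "logged_out"},
}


def shortest_paths(model: Model, start: str) -> Dict[str, List[str]]:
    """BFS: for each reachable state, the shortest action sequence that reaches it."""
    paths: Dict[str, List[str]] = {start: []}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for action, nxt in model.get(state, {}).items():
            if nxt not in paths:
                paths[nxt] = paths[state] + [action]
                queue.append(nxt)
    return paths


def transition_covering_tests(model: Model, start: str) -> List[Tuple[List[str], str]]:
    """One test per transition: reach its source state, fire it, check the expected state."""
    paths = shortest_paths(model, start)
    tests = []
    for source, actions in model.items():
        if source not in paths:
            continue  # unreachable state: a modelling bug worth flagging separately
        for action, expected in actions.items():
            tests.append((paths[source] + [action], expected))
    return tests


for steps, expected_state in transition_covering_tests(LOGIN_MODEL, "logged_out"):
    print(" -> ".join(steps), "=> expect", expected_state)
```

When the system changes, you update the model and regenerate, rather than editing each test case.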
Lessons in Leading a Fortune 100 Team to a DevOps Philosophy - Uldis Karlovs-Karlovskis
- Presenter is "Nordic DevOps Lead" at Accenture (Accenture Latvia).
- About managerial / communication structures
- Mostly didn't seem very reproducible, but the suggested take-aways were:
1. Assume people are trying to do the right thing
2. Seek intrinsic rewards rather than extrinsic rewards (e.g., cash rewards)
3. Engagement is an employee responsibility
4. Let people lead (in their own way)
- ME: Not that interesting for a software engineer...
https://www.youtube.com/watch?v=SRXohzWQkp0
Escrow: How To Share Secrets - Kyle Rickman
- Under Armour: Connected Fitness
- Rickman is a software development infrastructure engineer for internal
teams
- Backstory for the talk is that Under Armour acquired 3 different companies
with different products and toolchains, and the developers from those
companies needed to share key-value data with each other.
- Wants to permit sharing and collaboration without revealing data between
the groups that shouldn't be shared and without requiring interaction
with the infrastructure engineer in order to share the data.
- Developed a tool called Escrow for that purpose
- Escrow has hierarchical key-value lists called "Chains", and each level in
the hierarchy is called a "Link"
- Emphasizes "API-first" design, i.e., having a web API that devs can use to
build integrations
- A group or a user owns a link and can make the link Private or Public
- Escrow addresses integrity by "Rendering" chains, which freezes the
values in the chain as of the time the rendering is made. The rendering
is called an "Artifact"
- ME: The Artifact concept doesn't fully address integrity, since a group
can, apparently, write arbitrary keys and values, potentially overwriting
higher-integrity values farther up in the chain. Perhaps the Chain
construction process is expected to identify such cases, but it seems
like there's a real limitation there.
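A rough sketch of the chain/render idea as described in the talk, including the overwrite concern from my note. `Link`, `render`, and the override report are hypothetical, not Escrow's actual API or data model:

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Dict, List, Mapping, Tuple

# Hypothetical model based only on the talk's description; not Escrow's actual API.


@dataclass
class Link:
    owner: str    # a group or user
    public: bool  # private links are visible only to their owner
    values: Dict[str, str] = field(default_factory=dict)


def render(chain: List[Link], reader: str) -> Tuple[Mapping[str, str], List[str]]:
    """Flatten a chain (root link first) into a read-only "Artifact".

    Deeper links override values from links above them; overrides made by a
    different owner are reported, since that's the integrity gap noted above.
    """
    merged: Dict[str, str] = {}
    origin: Dict[str, str] = {}  # key -> owner of the link that last set it
    overrides: List[str] = []
    for link in chain:
        if not link.public and link.owner != reader:
            continue
        for key, value in link.values.items():
            if key in merged and origin[key] != link.owner:
                overrides.append(f"{key}: {origin[key]} -> {link.owner}")
            merged[key] = value
            origin[key] = link.owner
    # merged is a fresh dict, so later edits to the links can't change this snapshot.
    return MappingProxyType(merged), overrides


chain = [
    Link("platform", True, {"db_host": "db.internal", "tls": "required"}),
    Link("team-a", False, {"api_key": "secret-a"}),
    Link("team-b", True, {"tls": "disabled"}),
]
artifact, overrides = render(chain, reader="team-b")
print(dict(artifact))  # {'db_host': 'db.internal', 'tls': 'disabled'}
print(overrides)       # ['tls: platform -> team-b']
```

Freezing the rendered values gives a stable snapshot, but as the output shows, it does nothing to stop `team-b` from quietly replacing the platform's `tls` setting before the snapshot is taken.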
A DevOps State of Mind: Continuous Security with DevSecOps + Containers - Chris Van Tuin
- Last name pronounced "Van-Tie"
- Red Hat strategist
- Disruptors: empowered organization, rapid innovation, data-driven
intelligence, and a culture of experimentation (enabled by IT automation)
- Suggests that containers + cloud enable easier dev/sec integration
- Describes a pipeline for building out to containers and moving to
production through promotion steps
- For security fixes, the model is to fix in the container image and redeploy
the container. That obviously doesn't address security issues in the
container runtime (e.g., Docker) itself or in the container orchestration
platform, but it's a more reliable approach than having humans patch servers
or even having a script run against live containers. It's actually
the same idea as Clever Cloud CEO Quentin Adam's 'immutable
infrastructure'.
- ME: The mention of a company-wide docker registry really appeals to me.
We currently have a docker registry just for our program that houses
public images as well as our program-specific images. There are security
issues that come up (as suggested in the talk) when vulns are discovered
which are still sitting in publicly accessible containers. We should be
able to make a registry within the company for, at least, public images.
- ME: Tesla manufacturing example...probably not the best example
considering their current 'production hell'
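A minimal sketch of the promotion idea, assuming an image is built once and the same artifact moves through each stage behind a gate. Stage names, gates, and tags are made up, and the digest is a stand-in rather than a real registry digest. A security fix means rebuilding on a patched base and promoting the new image, not patching running containers:

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Callable, Dict, List

# Hypothetical sketch: stage names, gates, and image tags are illustrative only.


@dataclass(frozen=True)
class Image:
    name: str
    base: str         # base image the app is built on
    app_version: str

    @property
    def digest(self) -> str:
        """Content-derived ID (a stand-in, not a real registry digest)."""
        raw = f"{self.name}|{self.base}|{self.app_version}".encode()
        return hashlib.sha256(raw).hexdigest()[:12]


Gate = Callable[[Image], bool]


def promote(image: Image, gates: Dict[str, Gate], deployed: Dict[str, str]) -> List[str]:
    """Move one image through the stages in order, stopping at the first failed gate."""
    reached: List[str] = []
    for stage, gate in gates.items():
        if not gate(image):
            break
        deployed[stage] = image.digest  # every stage runs the exact same artifact
        reached.append(stage)
    return reached


VULNERABLE_BASES = {"base:1.0"}  # stand-in for a scanner's vulnerability feed

gates: Dict[str, Gate] = {
    "dev": lambda img: True,
    "test": lambda img: img.base not in VULNERABLE_BASES,  # security scan gate
    "prod": lambda img: True,  # e.g. acceptance tests or sign-off
}

deployed: Dict[str, str] = {}
app = Image("web", "base:1.0", "2.3")
print(promote(app, gates, deployed))    # ['dev']: the scan gate blocks test and prod

# The fix: rebuild on a patched base and promote the new image through the same gates.
fixed = replace(app, base="base:1.1")
print(promote(fixed, gates, deployed))  # ['dev', 'test', 'prod']
```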
https://www.youtube.com/watch?v=ApVI7-g_wpk
https://www.youtube.com/watch?v=OaojdXYSkpI
Secrets of a High Performance Security Focussed Agile Team - Kim Carter
- Sensible security model
- Bruce Schneier's 5 steps
- Talks about his book's chapters, which is a confusing way to start...and
somewhat annoying.
- Talks about 'code monkey' vs 'professional dev'
- Describes ways to introduce security-centric activities into a sprint
ME: The presenter sounded ill. Coughs, sniffs and swallows were distracting.