zeroasterisk/VelocityNotes2014.md

## devopsweekly-182.md

      
    DEVOPS WEEKLY
ISSUE #182 - 29th June 2014
I’m writing this quickly from San Jose airport before flying back to the UK, which is why most people will be receiving this issue 8 hours or so later than usual. Lots of content from Velocity and Devopsdays Silicon Valley this week (and probably next when I get more time to find some of the excellent presentations). It’s been great catching up with lots of folks, but a big shoutout to the organisers who put on two great events.
Sponsor

Devops Weekly is sponsored by Brightbox Cloud - serious UK-based cloud infrastructure from only 1.5p per hour (£10.95/month)
Start your £20 free trial now: http://brightbox.com/devopsweekly
Velocity and Devopsdays

Definitely one of the highlights of Velocity for me, this talk aimed to cover everything you need to know to be good at operations. Ambitious, entertaining and hugely useful.
http://adamhjk.github.io/good-at-ops/#/
Probably a tie for my favourite presentation, this next deck covers what the presenter called minimum viable bureaucracy. Lots of personal stories mixed with some wider observations. Lots to learn about organisation design in here.
https://speakerdeck.com/lauraxt/minimum-viable-bureaucracy-june-2014-edition
Opsweekly is a tool from Etsy for on-call alert classification and reporting. The README could do with a screenshot but it’s a very interesting idea which brings together all the data from an on-call rota into one place for both personal tracking and bigger picture planning.
https://github.com/etsy/opsweekly
Another of the talks from Velocity I found interesting: what happens to the infrastructure when a large company buys a smaller one? In this case, what did Instagram migrate over to Facebook, and how?
http://instagram-engineering.tumblr.com/post/89992572022/migrating-aws-fb
One of the Velocity ignite talks providing a quickfire argument that you really really should be caring about security in your development and operations work.
https://speakerdeck.com/barnbarn/velocity-conference-santa-clara-2014-ignite
A topic close to my heart cropped up a few times at Devopsdays Silicon Valley, that of government. This post summarises one of the open spaces and makes a few suggestions for the US federal government.
http://www.mikemcgarr.com/blog/devops-in-the-federal-government.html
The traditional Devopsdays State of the Union was presented at both Amsterdam and at Silicon Valley this week. Riffing on the recent devops survey results, composability of systems and the move to software defined everything.
http://www.slideshare.net/botchagalupe/devopsdays-state-of-the-union-amsterdam-2014
News

This is a nice post from one of the organisers of Devopsdays Brisbane, explaining to people who haven’t come across the event what it is. Doing outreach to a local community like this is a great idea. Also, they have Sidney Dekker speaking!
http://mattcallanan.blogspot.com/2014/06/what-is-devops-days-brisbane-2014.html
Anomaly detection, and other applications of machine learning to monitoring, is a hot topic at the moment. This post is a good high level introduction, focusing on some of the tools you can try out right now.
http://blog.bigpanda.io/a-practical-guide-to-anomaly-detection/
A nice reminder that devops isn’t about killing off existing positions but about specialists working together. This post brings up the spectre of devops killing off the developer as we know it, and then debunks the idea.
http://cfengine.com/company/blog-detail/devops-killing/
Lots of large companies are getting interested in Devops and this next post should be useful to anyone working in such an enterprise. It collects common objections together along with a counter argument.
http://dev2ops.org/2014/06/adopting-devops-in-enterprise-operations/
Ever considered debugging database queries by dropping down to inspecting TCP packets? This next post makes this sound not so crazy with some great examples.
https://vividcortex.com/blog/2014/06/23/discovering-query-bugs-by-tcp-inspection/
Jobs

Having won a number of key customer accounts, Bashton are recruiting at both senior and junior levels to join our team of Linux operations experts. Based in the North West of the UK, we design, build and manage infrastructure primarily on Amazon Web Services, providing ultra reliable solutions to customers in a range of sectors. We can offer the ability to work on large-scale web facing infrastructure without the internal politics of working for a large organisation.
http://www.bashton.com/jobs/
Tools

Given our daily use of version control systems, they contain an awful lot of data beyond just the source code. This tool allows for exporting a git repository into the Solr search engine for data mining.
https://github.com/arafalov/git-to-solr
I’ve mentioned OSv previously as an interesting take on the operating system, but trying it out locally required a lot of effort. Enter capstan, which provides a very nice command line interface to launch OSv instances locally on your machine.
https://github.com/cloudius-systems/capstan
Cayley is an open source graph database. It supports multiple storage backends, an HTTP-based API as well as a REPL and a built-in query editor and visualiser.
https://github.com/google/cayley
If you received this email directly then you're already signed up, thanks! If however someone forwarded this email to you and you'd like to get it each week then you can subscribe at http://devopsweekly.com

  
## gems.md

      
    http://adamhjk.github.io/good-at-ops/
INCIDENT COMMAND

The First Responder is the default Incident Commander

Decides what to do next
Coordinates resources
Can hand off command
Communicates status
Not about rank

There is only ONE Incident Commander.
HOW TO RUN A POST MORTEM


Invoke the space: we are here to learn, not to blame
Describe the incident
Establish the timeline
Identify contributing factors
Describe customer impact
Describe remediation tasks for the root cause
Describe improvement tasks for response process

AVAILABILITY ROUNDUP


Understand your Availability Targets
Track and understand your M*'s
Reduce time to detect and repair
Use capacity planning to avoid obvious incidents
Have an incident response and command process
Perform and publish post-mortems for every incident
Prioritize the outcomes
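
To make "Understand your Availability Targets" concrete, here is a rough sketch that converts a target percentage into a downtime budget. The function name and the 30-day period are assumptions for illustration; they are not from the slides.

```python
def downtime_budget_minutes(availability_pct, period_days=30):
    """Minutes of downtime allowed in the period for a given availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Hypothetical targets, just to show how quickly the budget shrinks
for target in (99.0, 99.9, 99.99):
    budget = downtime_budget_minutes(target)
    print(f"{target}% over 30 days -> {budget:.1f} minutes of downtime")
```

At 99.9% over 30 days the budget works out to roughly 43 minutes, which is why time to detect and time to repair matter so much.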

People, Process, Technology

http://www.amazon.com/The-Asshole-Rule-Civilized-Workplace-ebook/dp/B000OT8GV2
ASSHOLES ARE INEFFICIENT


Positive interactions must outnumber negative ones 5:1
Bad interactions have stronger, more pervasive, and longer lasting effects

WHAT YOU CAN DO


Don't be an Asshole, and fire or shun those who are
Set clear expectations for others
Praise people
Make friends with, and care about your co-workers
Listen to each other
Take pride in your work

KAIZEN

SMALL IMPROVEMENTS

Evaluate a process, make it better.
Try using the scientific method:

Ask a question
Do research
Construct a hypothesis
Test your hypothesis
Analyze data and draw a conclusion
Communicate your results

EFFICIENCY ROUNDUP


Greatest gains are in improving People
Continually improve process, be willing to redesign in the face of new challenges
Use Scalable Systems Design to improve your technology and automation


## VelocityNotes2014.md

      
    From: Matt (https://github.com/mreishus)
Date: Mon, Jun 30, 2014
Subject: Velocity Conference CA 2014 Trip Report

Summary Evaluation of Velocity 2014:

The mobile share of internet traffic is on pace to eclipse desktop traffic within 2014. As a whole, developers are doing a poor job of optimizing for mobile and users are frustrated. Mobile sites are actually trending slower year over year, even with faster devices accounted for.
Even desktop performance affects business metrics (like conversion rate, bounce rate, page views, etc.). This can usually be measured without taking the time to optimize performance, because most sites are already serving a mixture of fast and slow experiences to users. Just correlate the metric against performance while controlling for some variables (like location).
From Puppet's 2014 State of DevOps Report: IT performance was quantified in a statistically valid way and was highly correlated with three independent metrics: MTTR (mean time to recover), lead time for changes, and deploy frequency. Companies with high-performing IT departments were significantly more likely to meet their profitability, market share and productivity goals.
Surprising information: According to many speakers, using the mean as a statistical measure of performance is worthless. It is much better to split the data into quantiles (such as quartiles or percentiles). Also, many disparaged auto-scaling, including one speaker who called it "the biggest lie in IT".
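
A quick sketch of why the mean misleads for performance data, using only Python's standard library (3.8+). The sample load times and the `summarize_latencies` helper are made up for illustration; they are not from any talk.

```python
import statistics


def summarize_latencies(samples_ms):
    """Return the mean alongside the 50th, 95th and 99th percentiles."""
    # With n=100, quantiles() returns the 99 cut points between percentiles
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.mean(samples_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }


# Hypothetical page load times: 95 fast requests and 5 very slow ones
samples = [100] * 95 + [5000] * 5
summary = summarize_latencies(samples)
print(summary)
```

Here the mean comes out at 345 ms, a load time that no user actually experienced, while the median (100 ms) and the 99th percentile (5000 ms) describe the two real groups of users.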

Knowledge gained at Velocity 2014:

Mobile debugging techniques
Real User Monitoring techniques
General page performance optimization techniques
Browser animation optimization techniques and how to prevent "layout thrashing"
Concept of "autonomous actors keeping promises" to make operations safer (promise theory)
Postmortem
Capacity planning
"Money Graph" - great metric to have, often a lagging indicator of other invisible problems
Etsy's method of continuous experimentation - rolling out features to an increasing % of users and tracking success
Google's techniques for reducing latency in a service oriented architecture
How to use math to detect anomalies in non-Gaussian data
How to include security tests in continuous integration pipelines
Tombstone technique to find dead or unused code.
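
For the "math to detect anomalies in non-Gaussian data" item above, one common approach is a modified z-score built on the median and the median absolute deviation (MAD), which a single spike barely moves. This is a minimal sketch of that general technique, not necessarily what the speaker presented; the 3.5 threshold is a widely used rule of thumb and the data is invented.

```python
import statistics


def mad_anomalies(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely move
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(0.6745 * (v - median) / mad) > threshold
    ]


# Hypothetical requests-per-minute series with one spike
series = [12, 14, 13, 15, 12, 13, 90, 14, 13, 12]
anomalies = mad_anomalies(series)
print(anomalies)
```

A mean and standard deviation check would be skewed by the spike itself; the median-based version flags only the value 90 at index 6.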

Information that may benefit my co-workers:

One of the best talks I saw was "How To Be Good At Operations in 40 minutes".  Slides are here: http://adamhjk.github.io/good-at-ops/#/ (Make sure to press down instead of right on slides 4, 5, 6, 7)
Also very interesting:  The 2014 State of DevOps Report:  http://puppetlabs.com/sites/default/files/2014-state-of-devops-report.pdf
Slides from all talks are available here: http://velocityconf.com/velocity2014/public/schedule/proceedings
Ask me about any of the above
There were several talks I did not attend; the videos will be released in 2-3 weeks.

People, Companies and Projects of Note:
Free Tools


DevOps Weekly email newsletter
github.com/secure-pipeline (security tests in CI pipeline)
Weinre - web based mobile debugger (no USB cable required)
Android/iOS native debuggers (require USB cable)
Fiddler - the proxy for web developers.  Also has a "Bandwidth Simulator" plugin.
WebPageTest - see webpage performance data on a variety of real devices loading your site (iPhone, desktop, Android, etc.). One company made a Node.js wrapper of WPT and put it in their CI pipeline!
SpeedCurve - GUI tool on top of WebPageTest
sitespeed.io - CLI tool on top of WebPageTest
Google's "PageSpeed Insights" and "PageSpeed Optimization" tools
ModPageSpeed - automatically implement performance optimizations at the nginx level
Appium - Selenium for Android/iOS
Skyline and Oculus make up Etsy's Kale stack - metric measuring and anomaly detection
R - stats package
PhantomJS - used by eBay for UI testing
zopfli - Google gzip algorithm, backwards compatible w/ browsers but ~5% byte improvement. jQuery: 18% improvement.

Paid Tools


New Relic Insights - measure business metrics w/ GA-like calls and correlate that with performance and availability data
Verisign - Global load balancer (if we decide to add redundancy to our Rackspace datacenter)
ThousandEyes - Finds specific source of network problems between you and customer
Logentries - "make sense of logs", comprehensive log solution
Lognormal - RUM
PagerDuty - middleware between alerting systems (Nagios, New Relic, etc.) and people's cell phones
Keynote - puts performance data in context, tells you how you stack up against others in your industry
Ghostfish - replays prod traffic in test environment
EdgeCast - CDN
CopperEgg - monitoring / metrics
Neustar - everything

Action items:


Implementing SPDY / HTTP 2.0 is the single biggest performance gain we can make for the least amount of effort. Impact is high and it's easy to do. (Also, with SPDY we can stop spriting altogether!)
Start tracking and understanding MTTR/MTTD (mean time to recover and mean time to detect)
Then start reducing MTTR/MTTD
Add at least one security test to our CI pipeline
Asset pre-fetching with link rel="prefetch" - another item with a largish impact (future potential is high, currently only supported by Firefox) that is easy to do.
Consider/discuss New Relic Insights
Can we include the Kale stack (skyline and oculus) with our Logstash shipped logs in ElasticSearch?  Need to research.
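
As a starting point for the MTTR/MTTD action items above, here is a rough sketch of computing both numbers from an incident log. The record format, field names and timestamps are hypothetical, and it assumes recovery time is measured from the start of the fault rather than from detection.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident log: when the fault began, was detected, and was resolved
INCIDENTS = [
    {"started": "2014-06-01 10:00", "detected": "2014-06-01 10:12", "resolved": "2014-06-01 11:00"},
    {"started": "2014-06-10 22:30", "detected": "2014-06-10 22:34", "resolved": "2014-06-10 23:00"},
]


def _minutes(start, end, fmt="%Y-%m-%d %H:%M"):
    """Minutes elapsed between two timestamp strings."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


def mttd_mttr(incidents):
    """Return (mean minutes to detect, mean minutes to recover)."""
    mttd = mean(_minutes(i["started"], i["detected"]) for i in incidents)
    # Assumption: recovery is measured from the start of the fault
    mttr = mean(_minutes(i["started"], i["resolved"]) for i in incidents)
    return mttd, mttr


mttd, mttr = mttd_mttr(INCIDENTS)
print(f"MTTD: {mttd:.0f} min, MTTR: {mttr:.0f} min")
```

Once these two numbers are tracked per month, the "then start reducing" item has a baseline to improve against.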

Talks attended:


Battle-tested Code Without The Battle - Security Testing and Continuous Integration
Debugging and Tuning Mobile Web Sites with Modern Web Browsers
RUM: Getting Beyond Page Level Metrics
Browser Performance Tools
Achieving Rapid Response Times In Large Online Services
Performance In Context - Is "Good" Good Enough
Exponential Load Testing: Multiply the Power, Multiply the Results
Lowering the Barrier to Programming
Building on a Bedrock of Failure
Responsive Web Performance In the Wild
How to Adapt and Innovate for 2018
Understanding Slowness
Upgrading the Web - Driving Support for New Standards
A Look at Looking in the Mirror: PostMortems
What Makes Mobile Websites Tick?  How Do We Make Them Faster? Insights From WebPagetest and HTTP Archive
How to be Great at Operations
Some Simple Math to get Some Signal out of Your Ops Data Noise
Virtual Machines, Javascript and Assembler
Test Driven Mobile Development with Appium, Just Like Selenium
Top 10 Lessons Learned Building PageSpeed and trying to Make The Web Fast
Web Performance, Why It Really Matters
Mobile Web Is Not (Just) a Technical Challenge
Lightning Demos
Responsive & Fast Pseudo Book Reading: A Tale of Mobile Waiting
Software Analytics for Performance Nerds
Building Self-Adaptive Autonomous Infrastructure with an Advanced Monitoring Architecture
A 5 Minute Checklist for Application Monitoring
Performance and Maintainability with Continuous Experimentation
DevOps Means Business
Case Study:  How Shifting to a DevOps Culture Enabled Performance and Capacity Improvements
Self-Repairing Deployment Pipelines: What We Ought to Mean by Distributed Orchestration
5 Things You Didn't Know NGINX Could Do
Human Confirmation Bias In Monitoring of Systems