Zach Musgrave zachm

## keybase.md

      
Created April 17, 2020 20:17

### Keybase proof

I hereby claim:

* I am zachm on github.
* I am zachm (https://keybase.io/zachm) on keybase.
* I have a public key ASCTDpJFS9Ckm0hs45ZAKXbXSkMGbruCSpkeXeaQ-oKkyQo

To claim this, I am signing this object:

  
## monitorama_2018_day3.md

      
Last active June 6, 2018 23:53

Monitorama 2018 day 3

### Achieving Google-levels of Observability into your Application with OpenCensus

Morgan McLean - Google
This talk is all about opencensus.io; Morgan is the PM for it. He's arguing that the usual N pillars of observability
are not sufficient to... have observability.
Instead he's saying context/topology + status + root cause analysis == observability.
OpenCensus does:

* distributed traces
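
As a rough illustration of what a distributed trace is, each unit of work can be recorded as a span that shares a trace ID with its parent. This is not the OpenCensus API: the `Span` class, the `start_span` helper, and the span names are all hypothetical and exist only for this sketch.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One unit of work in a trace (hypothetical structure, not OpenCensus's)."""
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str] = None


def start_span(name: str, parent: Optional[Span] = None) -> Span:
    """Start a root span, or a child span that inherits the parent's trace ID."""
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    return Span(
        name=name,
        trace_id=trace_id,
        span_id=secrets.token_hex(8),
        parent_id=parent.span_id if parent else None,
    )


# A request enters a frontend, which calls a backend, which writes to a database.
root = start_span("frontend: GET /checkout")
child = start_span("backend: charge_card", parent=root)
grandchild = start_span("db: INSERT payment", parent=child)

for span in (root, child, grandchild):
    print(span.trace_id[:8], span.span_id, span.parent_id, span.name)
```

All three spans print the same trace ID prefix, and the parent IDs let a backend reassemble the call tree across services.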


## monitorama_2018_day2.md

      
Last active June 5, 2018 23:44

Monitorama 2018 Day 2

### Want to solve Over-Monitoring and Alert Fatigue? Create the right incentives!

Kishore Jalleda - (fmr.) Yahoo and Zynga
Patients in hospitals have heart monitors, and they overalert by a ton. So occasionally patients actually die because of
a missed/ignored alarm.
Zynga: 100k alerts/month across 25+ studios. 50+ SREs in 3 locations.
How to fix this fatigue and anxiety?

* Can't add more people - doesn't scale.


## monitorama_2018_day1.md

      
Last active June 5, 2018 16:35

Monitorama 2018 Day 1

### Optimizing for Learning

Logan McDonald - BuzzFeed
She's talking about the ramp-up period as a new DevOps person. Her background is in cognitive science,
so she's used that to push forward her own learning.
"Problem solving is easier with constraints." - Yes!
Google SRE Handbook - "Dickerson Hierarchy of Site Reliability"
Base of this pyramid? Monitoring!

  
## Splunk_.conf_17_day3.md

      
Last active September 28, 2017 17:48

Notes from day 3 of .conf

### Splunk IT Service Intelligence (ITSI): Event management is dead - event analytics is revolutionizing IT

David Mills - Staff Architect, IT Operations Analytics
Basically we're not just looking at events. We're instead looking to tie events together with some ML,
with some dashboards, and this ITSI tooling. They're using New Relic events as an example, but the workflow looks like you could
just pump PagerDuty events into Splunk for a similar effect. (n.b. why are we not doing this?)
A little bit of discussion on defining good Opsy KPIs but nothing that doesn't follow. They wrap in Businessy KPIs,
They're doing logical actions, like opening tickets, paging people downstream, etc. I'm not sure we'd want to move straight to

  
## Splunk_.conf_17_day2.md

      
Last active September 27, 2017 20:02

Notes from day 2 of Splunk .conf

### Splunk Data Lifecycle: Determining When and Where to Roll Your Data

Jeff Champagne, Principal Architect, Splunk
Events fall into buckets, 1+ buckets make up an index, indexes live on indexers.

As buckets grow, they roll hot->warm->cold->{frozen|delete}
Hot buckets live in the index's home path (`homePath`)
Data roll: Can roll out to HDFS

Hot: At least 1 hot bucket per index, per indexer. More created for each parallel ingestion pipeline, or when quarantine is needed.
Quarantine: Happens when you load in data from ages ago (too old). Also when timestamps are broken.
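
The roll order above can be sketched as a small state machine. This only illustrates the hot -> warm -> cold -> {frozen|delete} sequence from the notes; the `Stage` enum, the `roll` function, and the `archive_on_freeze` flag are hypothetical names, and real roll decisions are made by Splunk from index settings rather than by code like this.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"
    FROZEN = "frozen"


# Order in which a bucket rolls, per the notes above.
_ROLL_ORDER = [Stage.HOT, Stage.WARM, Stage.COLD, Stage.FROZEN]


def roll(stage: Stage, archive_on_freeze: bool = True) -> Optional[Stage]:
    """Return the next stage for a bucket.

    Rolling out of COLD yields FROZEN when the data is archived, or None
    when it is simply deleted. A FROZEN bucket has nowhere left to roll.
    """
    if stage is Stage.FROZEN:
        raise ValueError("frozen buckets do not roll any further")
    if stage is Stage.COLD and not archive_on_freeze:
        return None  # deleted instead of archived
    return _ROLL_ORDER[_ROLL_ORDER.index(stage) + 1]


# Walk one bucket through its whole life, archiving at the end.
stage: Optional[Stage] = Stage.HOT
path = [stage]
while stage is not None and stage is not Stage.FROZEN:
    stage = roll(stage)
    path.append(stage)
print(" -> ".join(s.value for s in path))  # hot -> warm -> cold -> frozen
```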

  
## Splunk_.conf_17_day1.md

      
Last active June 21, 2018 16:29

### Detect Numeric Outliers – Advances

Iman Makaremi - Senior Data Scientist, Splunk
Matthew Modestino - ITOA Practitioner, Splunk
So they want to move away from static alarming/decision making. Can the data itself tell you what's normal?
Basically, looking for outliers with ML (and the MLTK). One of them is Ops, the other did the math.
"We know what's normal - we collect it every day." You already have the baseline. But how do you write SPL to detect deviation?
(Hoping this next bit is relevant to sourcetype volume tracking and to larger anomaly detection work at Yelp.)
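
One way to let the data define "normal" is a median absolute deviation (MAD) check. The sketch below is plain Python rather than SPL, and it is not taken from the talk: the `mad_outliers` function, the 3.5 threshold, and the sample counts are all assumptions made for illustration.

```python
from statistics import median
from typing import List, Sequence


def mad_outliers(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return indexes of values whose modified z-score exceeds `threshold`.

    The baseline is the median of the data itself, and the spread is the
    median absolute deviation (MAD), so no static limit is hard-coded.
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half the values equal the median, so the score is
        # undefined; flag nothing rather than divide by zero.
        return []
    # 0.6745 scales MAD so the score is comparable to a standard z-score
    # for normally distributed data.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - center) / mad > threshold
    ]


# Hypothetical hourly event counts for one sourcetype.
counts = [100, 102, 98, 101, 99, 103, 97, 250, 100, 102]
print(mad_outliers(counts))  # [7], the 250 spike
```

Median and MAD are used here instead of mean and standard deviation because a single large spike barely moves them, so the spike cannot hide itself by inflating the baseline.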

  
## devopsdays_DTW_day_two.md

      
Last active October 13, 2016 18:44

DevOpsDays DTW

### Enter The Trough Of Disillusionment

Jim Drewes @ Daugherty Business Solutions
It was okay - I didn't take many notes though.
Gartner's hype curve strongly implies that, with all the shiny new devops tools, the "Trough of Disillusionment" is soon to follow.
Basically, Jim believes that due to quickly approaching enterprise adoption, devops is about to become "a hell of a lot less fun". This is probably the case, but I don't know that I agree completely - there are always going to be younger technologies and companies on the vanguard of the technical bits of the movement.

  
## devopsdays_DTW_day_one.md

      
Last active October 12, 2016 18:52

DevOpsDays DTW: Day One notes

### Containers Will Not Fix Your Broken Culture (and Other Hard Truths)

Bridget Kromhout @ Pivotal
Great overview of a lot of standard devops practices, some of the sorrows that can result, and so on. Bridget gives a lot of talks - she's in an evangelist role at Pivotal.
She emphasized a lot of communication issues within orgs. Some recruiter cold-emailed her and used as a selling point that their company had two OpenStack deploys. Why is that a good thing?!
"Good to be explicit and not assume defaults." - A great lesson for everyone's documentation ever!
A longer version she gave at CONFENGINE: https://www.youtube.com/watch?v=UjhIA6QTy5k

  
## automacon_notes_day_two.md

      
Last active November 2, 2017 15:21

Automacon

### Automating Kubernetes Cluster Ops at Digital Ocean

Dan Norris @ DigitalOcean
They built DO using DO components, but because they obviously have a decent amount of infrastructure they use Terraform to manage it. Droplets module, then hook it into Chef - combine launch and provision steps.
Vault as a CA for Kubernetes - they have a blog post out on this. http://do.co/vault
Some examples are given of Terraform commands; they don't appear to have much sanity checking around their workflow (e.g. terraform apply vs make plan/apply). This might be simplified for the talk - for their sake I hope it is.
terraform taint - using it to mark resources as requiring replacement.
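
A somewhat safer workflow than a bare `terraform apply` is to save a plan and then apply exactly that plan. The sketch below only builds the command lines and does not run them; the `plan_then_apply` helper, the `tfplan` filename, and the `digitalocean_droplet.k8s_node` address are hypothetical, and this is not the workflow shown in the talk. Newer Terraform releases also prefer `terraform apply -replace=ADDRESS` over `terraform taint`.

```python
import shlex
from typing import List, Optional


def plan_then_apply(plan_file: str = "tfplan",
                    taint_address: Optional[str] = None) -> List[List[str]]:
    """Build the Terraform commands for a reviewed plan/apply cycle.

    If `taint_address` is given, the resource is first marked for
    replacement with `terraform taint`.
    """
    commands: List[List[str]] = []
    if taint_address:
        commands.append(["terraform", "taint", taint_address])
    # Save the plan so that apply executes exactly what was reviewed.
    commands.append(["terraform", "plan", f"-out={plan_file}"])
    commands.append(["terraform", "apply", plan_file])
    return commands


# Hypothetical resource address for a droplet acting as a cluster node.
droplet_address = "digitalocean_droplet.k8s_node"
for cmd in plan_then_apply(taint_address=droplet_address):
    print(shlex.join(cmd))
```

Wrapping both steps in one helper (or one `make` target) makes it harder to apply changes that nobody has looked at.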