pdxdevops-2015-06-29
  • Congratulations to jesusaurus for winning a HashiConf ticket!

Hiring:

  • Treehouse looking for part-time teachers
  • Treehouse hiring ops person, RoR shop, chef
  • HP is hiring, Spencer's team

Heka in Production - Nathan Williams, Systems Developer @ Treehouse

Replacing LogStash with Heka, a general data-stream processing engine written in Go, turns your "ELK" stack into a "HEK" stack:

  • LogStash -> Heka
  • Kibana
  • ElasticSearch

Heka is comparable to LogStash: a system that moves text in and out of various systems, with the data commonly ending up in ElasticSearch. At Treehouse, LogStash worked great at first, providing new graphs and business insights, but it became a major drag on resources, with scalability issues of its own. Problems:

  • Agent installation problems
  • High resource utilization
  • Buffer backups
  • Random hangs/slowdown
  • Data discarded on error inserting into ElasticSearch

Pipeline: Input -> Decoders -> Filters -> Encoders -> Output

  • Heka has better config syntax than LogStash: TOML, like nested .ini
  • Can simplify architecture: writes directly to ElasticSearch (see the config sketch below)
  • Performs parsing efficiently, allowing it to be distributed across agents
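A minimal sketch of that simplified pipeline as a Heka TOML config. LogstreamerInput, SandboxDecoder, ESJsonEncoder, and ElasticSearchOutput are stock Heka plugins, but the paths, file patterns, and message type here are hypothetical:

```toml
# Tail application logs from disk
[app_logs]
type = "LogstreamerInput"
log_directory = "/var/log/myapp"          # hypothetical path
file_match = 'app\.log'
decoder = "app_log_decoder"

# Parse each line in a Lua sandbox (see the Lua sketch below)
[app_log_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/app_log.lua"     # hypothetical plugin file

# Serialize matched messages as ElasticSearch JSON documents
[ESJsonEncoder]
index = "%{Type}-%{%Y.%m.%d}"

# Write directly to ElasticSearch, no broker in between
[ElasticSearchOutput]
message_matcher = "Type == 'app.log'"
server = "http://localhost:9200"
encoder = "ESJsonEncoder"
```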

When the config syntax and architecture are more approachable, developers are more engaged with log processing.

On writing plugins:

  • High learning curve
  • Plugins are Lua, which isn't as familiar as other high-level scripting languages (a sketch follows this list)
  • You can write plugins in Go if you are willing to recompile
  • Small, but helpful, ecosystem
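A minimal sketch of a Lua sandbox decoder, assuming a made-up "LEVEL rest-of-line" log format. process_message, read_message, and inject_message are part of Heka's Lua sandbox API; the type name, field name, and pattern are illustrative:

```lua
-- Heka calls process_message() once per incoming log line.
function process_message()
    local payload = read_message("Payload")

    -- Split a hypothetical "LEVEL rest-of-line" format
    local level, rest = payload:match("^(%u+)%s+(.*)$")
    if not level then
        return -1  -- signal a decode failure to Heka
    end

    inject_message({
        Type    = "app.log",
        Payload = rest,
        Fields  = { level = level },
    })
    return 0  -- success
end
```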

Planning on adding HipChat, Slack, Sensu, and PagerDuty output plugins.

Q&A: Back-pressure: Heka will do what it can to stop ingesting data if outputs are not keeping up.

Q&A: Monitoring: processing falling behind has not been much of a problem; monitoring all the individual endpoints has not come up yet.


Heka and Riemann - Nick Chappell, SRE @ New Relic

Stream Processing, an assembly line instead of a workbench.

Examples:

Apache Storm:

  • Spouts = Inputs
  • Bolts = Processing steps and Outputs
  • Messages = Tuples
  • Topologies are combinations of spouts and bolts; streams are a flow of tuples

Apache Samza - A more opinionated Storm

Twitter Heron - Non-open-source Twitter version

Apache Mesos/Mesosphere - General purpose framework you could use

Apache Kafka - Distributed pub/sub message queue.

  • Producers write to Kafka Topics (which are split into partitions)
  • Consumers read from Kafka Topics

Jepsen "Call Me Maybe" Kafka Article

A common pattern with Storm uses Kafka as the connection point between spouts and bolts.

"If your producres/consumers are stateless, your topologies can be horizontally scalable"

Stream processing applies to monitoring, as metrics, events, and logs are generated continuously.

Dave Josephsen @ Monitorama - Stream Processing inside Librato

Heka features

  • Kafka Input (consumer); see the config sketch after this list
  • Kafka Output (producer)
  • Process Output (think NRPE)
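A hedged sketch of wiring Heka to Kafka on both sides with the KafkaInput and KafkaOutput plugins; the broker address, topic names, and matcher are hypothetical, and only the most basic options are shown:

```toml
# Consume messages from a Kafka topic
[KafkaInput]
addrs = ["localhost:9092"]      # hypothetical broker
topic = "logs-raw"              # hypothetical topic

# Produce matched messages onto another topic
[KafkaOutput]
addrs = ["localhost:9092"]
topic = "logs-processed"        # hypothetical topic
message_matcher = "TRUE"
encoder = "ProtobufEncoder"
```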

Heka Drawbacks

  • Changing "Type" is difficult
  • Lua plugins override the Hostname and Type fields with the system that ran the plugin, not the origin
  • Topology of data flow is not obvious
  • Bugs in prior versions caused disk queues (and system partitions) to fill up
  • Lua processing is not as performant, due to required memory copies

Rob Miller @ Monitorama - Heka Workshop

Chef cookbook and Puppet module available

Riemann Stream Processing

  • An event is just a Clojure map; the configuration is an actual Clojure program
  • Plugins are Java/Clojure libraries
  • A Riemann config creates a single-node, Storm-style topology

Kyle Kingsbury @ Monitorama - Working with Riemann

Andrew Clegg @ Monitorama - Kale 2.0


Stream Processing, Heka, and Riemann - Jonathan Owens, SRE @ New Relic

Extract high-level event data from logs with Heka.

You need a tool to assist with semi-professional systems that report great information into their logs but do not report it into a monitoring system that could provide more insights.

Use Lua LPeg to break apart opaque log lines into a meaningful data structure (similar in function to LogStash's Grok, but feels more like a scripting language); a sketch follows below. The LPeg Grammar Tester verifies data structure extractions. Implemented as a "process_message" Lua function; very unlike a regular expression, and runs significantly faster.

Feeds into ElasticSearch for numerical ingestion/graphing of the data. Also ships into the New Relic Insights service, where a SQL-like syntax extracts messages for graphing.
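A minimal sketch of the LPeg approach, assuming a made-up "METHOD /path STATUS" line format. lpeg.P/R/C/Cg/Ct are standard LPeg combinators; the grammar and field names are illustrative:

```lua
local lpeg = require "lpeg"

-- Building blocks: P = literal, R = character range, C/Cg/Ct = captures
local space  = lpeg.P(" ")
local method = lpeg.Cg(lpeg.C(lpeg.R("AZ")^1), "method")
local path   = lpeg.Cg(lpeg.C((1 - space)^1), "path")
local status = lpeg.Cg(lpeg.R("09")^3 / tonumber, "status")

-- Ct collects the named captures into a single Lua table
local grammar = lpeg.Ct(method * space * path * space * status)

local fields = grammar:match("GET /library 200")
-- fields = { method = "GET", path = "/library", status = 200 }
print(fields.method, fields.path, fields.status)
```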

Q: "What language would you prefer if you could choose any" A: Not constrained to use LPeg, just a recommended solution. Would rather use RegEx, but speed is a major concern LogStash users commonly ask for Grok, but that is a syntax only available in LogStash.

Riemann, as a more general tool, uses more general Java libraries for parsing.

LPeg is a more log-specific language. Heka has a killer nginx log-decoding feature: its decoder can be configured with the same log_format string from your nginx config (sketch below).
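This appears to refer to Heka's stock nginx access-log decoder (lua_decoders/nginx_access.lua). A hedged config sketch, with the log_format value assumed to mirror a typical nginx "combined" format:

```toml
[nginx_access_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"   # ships with Heka

[nginx_access_decoder.config]
# Paste in the same log_format string your nginx config uses
log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
type = "nginx.access"   # becomes the Heka message Type
```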

LPeg is missing a full-featured debugger like Grok has.
