pdxdevops-2015-06-29
  • Congratulations to jesusaurus for winning a HashiConf ticket!

Hiring:

  • Treehouse looking for part-time teachers
  • Treehouse hiring ops person, RoR shop, chef
  • HP is hiring, Spencer's team

Heka in Production - Nathan Williams, Systems Developer @ Treehouse

Replacing LogStash with Heka, a general data-stream processing engine written in Go, turns your "ELK" stack into a "HEK" stack:

  • LogStash -> Heka
  • Kibana
  • ElasticSearch

Heka is comparable to LogStash: a system that moves text in and out of various systems, with the data commonly ending up in ElasticSearch. At Treehouse, LogStash worked great at first, providing new graphs and business insights, but it became a major drag on resources, with scalability issues of its own. Problems:

  • Agent installation problems
  • High resource utilization
  • Buffer backups
  • Random hangs/slowdown
  • Data discarded on error inserting into ElasticSearch

Pipeline: Input -> Decoders -> Filters -> Encoders -> Output

  • Heka has better config syntax than LogStash: TOML, like nested .ini
  • Can simplify architecture: writes directly to ElasticSearch (see the config sketch below)
  • Performs parsing efficiently, allowing it to be distributed across agents
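A minimal sketch of that simplified pipeline as a Heka TOML config. LogstreamerInput, SandboxDecoder, ESJsonEncoder, and ElasticSearchOutput are stock Heka plugins, but the paths, file patterns, and message type here are hypothetical:

```toml
# Tail application logs from disk
[app_logs]
type = "LogstreamerInput"
log_directory = "/var/log/myapp"          # hypothetical path
file_match = 'app\.log'
decoder = "app_log_decoder"

# Parse each line in a Lua sandbox (see the Lua sketch below)
[app_log_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/app_log.lua"     # hypothetical plugin file

# Serialize matched messages as ElasticSearch JSON documents
[ESJsonEncoder]
index = "%{Type}-%{%Y.%m.%d}"

# Write directly to ElasticSearch, no broker in between
[ElasticSearchOutput]
message_matcher = "Type == 'app.log'"
server = "http://localhost:9200"
encoder = "ESJsonEncoder"
```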

When the config syntax and architecture are more approachable, developers are more engaged with log processing.

On writing plugins:

  • High learning curve
  • Plugins are Lua, which isn't as familiar as other high-level scripting languages (a sketch follows this list)
  • You can write plugins in Go if you are willing to recompile
  • Small, but helpful, ecosystem
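A minimal sketch of a Lua sandbox decoder, assuming a made-up "LEVEL rest-of-line" log format. process_message, read_message, and inject_message are part of Heka's Lua sandbox API; the type name, field name, and pattern are illustrative:

```lua
-- Heka calls process_message() once per incoming log line.
function process_message()
    local payload = read_message("Payload")

    -- Split a hypothetical "LEVEL rest-of-line" format
    local level, rest = payload:match("^(%u+)%s+(.*)$")
    if not level then
        return -1  -- signal a decode failure to Heka
    end

    inject_message({
        Type    = "app.log",
        Payload = rest,
        Fields  = { level = level },
    })
    return 0  -- success
end
```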

Planning on adding HipChat, Slack, Sensu, and PagerDuty output plugins.

Q&A: Back-pressure: Heka will do what it can to stop ingesting data if outputs are not keeping up.

Q&A: Monitoring: processing falling behind has not been much of a problem; monitoring all the individual endpoints has not come up yet.


Heka and Riemann - Nick Chappell, SRE @ New Relic

Stream Processing, an assembly line instead of a workbench.

Examples:

Apache Storm:

  • Spouts = Inputs
  • Bolts = Processing steps and Outputs
  • Messages = Tuples
  • Topologies are combinations of spouts and bolts; streams are a flow of tuples

Apache Samza - A more opinionated Storm

Twitter Heron - Non-open-source Twitter version

Apache Mesos/Mesosphere - General purpose framework you could use

Apache Kafka - Distributed pub/sub message queue.

  • Producers write to Kafka Topics (which are split into partitions)
  • Consumers read from Kafka Topics

Jepsen "Call Me Maybe" Kafka Article

A common pattern with Storm uses Kafka as the connection point between spouts and bolts.

"If your producres/consumers are stateless, your topologies can be horizontally scalable"

Stream processing applies to monitoring, as metrics, events, and logs are generated continuously.

Dave Josephsen @ Monitorama - Stream Processing inside Librato

Heka features

  • Kafka Input (consumer); see the config sketch after this list
  • Kafka Output (producer)
  • Process Output (think NRPE)
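A hedged sketch of wiring Heka to Kafka on both sides with the KafkaInput and KafkaOutput plugins; the broker address, topic names, and matcher are hypothetical, and only the most basic options are shown:

```toml
# Consume messages from a Kafka topic
[KafkaInput]
addrs = ["localhost:9092"]      # hypothetical broker
topic = "logs-raw"              # hypothetical topic

# Produce matched messages onto another topic
[KafkaOutput]
addrs = ["localhost:9092"]
topic = "logs-processed"        # hypothetical topic
message_matcher = "TRUE"
encoder = "ProtobufEncoder"
```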

Heka Drawbacks

  • Changing "Type" is difficult
  • Lua plugins override the Hostname and Type fields with the system that ran the plugin, not the origin
  • Topology of data flow is not obvious
  • Bugs in prior versions caused disk queues (and system partitions) to fill up
  • Lua processing is not as performant, due to required memory copies

Rob Miller @ Monitorama - Heka Workshop

Chef cookbook and Puppet module available

Riemann Stream Processing

  • An event is just a Clojure map; the configuration is an actual Clojure program
  • Plugins are Java/Clojure libraries
  • A Riemann config creates a single-node, Storm-style topology

Kyle Kingsbury @ Monitorama - Working with Riemann

Andrew Clegg @ Monitorama - Kale 2.0


Stream Processing, Heka, and Riemann - Jonathan Owens, SRE @ New Relic

Extract high-level event data from logs with Heka.

You need a tool to assist with semi-professional systems that report great information into their logs but do not report it into a monitoring system that could provide more insights.

Use Lua LPeg to break apart opaque log lines into a meaningful data structure (similar in function to LogStash's Grok, but feels more like a scripting language); a sketch follows below. The LPeg Grammar Tester verifies data structure extractions. Implemented as a "process_message" Lua function; very unlike a regular expression, and runs significantly faster.

Feeds into ElasticSearch for numerical ingestion/graphing of the data. Also ships into the New Relic Insights service, where a SQL-like syntax extracts messages for graphing.
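A minimal sketch of the LPeg approach, assuming a made-up "METHOD /path STATUS" line format. lpeg.P/R/C/Cg/Ct are standard LPeg combinators; the grammar and field names are illustrative:

```lua
local lpeg = require "lpeg"

-- Building blocks: P = literal, R = character range, C/Cg/Ct = captures
local space  = lpeg.P(" ")
local method = lpeg.Cg(lpeg.C(lpeg.R("AZ")^1), "method")
local path   = lpeg.Cg(lpeg.C((1 - space)^1), "path")
local status = lpeg.Cg(lpeg.R("09")^3 / tonumber, "status")

-- Ct collects the named captures into a single Lua table
local grammar = lpeg.Ct(method * space * path * space * status)

local fields = grammar:match("GET /library 200")
-- fields = { method = "GET", path = "/library", status = 200 }
print(fields.method, fields.path, fields.status)
```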

Q: "What language would you prefer if you could choose any" A: Not constrained to use LPeg, just a recommended solution. Would rather use RegEx, but speed is a major concern LogStash users commonly ask for Grok, but that is a syntax only available in LogStash.

Riemann, as a more general tool, uses more general Java libraries for parsing.

LPeg is a more log-specific language. Heka has a killer nginx log-decoding feature: its decoder can be configured with the same log_format string from your nginx config (sketch below).
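This appears to refer to Heka's stock nginx access-log decoder (lua_decoders/nginx_access.lua). A hedged config sketch, with the log_format value assumed to mirror a typical nginx "combined" format:

```toml
[nginx_access_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"   # ships with Heka

[nginx_access_decoder.config]
# Paste in the same log_format string your nginx config uses
log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
type = "nginx.access"   # becomes the Heka message Type
```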

LPeg is missing a full-featured debugger like Grok has.
