jordansissel/RESULTS.md

## graphitefunction.py
# This requires 'pyes'
# The function goes in webapp/graphite/render/functions.py
# Don't forget to update the SeriesFunctions dict to include it.

import time  # needed for time.mktime below; harmless if functions.py already imports it
import pyes

def logstashHits(requestContext, query):
  conn = pyes.ES("semicomplete.com:9200")

  start = requestContext["startTime"].isoformat()
  end = requestContext["endTime"].isoformat()

  # hardcode the utc offset because this vagrant box thinks it is in CEST.
  boundedquery = "@timestamp:[%s-0500 TO %s-0500] AND %s" % (start, end, query)
  q = pyes.StringQuery(boundedquery).search()
  q.facet.facets.append(pyes.facets.DateHistogramFacet('date_facet',
    field='@timestamp',
    interval='second'))
  results = conn.search(query=q)

  # debug log of the query and the raw facet entries
  with open("/tmp/x", "a") as logfile:
    logfile.write("q: %s\n" % boundedquery)
    logfile.write("entries: %r\n" % results.facets.date_facet.entries)
    logfile.write("%r\n" % requestContext["startTime"])
    logfile.write("%s\n" % requestContext["startTime"])
  values = []
  for facet in results.facets.date_facet.entries:
    values.append(facet['count'])
  if not values:
    # avoid bad division on an empty result
    values.append(0)

  return [TimeSeries(query,
            time.mktime(requestContext["startTime"].timetuple()),
            time.mktime(requestContext["endTime"].timetuple()),
            1, values)]

## logstash.conf
input {
  twitter {
    type => "twitter"
    user => "USER"
    password => "PASS"
    keywords => [ "iphone", "samsung", "cloud" ]
  }
}

output {
  elasticsearch { embedded => true }
}

## RESULTS.md

      
logstash queries graphed with graphite.

Operation: Decouple whisper from graphite.
Method: Create a graphite function that runs a date histogram facet query against elasticsearch for a given query string, bounded to the time period shown in the current graph.
Reason: graphite has some awesome math functions. Wouldn't it be cool if we could use those on logstash results?
The screenshot below uses logstash to watch the twitter stream for the keywords "iphone", "apple", and "samsung". I graph each keyword separately, so we get an idea of relative popularity. As a bonus, I also apply movingAverage() to the iphone curve to show you why this is awesome.
Just to be totally clear, this implementation does not use whisper or rrd at all. The 'logstashHits()' function simply queries elasticsearch directly and produces a proper TimeSeries that graphite can use to graph! THIS IS AMAZING.
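One detail worth spelling out: as far as I can tell, a date histogram facet only returns buckets that actually contain hits, while a TimeSeries expects one value per step between start and end. Below is a minimal sketch of filling those gaps with zeros. It is not the code used for the screenshot. It assumes each facet entry is a dict with a `time` key in epoch milliseconds and a `count` key, and `fill_buckets` is a hypothetical helper name.

```python
def fill_buckets(entries, start, end, step=1):
    """Return one count per `step` seconds in [start, end), zero where no bucket exists.

    entries: iterable of dicts with 'time' (epoch milliseconds, an assumption)
             and 'count'.
    start, end: epoch seconds as ints.
    """
    counts = {}
    for entry in entries:
        # Convert milliseconds to seconds, then snap to the start of its step.
        second = int(entry["time"] // 1000)
        slot = second - ((second - start) % step)
        counts[slot] = counts.get(slot, 0) + entry["count"]
    # Walk the whole requested range so that missing buckets become 0.
    return [counts.get(t, 0) for t in range(start, end, step)]


# Two buckets with a one-second gap between them.
entries = [{"time": 1000000, "count": 3}, {"time": 1002000, "count": 5}]
values = fill_buckets(entries, start=1000, end=1004)
```

The result could be passed as the `values` argument of the TimeSeries, so each point lines up with the second it belongs to.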
Bonus points for graphite functions being super easy to write. I used the 'sinFunction()' as a starting point since it generates its own time series.
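For anyone who has not written one, the shape of a graphite function is roughly: take the requestContext plus your own arguments, and return a list of TimeSeries. Here is a minimal sketch in that spirit. The `TimeSeries` class is a stand-in so the snippet runs outside graphite (inside functions.py the real one is already available), and `constantSeries` is a hypothetical function name, not something graphite ships.

```python
import time
from datetime import datetime, timedelta


class TimeSeries(list):
    """Stand-in for graphite's TimeSeries so this runs outside graphite."""

    def __init__(self, name, start, end, step, values):
        list.__init__(self, values)
        self.name = name
        self.start = start
        self.end = end
        self.step = step


def constantSeries(requestContext, name, value, step=60):
    """Hypothetical function: emit `value` once per `step` seconds."""
    start = int(time.mktime(requestContext["startTime"].timetuple()))
    end = int(time.mktime(requestContext["endTime"].timetuple()))
    values = [value for _ in range(start, end, step)]
    return [TimeSeries(name, start, end, step, values)]


# In functions.py, registration is an entry in the SeriesFunctions dict.
SeriesFunctions = {"constantSeries": constantSeries}

# Simulate a graph that covers a five-minute window.
now = datetime(2012, 1, 1, 12, 0, 0)
context = {"startTime": now - timedelta(minutes=5), "endTime": now}
series = SeriesFunctions["constantSeries"](context, "five", 5)[0]
```

The 'logstashHits()' function follows the same pattern, except that its values come from elasticsearch instead of being generated.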
Result: