experimental checksumming for logstash events as a filter
require "logstash/filters/base"
require "logstash/namespace"
require "yaml"

class LogStash::Filters::Checksum < LogStash::Filters::Base
  config_name "checksum"
  plugin_status "experimental"

  # A list of keys to use in creating the string to checksum.
  # Keys will be sorted before building the string;
  # keys and values will then be concatenated with pipe delimiters
  # and checksummed.
  config :keys, :validate => :array, :default => ["@message", "@source_host", "@timestamp", "@source_path", "@type", "@source"]
  config :algorithm, :validate => ["md5", "sha128", "sha256", "sha384"], :default => "sha256"

  public
  def register
    require 'openssl'
    @to_checksum = ""
  end

  public
  def filter(event)
    return unless filter?(event)

    @logger.debug("Running checksum filter", :event => event)

    @keys.sort.each do |k|
      @logger.debug("Adding key to string", :current_key => k)
      @to_checksum << "|#{k}|#{event[k]}"
    end
    @to_checksum << "|"
    @logger.debug("Final string built", :to_checksum => @to_checksum)

    digested_string = OpenSSL::Digest.hexdigest(@algorithm, @to_checksum)
    @logger.debug("Digested string", :digested_string => digested_string)
    event.fields['logstash_checksum'] = digested_string
  end
end
jvstratusmbp :: ~/development/logstash ‹master*› » ruby --1.9 bin/logstash -f logstash.conf
{:args=>["agent", "-f", "logstash.conf"]}
{:run=>"agent"}
{:remaining=>[]}
doneargs
Using experimental plugin 'checksum'. This plugin is untested and may change in the future. For more information about plugin statuses, see http://logstash.net/docs/1.1.0.1/plugin-status {"level":"warn"}
foobarbangbaz
{
  "@source":"stdin://jvstratusmbp.local/",
  "@type":"stdin",
  "@tags":[],
  "@fields":{
    "logstash_checksum":"34092978fb4055baa980815768bddc5342d601b59b48cd147ee44447ceff6929"
  },
  "@timestamp":"2012-06-02T07:00:48.045000Z",
  "@source_host":"jvstratusmbp.local",
  "@source_path":"/",
  "@message":"foobarbangbaz"
}
input { stdin { type => 'stdin' } }
filter { checksum { } }
output { stdout { debug => true debug_format => "json" }}
  • why are we sorting keys? (i.e., @keys.sort.each ...) Shouldn't the order be up to the user?
  • @to_checksum is appended to every single event (oops?)
  • instead of providing 'keys', why not have the user provide a sprintf string? (e.g., filter { checksum { string => "%{@message} ..." } })
  • the field name should be configurable (with a default, perhaps, of 'logstash_checksum')
  • The name 'checksum' is a bit misleading I think given the original intent (the term 'checksum' may imply that it is used later for integrity checks)
  • if we keep the 'keys' idea, it should probably be named 'fields' maybe? (We need better unified terminology in logstash, for reals)
  • should we permit selection of the hash serialization other than hexadecimal?
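
A rough sketch of how a few of the points above could fit together, outside of Logstash: the string is built fresh for each event rather than appended to shared state, key order is left to the caller, and the output encoding is selectable. The helper name `checksum_for`, its parameters, and the `:base64` option are assumptions made for illustration; none of them exist in the plugin.

```ruby
require "openssl"

# Hypothetical helper, not part of the gist. A plain Hash stands in for the
# Logstash event.
def checksum_for(event, keys, algorithm = "sha256", encoding = :hex)
  # Built from scratch on every call, so nothing carries over between events.
  # Keys are used in the order given; the caller decides whether to sort.
  to_checksum = keys.map { |k| "|#{k}|#{event[k]}" }.join + "|"

  digest = OpenSSL::Digest.new(algorithm)
  case encoding
  when :hex    then digest.hexdigest(to_checksum)
  when :base64 then [digest.digest(to_checksum)].pack("m0")
  else raise ArgumentError, "unknown encoding: #{encoding}"
  end
end

event  = { "@message" => "foobarbangbaz", "@type" => "stdin" }
first  = checksum_for(event, ["@message", "@type"])
second = checksum_for(event, ["@message", "@type"])
puts first == second  # the same event hashes to the same value on every call
```

With the instance-variable version in the gist, the second call would digest a longer string than the first, so two identical events would get different checksums.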

In general, I like this. In the future, we will do:

filter { checksum { ... } }
output { elasticsearch { document_id => "%{logstash_checksum}" } }

And 'dedup' will be fulfilled! ALSO THIS WILL PERMIT REINDEXING OF LOGS THAT FAILED PARSING WOO
(assuming the name 'checksum' stays, which it may not!)
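
To make the dedup idea concrete, here is a minimal sketch in which a plain Ruby Hash stands in for the index. It does not talk to Elasticsearch; `document_id` and `index_events` are hypothetical names, and hashing only the message is a simplification of what the filter does.

```ruby
require "openssl"

# Hypothetical: derive the document id from the event content.
def document_id(message)
  OpenSSL::Digest.new("sha256").hexdigest(message)
end

# Hypothetical: "index" each message under its checksum-based id.
def index_events(messages)
  index = {}
  messages.each do |message|
    # Writing under an id that already exists replaces the earlier copy
    # instead of adding a second document.
    index[document_id(message)] = { "@message" => message }
  end
  index
end

index = index_events(["foobarbangbaz", "another line", "foobarbangbaz"])
puts index.size  # 2: the repeated line collapses into one entry
```

The same property is what would make re-sending an event safe: a replayed event lands on the id it already had.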

fetep commented Jun 2, 2012

This is awesome. I think we should include an option to specify the field name we put the checksum in (maybe default to @Checksum?).

  • sorting keys makes sense
  • I agree on making keys a list of fields.

Nice work (like everything else in Logstash). Thanks!

If you need to calculate a hash for a single field, you could use the following ruby code:

    ruby {
      code => "event.to_hash.merge!('message_hash' => OpenSSL::Digest.hexdigest('md5', event['message']))"
    }