@lusis
Created June 2, 2012 07:03
experimental checksumming for logstash events as a filter
require "logstash/filters/base"
require "logstash/namespace"
require "yaml"

class LogStash::Filters::Checksum < LogStash::Filters::Base
  config_name "checksum"
  plugin_status "experimental"

  # A list of keys to use in creating the string to checksum.
  # Keys will be sorted before building the string;
  # keys and values will then be concatenated with pipe delimiters
  # and checksummed.
  config :keys, :validate => :array, :default => ["@message", "@source_host", "@timestamp", "@source_path", "@type", "@source"]

  # Name of the digest algorithm handed to OpenSSL::Digest.
  config :algorithm, :validate => ["md5", "sha1", "sha256", "sha384"], :default => "sha256"

  public
  def register
    require 'openssl'
    # NOTE: this buffer is created once here and never reset in #filter,
    # so each event's string is appended to those of earlier events.
    @to_checksum = ""
  end

  public
  def filter(event)
    return unless filter?(event)

    @logger.debug("Running checksum filter", :event => event)

    # Build "|key|value|key|value|...|" from the sorted keys.
    @keys.sort.each do |k|
      @logger.debug("Adding key to string", :current_key => k)
      @to_checksum << "|#{k}|#{event[k]}"
    end
    @to_checksum << "|"
    @logger.debug("Final string built", :to_checksum => @to_checksum)

    digested_string = OpenSSL::Digest.hexdigest(@algorithm, @to_checksum)
    @logger.debug("Digested string", :digested_string => digested_string)
    event.fields['logstash_checksum'] = digested_string
  end
end
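The string building above can also be done with a value that is local to each call, which would avoid the accumulation raised in the comments below. A minimal sketch, assuming a plain Hash stands in for a Logstash event; `event_checksum` is a hypothetical helper and not part of the plugin:

```ruby
require "openssl"

# Hypothetical helper (not part of the gist): builds the pipe-delimited
# string for ONE event and returns its hex digest.
# `event` is a plain Hash used here as a stand-in for a Logstash event.
def event_checksum(event, keys, algorithm = "sha256")
  # The string is rebuilt from scratch on every call, so nothing
  # carries over from one event to the next.
  to_checksum = keys.sort.map { |k| "|#{k}|#{event[k]}" }.join + "|"
  OpenSSL::Digest.new(algorithm).hexdigest(to_checksum)
end

event = { "@message" => "foobarbangbaz", "@type" => "stdin" }
keys  = ["@type", "@message"]

first  = event_checksum(event, keys)
second = event_checksum(event, keys)

# The same event yields the same digest no matter how often it is processed.
puts first == second
```

Because the keys are sorted inside the helper, the order in which they are listed in the configuration does not change the digest.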
jvstratusmbp :: ~/development/logstash ‹master*› » ruby --1.9 bin/logstash -f logstash.conf
{:args=>["agent", "-f", "logstash.conf"]}
{:run=>"agent"}
{:remaining=>[]}
doneargs
Using experimental plugin 'checksum'. This plugin is untested and may change in the future. For more information about plugin statuses, see http://logstash.net/docs/1.1.0.1/plugin-status {"level":"warn"}
foobarbangbaz
{
  "@source":"stdin://jvstratusmbp.local/",
  "@type":"stdin",
  "@tags":[],
  "@fields":{
    "logstash_checksum":"34092978fb4055baa980815768bddc5342d601b59b48cd147ee44447ceff6929"
  },
  "@timestamp":"2012-06-02T07:00:48.045000Z",
  "@source_host":"jvstratusmbp.local",
  "@source_path":"/",
  "@message":"foobarbangbaz"
}
input { stdin { type => 'stdin' } }
filter { checksum { } }
output { stdout { debug => true debug_format => "json" }}
@jordansissel

  • Why are we sorting keys (i.e., @keys.sort.each ...)? Shouldn't the order be up to the user?
  • @to_checksum is appended to on every single event and never reset (oops?)
  • Instead of providing 'keys', why not have the user provide a sprintf string? (i.e., filter { checksum { string => "%{@message} ..." } })
  • The field name should be configurable (with a default, perhaps, of 'logstash_checksum').
  • The name 'checksum' is a bit misleading, I think, given the original intent (the term 'checksum' may imply that it is used later for integrity checks).
  • If we keep the 'keys' idea, it should probably be named 'fields'. (We need better unified terminology in logstash, for reals.)
  • Should we permit a hash serialization other than hexadecimal?
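Several of these suggestions (a user-supplied format string, a configurable target field, and a choice of serialization) can be sketched together outside of Logstash. This is only an illustration under assumptions: `expand` and `add_digest` are hypothetical names, the event is a plain Hash, and the `%{field}` expansion is a simplified stand-in for Logstash's own sprintf handling:

```ruby
require "openssl"

# Hypothetical stand-in for sprintf-style expansion: replaces each
# %{name} with the value of that key in a plain Hash event.
def expand(format, event)
  format.gsub(/%\{([^}]+)\}/) { event[$1].to_s }
end

# Hypothetical helper: digests the expanded string and returns a copy of
# the event with the result stored under a caller-chosen field name.
def add_digest(event, format:, field: "logstash_checksum",
               algorithm: "sha256", encoding: :hex)
  digest = OpenSSL::Digest.new(algorithm)
  source = expand(format, event)
  value =
    case encoding
    when :hex    then digest.hexdigest(source)
    when :base64 then [digest.digest(source)].pack("m0") # base64, no newlines
    else raise ArgumentError, "unknown encoding: #{encoding}"
    end
  event.merge(field => value)
end

event = { "@message" => "foobarbangbaz", "@source_host" => "jvstratusmbp.local" }

# The user decides both the fields and their order via the format string.
hexed   = add_digest(event, format: "%{@message}|%{@source_host}")
encoded = add_digest(event, format: "%{@message}|%{@source_host}",
                     field: "event_id", encoding: :base64)

puts hexed["logstash_checksum"]
puts encoded["event_id"]
```

With a format string the ordering question goes away, since the user writes the fields in whatever order they want.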

@jordansissel

In general, I like this. In the future, we will do:

filter { checksum { ... } }
output { elasticsearch { document_id => "%{logstash_checksum}" } }

And 'dedup' will be fulfilled! ALSO THIS WILL PERMIT REINDEXING OF LOGS THAT FAILED PARSING WOO
(assuming the name 'checksum' stays, which it may not!)
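To illustrate why a checksum-derived document id gives 'dedup', here is a rough sketch that uses an in-memory Hash as a stand-in for an index keyed by document id. Nothing here talks to elasticsearch; `store` and `doc_id` are hypothetical names, and the events are plain Hashes:

```ruby
require "openssl"

# Hypothetical: derive a document id from the fields that identify an event.
def doc_id(event)
  source = "|@message|#{event["@message"]}|@timestamp|#{event["@timestamp"]}|"
  OpenSSL::Digest.new("sha256").hexdigest(source)
end

# Hypothetical in-memory "index": writing to an existing id overwrites
# the document instead of adding a second copy.
store = {}

events = [
  { "@message" => "foobarbangbaz", "@timestamp" => "2012-06-02T07:00:48.045000Z" },
  { "@message" => "foobarbangbaz", "@timestamp" => "2012-06-02T07:00:48.045000Z" }, # replayed duplicate
  { "@message" => "something else", "@timestamp" => "2012-06-02T07:00:49.000000Z" }
]

events.each { |e| store[doc_id(e)] = e }

puts store.size # prints 2: the replayed event collapsed onto the same id
```

The same property is what would allow reindexing: sending an event again with the same id replaces the earlier document rather than duplicating it.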

@fetep

fetep commented Jun 2, 2012

This is awesome. I think we should include an option to specify the field name we put the checksum in (maybe default to @Checksum?).

@jordansissel

  • sorting keys makes sense
  • I agree on making keys a list of fields.

@daniilyar-confyrm

Nice work (like everything else in Logstash). Thanks!

If you need to calculate a hash for a single field, you could use the following Ruby code:

    ruby {
      code => "event.to_hash.merge!('message_hash' => OpenSSL::Digest.hexdigest('md5', event['message']))"
    }
