experimental checksumming for logstash events as a filter

checksum.rb
Ruby
require "logstash/filters/base"
require "logstash/namespace"
require "yaml"

class LogStash::Filters::Checksum < LogStash::Filters::Base

  config_name "checksum"
  plugin_status "experimental"

  # A list of keys to use in creating the string to checksum.
  # Keys will be sorted before building the string;
  # keys and values will then be concatenated with pipe delimiters
  # and checksummed.
  config :keys, :validate => :array, :default => ["@message", "@source_host", "@timestamp", "@source_path", "@type", "@source"]

  # Digest names are passed straight to OpenSSL ("sha128" is not a digest OpenSSL knows; "sha1" is).
  config :algorithm, :validate => ["md5", "sha1", "sha256", "sha384"], :default => "sha256"

  public
  def register
    require 'openssl'
    @to_checksum = ""
  end

  public
  def filter(event)
    return unless filter?(event)

    @logger.debug("Running checksum filter", :event => event)

    @keys.sort.each do |k|
      @logger.debug("Adding key to string", :current_key => k)
      @to_checksum << "|#{k}|#{event[k]}"
    end
    @to_checksum << "|"
    @logger.debug("Final string built", :to_checksum => @to_checksum)

    digested_string = OpenSSL::Digest.hexdigest(@algorithm, @to_checksum)
    @logger.debug("Digested string", :digested_string => digested_string)
    event.fields['logstash_checksum'] = digested_string
  end
end
cli.sh
Shell
jvstratusmbp :: ~/development/logstash ‹master*› » ruby --1.9 bin/logstash -f logstash.conf
{:args=>["agent", "-f", "logstash.conf"]}
{:run=>"agent"}
{:remaining=>[]}
doneargs
Using experimental plugin 'checksum'. This plugin is untested and may change in the future. For more information about plugin statuses, see http://logstash.net/docs/1.1.0.1/plugin-status {"level":"warn"}
foobarbangbaz
event.json
JSON
{
  "@source":"stdin://jvstratusmbp.local/",
  "@type":"stdin",
  "@tags":[],
  "@fields":{
    "logstash_checksum":"34092978fb4055baa980815768bddc5342d601b59b48cd147ee44447ceff6929"
  },
  "@timestamp":"2012-06-02T07:00:48.045000Z",
  "@source_host":"jvstratusmbp.local",
  "@source_path":"/",
  "@message":"foobarbangbaz"
}
logstash.conf
input { stdin { type => 'stdin' } }
filter { checksum { } }
output { stdout { debug => true debug_format => "json" }}
  • Why are we sorting keys (i.e., @keys.sort.each ...)? Shouldn't the order be up to the user?
  • @to_checksum is never reset, so it keeps getting appended to for every single event (oops?)
  • Instead of providing 'keys', why not have the user provide a sprintf string (e.g., filter { checksum { string => "%{@message} ..." } })?
  • The field name should be configurable (with a default of perhaps 'logstash_checksum').
  • The name 'checksum' is a bit misleading, I think, given the original intent (the term 'checksum' may imply that it is used later for integrity checks).
  • If we keep the 'keys' idea, it should probably be named 'fields'. (We need better unified terminology in logstash, for reals.)
  • Should we permit selecting a serialization of the hash other than hexadecimal?
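
A few of the points above (the string that keeps growing across events, the hard-coded field name, the hex-only output) can be tried out in plain Ruby outside of logstash. This is only a rough sketch, not the plugin's code: events are plain hashes here, and `checksum_for`, the `encoding` argument, and `target` are made-up names for illustration.

```ruby
require "openssl"

# Hypothetical helper, not part of logstash. The string is built in a local
# variable, so nothing carries over from one event to the next.
def checksum_for(event, keys, algorithm = "sha256", encoding = :hex)
  to_checksum = keys.sort.map { |k| "|#{k}|#{event[k]}" }.join + "|"

  raw = OpenSSL::Digest.new(algorithm).digest(to_checksum)
  case encoding
  when :hex    then raw.unpack("H*").first   # same form hexdigest produces
  when :base64 then [raw].pack("m0")         # base64 without line breaks
  else raise ArgumentError, "unknown encoding: #{encoding}"
  end
end

event  = { "@message" => "foobarbangbaz", "@type" => "stdin" }
keys   = ["@message", "@type"]
target = "logstash_checksum"   # assumed default; would be a config option

first  = checksum_for(event, keys)
second = checksum_for(event, keys)   # same event again

event[target] = first
puts first == second                 # repeatable, unlike an ever-growing @to_checksum
puts checksum_for(event, keys, "sha256", :base64)
```

Building the string per call is the smallest change that addresses the "oops"; whether the encoding should be an option at all is still an open question.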

In general, I like this. In the future, we will do:

filter { checksum { ... } }
output { elasticsearch { document_id => "%{logstash_checksum}" } }

And 'dedup' will be fulfilled! ALSO THIS WILL PERMIT REINDEXING OF LOGS THAT FAILED PARSING WOO
(assuming the name 'checksum' stays, which it may not!)
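
The dedup idea can be shown without Elasticsearch at all: if the checksum is the document key, indexing the same event twice just overwrites the first copy. A toy sketch, assuming a plain Ruby Hash stands in for the index; `index_event` is a hypothetical helper, and only two fields go into the checksum to keep it short.

```ruby
require "openssl"

# Toy stand-in for "index this event under document_id = checksum".
# The Hash plays the role of the index; this is not the elasticsearch output.
def index_event(index, event)
  payload = "|@message|#{event["@message"]}|@timestamp|#{event["@timestamp"]}|"
  doc_id  = OpenSSL::Digest.hexdigest("sha256", payload)
  index[doc_id] = event   # same id => overwrite instead of a second copy
  doc_id
end

index = {}
a = { "@message" => "foobarbangbaz",  "@timestamp" => "2012-06-02T07:00:48.045000Z" }
b = a.dup                                  # the same log line, shipped twice
c = { "@message" => "something else", "@timestamp" => "2012-06-02T07:00:49.000000Z" }

[a, b, c].each { |e| index_event(index, e) }
puts index.size   # 2: a and b share one document id
```

The same property is what would make reindexing safe: replaying an event that was already stored lands on the same id.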

This is awesome. I think we should include an option to specify the field name we put the checksum in (maybe default to @checksum?).

  • sorting keys makes sense
  • I agree on making keys a list of fields.
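
One way to look at the sorting question, sketched in plain Ruby: with sorting, two configs that list the same keys in a different order produce the same checksum; without it, the order is part of the result. This is just an illustration of the trade-off, not an argument from either comment; `build_string` and `digest` are hypothetical helpers.

```ruby
require "openssl"

# Hypothetical helpers for comparing sorted vs. user-ordered keys.
def build_string(event, keys, sort)
  ordered = sort ? keys.sort : keys
  ordered.map { |k| "|#{k}|#{event[k]}" }.join + "|"
end

def digest(str)
  OpenSSL::Digest.hexdigest("sha256", str)
end

event   = { "@message" => "foobarbangbaz", "@type" => "stdin" }
order_a = ["@message", "@type"]
order_b = ["@type", "@message"]

# Sorted: the listing order in the config does not matter.
puts digest(build_string(event, order_a, true)) == digest(build_string(event, order_b, true))    # true

# Unsorted: the order is up to the user, and changes the checksum.
puts digest(build_string(event, order_a, false)) == digest(build_string(event, order_b, false))  # false
```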
