Kafka / Logstash / Flume notes

Flume puts just the raw text on Kafka, whereas Logstash by default puts a JSON-encoded message.

68.68.99.199 - - [06/Apr/2014:03:35:25 +0000] "GET /2013/04/smartview-as-the-replacement-for-bi-office-with-obiee-11-1-1-7/ HTTP/1.1" 200 12391 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

vs

{"message":"68.68.99.199 - - [06/Apr/2014:03:35:25 +0000] \"GET /2013/04/smartview-as-the-replacement-for-bi-office-with-obiee-11-1-1-7/ HTTP/1.1\" 200 12391 \"-\" \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\"","@version":"1","@timestamp":"2015-10-21T21:38:10.165Z","host":"bigdatalite.localdomain","path":"/home/oracle/website_logs/access_log.small"}

This means that Flume -> Kafka -> Logstash with default configs fails at the Logstash stage:

68.68.99.199 - - [06/Apr/2014:03:35:25 +0000] "GET /2013/04/smartview-as-the-replacement-for-bi-office-with-obiee-11-1-1-7/ HTTP/1.1" 200 12391 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" 
{:exception=>#<NoMethodError: undefined method `[]' for 68.68:Float>, :backtrace=>["/opt/logstash-1.5.4/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.4-java/lib/logstash/event.rb:73:in `initialize'", "/opt/logstash-1.5.4/vendor/bundle/jruby/1.9/gems/logstash-codec-json-1.0.1/lib/logstash/codecs/json.rb:46:in `decode'", "/opt/logstash-1.5.4/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-1.0.0/lib/logstash/inputs/kafka.rb:169:in `queue_event'", "/opt/logstash-1.5.4/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-1.0.0/lib/logstash/inputs/kafka.rb:139:in `run'", "/opt/logstash-1.5.4/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.4-java/lib/logstash/pipeline.rb:177:in `inputworker'", "/opt/logstash-1.5.4/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.4-java/lib/logstash/pipeline.rb:171:in `start_input'"], :level=>:error}

The Logstash Kafka input codec defaults to json, which the input in this case isn't, so it needs codec => plain:

input {
        kafka {
                zk_connect => 'bigdatalite:2181'
                topic_id => 'apache_logs'
                codec => plain 
        }
}
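A quick way to confirm what is actually on the topic (and therefore which codec Logstash needs) is the Kafka console consumer; a sketch, assuming the Kafka 0.8-era script that takes a ZooKeeper connect string:

kafka-console-consumer.sh --zookeeper bigdatalite:2181 --topic apache_logs --from-beginning

Raw Apache log lines mean codec => plain is needed; full JSON documents mean the default json codec is fine.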

An additional wrinkle: if you use the KafkaSink the content is (presumably) UTF-8 and Logstash is happy, but if you use the KafkaChannel then Logstash rejects it:

Received an event that has a different character encoding than you configured. {:text=>"\\u0000\\x90\\u000368.68.99.199 - - [06/Apr/2014:03:35:25 +0000] \\\"GET /2013/04/smartview-as-the-replacement-for-bi-office-with-obiee-11-1-1-7/ HTTP/1.1\\\" 200 12391 \\\"-\\\" \\\"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\\\"", :expected_charset=>"UTF-8", :level=>:warn}
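The likely cause (an assumption, not confirmed here) is that the KafkaChannel stores events on the topic in Flume's own Avro FlumeEvent wrapping rather than as the bare text body, which would account for the leading non-text bytes in the message above. A KafkaChannel is defined along these lines (a sketch using Flume 1.6's property names; broker and ZooKeeper addresses assumed):

agent.channels.kafka_channel.type             = org.apache.flume.channel.kafka.KafkaChannel
agent.channels.kafka_channel.brokerList       = bigdatalite:9092
agent.channels.kafka_channel.zookeeperConnect = bigdatalite:2181
agent.channels.kafka_channel.topic            = apache_logs
# parseAsFlumeEvent defaults to true, i.e. events go onto the topic as
# Avro FlumeEvent datums rather than plain text
agent.channels.kafka_channel.parseAsFlumeEvent = true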

The solution (h/t) is to add the charset to the codec, giving:

input {
        kafka {
                zk_connect => 'bigdatalite:2181'
                topic_id => 'apache_logs'
                codec => plain {
                        charset => "ISO-8859-1"
                }
        }
}