Flume puts just the raw text onto Kafka, whereas Logstash by default puts a JSON-encoded message:
68.68.99.199 - - [06/Apr/2014:03:35:25 +0000] "GET /2013/04/smartview-as-the-replacement-for-bi-office-with-obiee-11-1-1-7/ HTTP/1.1" 200 12391 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
vs
{"message":"68.68.99.199 - - [06/Apr/2014:03:35:25 +0000] \"GET /2013/04/smartview-as-the-replacement-for-bi-office-with-obiee-11-1-1-7/ HTTP/1.1\" 200 12391 \"-\" \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\"","@version":"1","@timestamp":"2015-10-21T21:38:10.165Z","host":"bigdatalite.localdomain","path":"/home/oracle/website_logs/access_log.small"}
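To see the difference concretely, here is a minimal Python sketch (stdlib json only; the shortened payload strings are illustrative, not the full records above) of parsing each payload the way Logstash's json codec would:

```python
import json

# What Flume puts on the topic: just the raw log line (truncated here)
raw_flume = '68.68.99.199 - - [06/Apr/2014:03:35:25 +0000] "GET / HTTP/1.1" 200 12391'

# What Logstash puts on the topic: a JSON-encoded event (truncated here)
logstash_msg = '{"message": "68.68.99.199 - - ...", "@version": "1"}'

# The Logstash payload is valid JSON and parses to a dict
event = json.loads(logstash_msg)
print(event["message"])

# The raw Flume payload is not JSON: the parser reads the leading
# "68.68" of the IP address as a number, then fails on the rest
try:
    json.loads(raw_flume)
except json.JSONDecodeError as e:
    print("not JSON:", e.msg)
```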
This means that Flume -> Kafka -> Logstash with default configs fails at the Logstash stage:
68.68.99.199 - - [06/Apr/2014:03:35:25 +0000] "GET /2013/04/smartview-as-the-replacement-for-bi-office-with-obiee-11-1-1-7/ HTTP/1.1" 200 12391 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
{:exception=>#<NoMethodError: undefined method `[]' for 68.68:Float>, :backtrace=>["/opt/logstash-1.5.4/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.4-java/lib/logstash/event.rb:73:in `initialize'", "/opt/logstash-1.5.4/vendor/bundle/jruby/1.9/gems/logstash-codec-json-1.0.1/lib/logstash/codecs/json.rb:46:in `decode'", "/opt/logstash-1.5.4/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-1.0.0/lib/logstash/inputs/kafka.rb:169:in `queue_event'", "/opt/logstash-1.5.4/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-1.0.0/lib/logstash/inputs/kafka.rb:139:in `run'", "/opt/logstash-1.5.4/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.4-java/lib/logstash/pipeline.rb:177:in `inputworker'", "/opt/logstash-1.5.4/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.4-java/lib/logstash/pipeline.rb:171:in `start_input'"], :level=>:error}
The Logstash Kafka input codec defaults to json, which this input isn't; the codec parses the leading "68.68" of the IP address as a Float, hence the NoMethodError above. The fix is to use codec => plain:
input {
  kafka {
    zk_connect => 'bigdatalite:2181'
    topic_id => 'apache_logs'
    codec => plain
  }
}
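The same applies in the other direction: if Logstash is the producer and you want raw text on the topic for non-Logstash consumers, the kafka output takes a codec too. A sketch, assuming the logstash-output-kafka plugin of the same 1.x vintage (broker_list and topic_id were its option names then):

```
output {
  kafka {
    broker_list => 'bigdatalite:9092'
    topic_id => 'apache_logs'
    codec => plain {
      format => "%{message}"
    }
  }
}
```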
An additional wrinkle: if you use the Kafka Sink the content is plain UTF-8 and Logstash is happy. If you use the Kafka Channel then Logstash rejects it, because the channel writes Flume's serialized event objects to the topic rather than just the event body:
Received an event that has a different character encoding than you configured. {:text=>"\\u0000\\x90\\u000368.68.99.199 - - [06/Apr/2014:03:35:25 +0000] \\\"GET /2013/04/smartview-as-the-replacement-for-bi-office-with-obiee-11-1-1-7/ HTTP/1.1\\\" 200 12391 \\\"-\\\" \\\"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\\\"", :expected_charset=>"UTF-8", :level=>:warn}
The solution (h/t) is to add the charset to the codec, giving:
input {
  kafka {
    zk_connect => 'bigdatalite:2181'
    topic_id => 'apache_logs'
    codec => plain {
      charset => "ISO-8859-1"
    }
  }
}
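Why ISO-8859-1 works: the bytes the Kafka Channel prepends (which look like Flume's internal event serialization) are not valid UTF-8, but ISO-8859-1 maps every possible byte value to a character, so decoding can never fail. A quick Python sketch of the two decodes, using the leading bytes from the warning above:

```python
# The first bytes of a KafkaChannel message, as in the warning above,
# followed by the start of the log line
payload = b'\x00\x90\x0368.68.99.199 - - [06/Apr/2014:03:35:25 +0000]'

# UTF-8 rejects the stray 0x90 byte
try:
    payload.decode('utf-8')
except UnicodeDecodeError as e:
    print('UTF-8 decode failed:', e.reason)

# ISO-8859-1 (Latin-1) maps all 256 byte values, so it always succeeds
text = payload.decode('iso-8859-1')
print(text[3:])  # the log line, after the three serialization bytes
```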