@yaauie
Last active September 28, 2021 20:24
Patch Logstash CSV Codec 1.0.0 to ensure the Logstash File Input re-detects column names per file (NOT NECESSARY with logstash-codec-multiline >= 3.1.1)
diff --git a/lib/logstash/codecs/csv.rb b/lib/logstash/codecs/csv.rb
index 07d6416..66cd6ed 100644
--- a/lib/logstash/codecs/csv.rb
+++ b/lib/logstash/codecs/csv.rb
@@ -133,12 +133,19 @@ class LogStash::Codecs::CSV < LogStash::Codecs::Base
     rescue CSV::MalformedCSVError => e
       @logger.error("CSV parse failure. Falling back to plain-text", :error => e, :data => data)
       yield LogStash::Event.new("message" => data, "tags" => ["_csvparsefailure"])
     end
   end
 
+  def auto_flush
+    if caller.any? { |line| line.end_with?("`evict'") }
+      logger.debug('clearing column cache for codec reuse')
+      @columns = params['columns'] || []
+    end
+  end
+
   def encode(event)
     if @include_headers
       csv_data = CSV.generate_line(select_keys(event), :col_sep => @separator, :quote_char => @quote_char, :headers => true)
       @on_event.call(event, csv_data)
 
       # output headers only once per codec lifecycle
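
The idea behind the patch can be sketched outside Logstash. The example below is a minimal illustration only: `ToyCsvCodec` and `ToyIdentityMap` are hypothetical stand-ins, not Logstash classes, and the eviction check uses `caller_locations` with `base_label` rather than the string match on `caller` that the patch uses, because the text format of backtrace lines differs between Ruby versions.

```ruby
# Hypothetical stand-in for a stateful CSV codec: it learns column names from
# the first line it sees and maps every later line onto those names.
class ToyCsvCodec
  def initialize(columns = [])
    @configured_columns = columns
    @columns = columns.dup
  end

  def decode(line)
    values = line.split(',')
    if @columns.empty?
      @columns = values # the first line is treated as the header row
      return nil
    end
    @columns.zip(values).to_h
  end

  # Same idea as the patch: reset the cached columns only when this method
  # is reached from a method named `evict` somewhere up the call stack.
  def auto_flush
    return unless caller_locations.any? { |loc| loc.base_label == 'evict' }

    @columns = @configured_columns.dup
  end
end

# Hypothetical stand-in for an identity map that recycles an evicted codec.
class ToyIdentityMap
  def initialize(codec)
    @codec = codec
  end

  def evict
    @codec.auto_flush
    @codec
  end
end

codec = ToyCsvCodec.new
codec.decode('a,b')               # header of the first file
codec.decode('1,2')               # mapped onto columns a and b
ToyIdentityMap.new(codec).evict   # codec is recycled, cached columns cleared
codec.decode('x,y')               # treated as the header of the next file
```

Calling `auto_flush` directly, outside of `evict`, leaves the cached columns untouched in this sketch, which mirrors the patch's guard.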

First, cd into your Logstash directory and ensure that version 1.0.0 of the CSV Codec is installed:

bin/logstash-plugin install logstash-codec-csv --version 1.0.0

Next, save this patch to local disk as logstash-codec-csv.auto-flush-on-evict.patch, and apply it:

patch --strip=1 --directory vendor/bundle/jruby/2.5.0/gems/logstash-codec-csv-1.0.0 < logstash-codec-csv.auto-flush-on-evict.patch

You should see output indicating success:

patching file lib/logstash/codecs/csv.rb
Hunk #1 succeeded at 138 (offset 5 lines).

yaauie commented Sep 28, 2021

I've opened an issue on the multiline codec that should obviate the need for this patch: logstash-plugins/logstash-codec-multiline#70

yaauie commented Sep 28, 2021

As of v3.1.1 of the Multiline Codec, the IdentityMapCodec used by the File and Lumberjack inputs no longer reuses codecs across identities. (The IdentityMapCodec lives in the Multiline Codec for historical reasons, as the Multiline Codec was the progenitor of stateful codecs.)

If you update the multiline codec in the usual way, the patch in this gist is no longer necessary:

bin/logstash-plugin update logstash-codec-multiline
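
Whether the patch is still needed comes down to comparing the installed multiline codec version against 3.1.1. A minimal sketch of that comparison follows; `patch_needed?` is a hypothetical helper, and the version string is assumed to be something you read off yourself (for example from the plugin listing), not something this snippet discovers.

```ruby
require 'rubygems'

# First multiline codec release in which codecs are no longer reused across
# identities, per the comment above.
MULTILINE_FIX_VERSION = Gem::Version.new('3.1.1')

# Hypothetical helper: true when the installed multiline codec predates the fix.
def patch_needed?(installed_multiline_version)
  Gem::Version.new(installed_multiline_version) < MULTILINE_FIX_VERSION
end

puts patch_needed?('3.0.10')
puts patch_needed?('3.1.1')
```

`Gem::Version` compares segment by segment, so 3.0.10 sorts before 3.1.1 even though a plain string comparison would get that wrong.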
