Implement a river for indexing CSV files in a running ElasticSearch 0.90.1 instance.
Listed on the ES plugins page, ES-CSV-River seemed able to do the job easily.
- To Site, or Not To Site: it wasn't possible to install CSV-River the usual way:
$ bin/plugin -install xxBedy/elasticsearch-river-csv
The ES plugin system detected it as a site plugin, but because it also had Java files in it, it got confused and aborted the installation.
- Call 911 (Maven, that is): I built the JAR files from source with Maven and installed them. That seemed to work!
- Processing, I see: When I created the river, with sample JSON written to match the sample CSV I generated, it seemed to start indexing the file but broke right after. It appended .processing to the file's name and failed, reporting a traceback for an exception in opencsv:
Exception in thread "elasticsearch[Recorder][CSV processor][T#1]" java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
at org.elasticsearch.river.csv.CSVRiver$CSVConnector.processFile(CSVRiver.java:232)
at org.elasticsearch.river.csv.CSVRiver$CSVConnector.run(CSVRiver.java:193)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVReader
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 3 more
- The CSV River README mentions compatibility with ES 0.19.x; I'm using 0.90.1.
- The Maven build is somehow not exactly what it's supposed to be.
- Set up my own Maven environment and wrangle with building the plugin from source.
- Replace OpenCSV's CSV parser class with something custom.
- Write a whole new CSV-River plugin, to scratch my own itch.
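Before going down any of these routes, it can help to confirm what the trace above is saying: that au.com.bytecode.opencsv.CSVReader is simply not visible at runtime. The following is a minimal sketch, not part of the plugin; the class name ClasspathProbe is hypothetical, and it only gives a meaningful answer when run with the same classpath as the JARs in the plugin's directory.

```java
// Hypothetical helper (not part of CSV-River): reports whether a class
// can be found by the class loader that loaded this probe.
final class ClasspathProbe {

    // Returns true if the named class can be located, without initializing it.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: no JAR on the classpath provides the class.
            // LinkageError covers NoClassDefFoundError and related failures.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default class name is the one from the stack trace; pass another as the first argument.
        String name = args.length > 0 ? args[0] : "au.com.bytecode.opencsv.CSVReader";
        System.out.println(name + (isLoadable(name) ? " is loadable" : " is NOT on the classpath"));
    }
}
```

If the probe reports the class as missing, the problem is a missing JAR and not a bug in the river's parsing code.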
David Pilato stepped in with a solution that fixed the issue.
Apparently, I was wrong to install the plugin from the Maven-generated JAR file. The ZIP builds in the releases have the opencsv.jar dependency bundled in. Previously, the plugin required OpenCSV at runtime, but indexing failed because the library was not on the classpath.
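One way to check whether a given archive actually bundles the dependency is to list the .jar entries it contains. This is a minimal sketch using only the JDK's java.util.zip; the class name PluginZipInspector is hypothetical, and the archive path is whatever release ZIP you downloaded.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper (not part of CSV-River): inspects a plugin ZIP for bundled JARs.
final class PluginZipInspector {

    // Returns the names of all entries in the archive that end with ".jar".
    static List<String> bundledJars(String zipPath) throws IOException {
        List<String> jars = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".jar")) {
                    jars.add(name);
                }
            }
        }
        return jars;
    }

    // True if any bundled JAR name contains the given fragment, e.g. "opencsv".
    static boolean bundles(String zipPath, String fragment) throws IOException {
        for (String jar : bundledJars(zipPath)) {
            if (jar.contains(fragment)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: PluginZipInspector <plugin.zip>");
            return;
        }
        System.out.println("Bundled JARs: " + bundledJars(args[0]));
        System.out.println("Bundles opencsv: " + bundles(args[0], "opencsv"));
    }
}
```

A Maven-built JAR that contains only the plugin's own classes would show no opencsv entry here, which matches the failure described above.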
This also worked for me: