@AndreasPetter
AndreasPetter / VersionedHDFSReader.scala
Created September 14, 2015 15:31
Reading from a Scalding- or Summingbird-generated HDFS versioned sequence file with Java / Scala
/**
* Ever wondered how to read a simple versioned SequenceFile from HDFS with a
* simple program (that is, without running a job)?
*
* Summingbird and Scalding create these sequence files in batch or HDFS mode.
* I recommend adding your own conversion code to write the data to something
* like CSV, which lets you load it into R or Excel, for example. However, this
* is not designed to work with huge data sets (the data must be aggregated in
* the job, or otherwise condensed, if it is large). It can, however, easily be
* changed to do so by writing the output data directly in the read routine.