@BenFradet
Last active February 2, 2017 13:18
Reading LZO files using elephant-bird in Spark
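import com.twitter.elephantbird.mapreduce.input.MultiInputFormat

// Assumes `sc` is an existing SparkContext and `path` points at the LZO files.
// Assumption: `hadoopConfig` is sc.hadoopConfiguration, which newAPIHadoopFile(path) reads.
val hadoopConfig = sc.hadoopConfiguration
// Tell elephant-bird to decode each record as a raw byte array.
MultiInputFormat.setClassConf(classOf[Array[Byte]], hadoopConfig)
val records = sc.newAPIHadoopFile[
  org.apache.hadoop.io.LongWritable,
  com.twitter.elephantbird.mapreduce.io.BinaryWritable[Array[Byte]],
  MultiInputFormat[Array[Byte]]
](path)
  .map(_._2.get()) // drop the LongWritable key, keep each record's bytes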