Skip to content

Instantly share code, notes, and snippets.

Created June 22, 2016 17:58
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
Star You must be signed in to star a gist
What would you like to do?
import org.apache.crunch.Source
import org.apache.avro.mapred.AvroKey
object SparkRunner {
def readInput[S <: SpecificRecord: ClassTag](spark: SparkContext, source: Source[S]): RDD[S] = {
val job = Job.getInstance(spark.hadoopConfiguration)
source.configureSource(job, -1)
val input = spark.newAPIHadoopRDD(
classOf[CrunchInputFormat[AvroKey[S], NullWritable]],
classOf[NullWritable]) => x._1.datum()).map(x => SpecificData.get().deepCopy(x.getSchema, x))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment