Skip to content

Instantly share code, notes, and snippets.

@hn5092
hn5092 / MultipleOutputsExample.scala
Created February 26, 2016 08:42 — forked from mlehman/MultipleOutputsExample.scala
Hadoop MultipleOutputs on Spark Example
/* Example using MultipleOutputs to write a Spark RDD to multiples files.
Based on saveAsNewAPIHadoopFile implemented in org.apache.spark.rdd.PairRDDFunctions, org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil.
val values = sc.parallelize(List(
("fruit/items", "apple"),
("vegetable/items", "broccoli"),
("fruit/items", "pear"),
("fruit/items", "peach"),
("vegetable/items", "celery"),
("vegetable/items", "spinach")