@girisandeep
Last active May 26, 2021 14:04
This example first creates a SequenceFile and then loads it back.
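// Save it: write an RDD of (String, Double) pairs, in 2 partitions, as a SequenceFile.
// Keys are stored as Text and values as DoubleWritable.
val rdd = sc.parallelize(Array(("key1", 1.0), ("key2", 2.0), ("key3", 3.0)), 2)
rdd.saveAsSequenceFile("pysequencefile1")
// Load it: read the file back, naming the Writable classes used for keys and values.
import org.apache.hadoop.io.DoubleWritable
import org.apache.hadoop.io.Text
val myrdd = sc.sequenceFile(
  "pysequencefile1",
  classOf[Text], classOf[DoubleWritable])
// Convert the Hadoop Writables to plain Scala values before collecting.
val result = myrdd.map { case (x, y) => (x.toString, y.get()) }
result.collect()
// Array((key1,1.0), (key2,2.0), (key3,3.0))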
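The load step above names the Writable classes explicitly and then unwraps them by hand. As a possible alternative, here is a minimal sketch using the type-parameterized overload of `sc.sequenceFile`, which converts the Writables to native Scala types. It assumes a spark-shell session where `sc` is already defined and that `pysequencefile1` was written by the save step; the name `typed` is only illustrative.

```scala
// Assumption: running in spark-shell, so `sc` (the SparkContext) is predefined,
// and "pysequencefile1" already exists from the save step above.
// The type parameters ask Spark to convert Text -> String and
// DoubleWritable -> Double, so no manual toString / get() is needed.
val typed = sc.sequenceFile[String, Double]("pysequencefile1")

// Print each (key, value) pair on the driver.
typed.collect().foreach { case (k, v) => println(s"$k -> $v") }
```

This form is shorter when the key and value types map onto standard Scala types. The explicit `classOf[...]` form is still needed when the file holds custom Writable classes.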