@bhawna94
Created February 22, 2018 05:30
spark assignment
scala> val line = "hello World"
line: String = hello World
scala> val list = List(line)
list: List[String] = List(hello World)
scala> val rdd = sc.parallelize(list)
rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:28
scala> rdd.collect
res0: Array[String] = Array(hello World)
scala> val rdd1 = sc.parallelize(List(1,1,2,3,4,4).distinct)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[5] at parallelize at <console>:24
scala> val rdd2 = sc.parallelize(List(1,2,3,4))
rdd2: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[6] at parallelize at <console>:24
scala> rdd1.zip(rdd2)
res2: org.apache.spark.rdd.RDD[(Int, Int)] = ZippedPartitionsRDD2[7] at zip at <console>:29
scala> res2.collect
res3: Array[(Int, Int)] = Array((1,1), (2,2), (3,3), (4,4))
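
The zip step above can also be run outside the shell. The following is a minimal sketch, assuming Spark is on the classpath and a local master is acceptable; the object name `ZipExample`, the method `run`, the app name, `local[2]`, and the explicit `numSlices = 2` are illustrative choices, not part of the original session. Two details are worth noting: `distinct` here is called on the Scala `List` before `parallelize`, so it runs locally rather than as an RDD transformation, and `RDD.zip` expects both RDDs to have the same number of partitions and the same number of elements in each partition.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone version of the zip part of the shell session.
object ZipExample {

  // Builds a local session, zips the two RDDs, and returns the collected pairs.
  def run(): Array[(Int, Int)] = {
    val spark = SparkSession.builder()
      .appName("zip-example")   // assumed name, for illustration only
      .master("local[2]")       // assumed local master with two threads
      .getOrCreate()
    val sc = spark.sparkContext
    try {
      // distinct runs on the local List, so the RDD is built from List(1, 2, 3, 4)
      val rdd1 = sc.parallelize(List(1, 1, 2, 3, 4, 4).distinct, numSlices = 2)
      val rdd2 = sc.parallelize(List(1, 2, 3, 4), numSlices = 2)

      // Both RDDs have 2 partitions with 2 elements each, which zip requires
      rdd1.zip(rdd2).collect()
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(run().mkString(", "))   // prints (1,1), (2,2), (3,3), (4,4)
}
```

If the duplicates were removed with `rdd.distinct` on the RDD instead, a shuffle would occur and the partition layout would likely no longer line up with `rdd2`, so `zip` could fail at runtime.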