Scala-based Spark Transformations, Part 2
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_65)
Type in expressions to have them evaluated.
Type :help for more information.
2015-12-10 13:34:39.604 java[51901:1203] Unable to load realm info from SCDynamicStore
15/12/10 13:34:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context available as sc.
scala> val parallel = sc.parallelize(1 to 9)
parallel: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:12
scala> val par2 = sc.parallelize(5 to 15)
par2: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[1] at parallelize at <console>:12
scala> parallel.union(par2).collect
res0: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
scala> parallel.union(par2)
res1: org.apache.spark.rdd.RDD[Int] = UnionRDD[3] at union at <console>:17
scala> parallel.intersection(par2).collect
res2: Array[Int] = Array(8, 9, 5, 6, 7)
scala> parallel.union(par2).distinct.collect
res3: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
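
The session above can be mirrored without a Spark installation. The following is a minimal sketch, assuming plain Scala `Vector`s are an acceptable stand-in for the two RDDs; the object name `SetOpsSketch` and the value names `unioned`, `common`, and `unique` are hypothetical and not part of the original session. It illustrates the semantics only: `union` keeps duplicates, `intersection` returns each common element once, and `distinct` removes the duplicates that `union` leaves behind.

```scala
object SetOpsSketch {
  // Plain Scala collections standing in for the RDDs `parallel` and `par2`.
  val parallel: Vector[Int] = (1 to 9).toVector
  val par2: Vector[Int] = (5 to 15).toVector

  // Analogue of parallel.union(par2): simple concatenation, duplicates kept.
  val unioned: Vector[Int] = parallel ++ par2

  // Analogue of parallel.intersection(par2): common elements, each once.
  // Sorted here for readability; the shell output above shows that Spark
  // does not return the intersection in sorted order.
  val common: Vector[Int] = parallel.intersect(par2).distinct.sorted

  // Analogue of parallel.union(par2).distinct: duplicates removed.
  val unique: Vector[Int] = unioned.distinct.sorted

  def main(args: Array[String]): Unit = {
    println(unioned.mkString(", "))
    println(common.mkString(", "))
    println(unique.mkString(", "))
  }
}
```

Unlike the RDD calls, these collection operations run eagerly and locally. In the shell session, `parallel.union(par2)` on its own only returns a `UnionRDD` description, and nothing is computed until an action such as `collect` is called.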