Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_65)
Type in expressions to have them evaluated.
Type :help for more information.
2014-12-02 08:40:25.812 java[2479:1607] Unable to load realm mapping info from SCDynamicStore
14/12/02 08:40:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context available as sc.

scala> val babyNamesCSV = sc.parallelize(List(("David", 6), ("Abby", 4), ("David", 5), ("Abby", 5)))
babyNamesCSV: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at parallelize at <console>:12

scala> babyNamesCSV.reduceByKey((n,c) => n + c).collect
res0: Array[(String, Int)] = Array((Abby,9), (David,11))

scala> babyNamesCSV.aggregateByKey(0)((k,v) => v.toInt + k, (v,k) => k + v).collect
res1: Array[(String, Int)] = Array((Abby,9), (David,11))
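The transcript above shows `aggregateByKey` reproducing what `reduceByKey` already does; its real use is when the accumulator type differs from the value type, e.g. a per-key average over a `(sum, count)` pair. A minimal sketch for the same shell session (it assumes `sc` is available, and the output order after the shuffle may vary):

```scala
scala> val pairs = sc.parallelize(List(("David", 6), ("Abby", 4), ("David", 5), ("Abby", 5)))

scala> // seqOp folds each value into a (sum, count) accumulator;
scala> // combOp merges accumulators coming from different partitions.
scala> val sumCount = pairs.aggregateByKey((0, 0))(
     |   (acc, v) => (acc._1 + v, acc._2 + 1),
     |   (a, b)   => (a._1 + b._1, a._2 + b._2))

scala> sumCount.mapValues { case (sum, count) => sum.toDouble / count }.collect
res2: Array[(String, Double)] = Array((Abby,4.5), (David,5.5))
```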
val products=sc.textFile("/user/cloudera/products")
val productmap=products.map(x=>x.split(",")).map(x=>(x(1).toInt,x(4).toFloat))
productmap.take(5).foreach(println)
(2,59.98)
(2,129.99)
(2,89.99)
(2,89.99)
(2,199.99)
val countandtotal=productmap.aggregateByKey((0,0.0))((x,y)=>(x._1+1,x._2+y),(x,y)=>(x._1+y._1,x._2+y._2))
countandtotal: org.apache.spark.rdd.RDD[(Int, (Int, Double))] = ShuffledRDD[38] at aggregateByKey at <console>:31
countandtotal.take(2).foreach(println)
I want to count the number of products under each category id and the total price per category. When I print countandtotal.take(2).foreach(println) it throws a NumberFormatException, even after I changed the initial value from 0.0 to 0.0f. Please help.
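A NumberFormatException here usually comes from a malformed row (a header line or an empty price field) rather than from `aggregateByKey` itself: the `x(4).toFloat` in `productmap` is lazy and only fails once `take` forces evaluation. A hedged sketch of a defensive parse (the `/user/cloudera/products` path and column positions are taken from the comment above; the exact filter condition is an assumption about which rows are bad):

```scala
scala> val products = sc.textFile("/user/cloudera/products")

scala> // Keep only rows whose price column is present and numeric,
scala> // so toFloat never sees an empty or malformed field.
scala> val productmap = products
     |   .map(_.split(",", -1))        // -1 preserves trailing empty fields
     |   .filter(x => x.length > 4 && x(4).matches("""\d+(\.\d+)?"""))
     |   .map(x => (x(1).toInt, x(4).toFloat))

scala> val countandtotal = productmap.aggregateByKey((0, 0.0))(
     |   (acc, price) => (acc._1 + 1, acc._2 + price),
     |   (a, b)       => (a._1 + b._1, a._2 + b._2))
```

If the category column `x(1)` can also be non-numeric, the same `matches` guard applies to it.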
@matthewadams Good explanation, I thought the same thing.
Note that you could replace
babyNamesCSV.reduceByKey((n,c) => n + c).collect
with
babyNamesCSV.reduceByKey(_ + _).collect
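The underscore form is just Scala's placeholder syntax for an anonymous function; both spellings expand to the same two-argument reduce. A pure-Scala illustration (no Spark needed):

```scala
// The explicit lambda and the placeholder form are the same function.
val f: (Int, Int) => Int = (n, c) => n + c
val g: (Int, Int) => Int = _ + _
assert(f(4, 5) == g(4, 5))
```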
@matthewadams Thanks for the clear explanation. Yes, I am new to Spark; I misinterpreted the above code and got confused by it.