>>> rdd = sc.parallelize([1,2,3])
>>> rdd.map(lambda x: [x,x*10])
RDD: [1,2,3] -> [[1,10], [2,20], [3,30]]
>>> rdd.flatMap(lambda x: [x,x*10])
RDD: [1,2,3] -> [1,10,2,20,3,30]
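
The difference between map and flatMap can be checked without a cluster. The following is a plain-Python sketch of the semantics only, not Spark code: `local_map` and `local_flat_map` are hypothetical helper names introduced here, and they work on ordinary lists rather than RDDs.

```python
from itertools import chain

def local_map(func, items):
    # One output element per input element; nested results stay nested.
    return [func(x) for x in items]

def local_flat_map(func, items):
    # Apply func to each element, then concatenate the returned iterables.
    return list(chain.from_iterable(func(x) for x in items))

data = [1, 2, 3]
expand = lambda x: [x, x * 10]

mapped = local_map(expand, data)          # [[1, 10], [2, 20], [3, 30]]
flattened = local_flat_map(expand, data)  # [1, 10, 2, 20, 3, 30]
print(mapped)
print(flattened)
```

In Spark itself both calls are lazy transformations, so the elements only appear after an action such as `collect()`.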
Transformation | type |
---|---|
reduceByKey(func) | (V,V) -> V |
sortByKey() | (K,V) -> (sorted(K),V) |
groupByKey() | (K,V) -> (K, iter(V)) |
>>> rdd = sc.parallelize([(1,2),(3,4),(1,32)])
>>> rdd.reduceByKey(lambda x,y: x+y)
((1,34),(3,4))
>>> rdd.sortByKey()
((1,2),(1,32),(3,4))
>>> rdd.groupByKey()
((1,[2,32]),(3,[4]))
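
The three key-value transformations above can be mimicked on a local list of pairs. This is a minimal sketch of what each one computes, assuming everything fits in memory; the `local_*` functions are hypothetical names, not part of the Spark API, and they ignore partitioning and shuffling entirely.

```python
from collections import defaultdict
from functools import reduce

def local_group_by_key(pairs):
    # Collect all values that share a key, preserving input order.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def local_reduce_by_key(func, pairs):
    # Fold the values of each key with a (V, V) -> V function.
    return {k: reduce(func, vs) for k, vs in local_group_by_key(pairs).items()}

def local_sort_by_key(pairs):
    # Order pairs by key only; Python's sort is stable, so ties keep input order.
    return sorted(pairs, key=lambda kv: kv[0])

pairs = [(1, 2), (3, 4), (1, 32)]

reduced = local_reduce_by_key(lambda x, y: x + y, pairs)  # {1: 34, 3: 4}
ordered = local_sort_by_key(pairs)                        # [(1, 2), (1, 32), (3, 4)]
grouped = local_group_by_key(pairs)                       # {1: [2, 32], 3: [4]}
print(reduced, ordered, grouped)
```

In Spark, `groupByKey()` yields an iterable per key rather than a list, and all three results only materialize after an action such as `collect()`.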
RDD.toDebugString()