Spark Actions (as of 2.4.1)
There are three kinds of actions:
- view data in the console
- collect data to language-native objects
- write to output data sources
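To make the "collect to language-native objects" category concrete, here is a minimal pure-Python sketch of what .countByKey, .collectAsMap, and .countByValue return on the driver. This is not Spark code; it uses a plain list of pairs as a stand-in for an RDD, purely to illustrate the shape of each result.

```python
from collections import Counter

# Stand-in for a pair RDD: a plain list of (key, value) tuples
pairs = [("a", 1), ("b", 2), ("a", 3)]

# countByKey: number of elements per key (values are ignored)
count_by_key = Counter(k for k, _ in pairs)

# collectAsMap: the pairs as a map; for duplicate keys, later pairs win
collect_as_map = dict(pairs)

# countByValue: occurrences of each whole (key, value) element
count_by_value = Counter(pairs)

print(count_by_key)    # Counter({'a': 2, 'b': 1})
print(collect_as_map)  # {'a': 3, 'b': 2}
print(count_by_value)  # Counter({('a', 1): 1, ('b', 2): 1, ('a', 3): 1})
```

All three pull their results entirely onto the driver, so in real Spark they should only be used on datasets small enough to fit in driver memory.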
.aggregate(zeroValue)(seqOp, combOp)
.collect
.collect(partialFunction)
.collectAsMap
.count
.countApprox(timeout, confidence)
.countApproxDistinct(relativeSD)
.countApproxDistinct(p, sp)
.countApproxDistinctByKey(relativeSD[, numPartitions | partitioner])
.countApproxDistinctByKey(p, sp, partitioner)
.countByKey
.countByKeyApprox(timeout, confidence)
.countByValue
.countByValueApprox(timeout[, confidence])
.first
.fold(zeroValue)(op)
.foreach(func)
.foreachPartition(func)
.histogram
.keys
.max
.mean
.min
.reduce
.sampleByKeyExact
.sampleVariance
.saveAsTextFile(path)
.saveAsHadoopDataset(conf)
.saveAsHadoopFile(path, keyClass, valueClass, outputFormatClass [, conf])
.saveAsSequenceFile(path)
.saveAsObjectFile(path)
.stdev
.sum
.take(n)
.takeOrdered(n[, ordering])
.takeSample(withReplacement, num[, seed])
.top(n[, ordering])
.treeAggregate
.treeReduce
.values
.variance
Most of these actions are defined on RDD itself. The key/value variants (e.g. .countByKey, .collectAsMap, .saveAsHadoopFile) come from PairRDDFunctions via implicit conversion, the numeric ones (.mean, .stdev, .histogram, .variance) from DoubleRDDFunctions, and a few appear on specialized RDDs such as JdbcRDD.
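The contract of .aggregate(zeroValue)(seqOp, combOp) is the least obvious entry above: seqOp folds elements into an accumulator within each partition, combOp merges the per-partition accumulators, and zeroValue is used to seed both phases (so it must be a neutral element). A local sketch of that contract, not Spark code; the two nested lists stand in for an RDD's partitions:

```python
from functools import reduce

# Two "partitions" of a toy dataset
partitions = [[1, 2, 3], [4, 5]]

def aggregate(zero, seq_op, comb_op, parts):
    # Phase 1: seqOp folds each partition, starting from zeroValue
    per_partition = [reduce(seq_op, part, zero) for part in parts]
    # Phase 2: combOp merges the partition results, again seeded with zeroValue
    return reduce(comb_op, per_partition, zero)

# Compute sum and count in one pass, with a (sum, count) accumulator
total, count = aggregate(
    (0, 0),
    lambda acc, x: (acc[0] + x, acc[1] + 1),   # seqOp: fold in one element
    lambda a, b: (a[0] + b[0], a[1] + b[1]),   # combOp: merge two accumulators
    partitions,
)
print(total, count)  # 15 5
```

.fold is the special case where the accumulator and element types coincide and one op plays both roles; .reduce additionally drops zeroValue and requires a non-empty dataset.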