jasonnerothin/SparkActions.md

## SparkActions.md

### Spark Actions (as of 2.4.1)

There are three kinds of actions:

view data in the console
collect data to language-native objects
write to output data sources
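
As a rough illustration of the three kinds, the sketch below runs one action of each against a tiny RDD. The local master, the app name, the temp-directory output path and the `demo` function are assumptions made for a self-contained run; they are not part of the original notes.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object ActionKinds {
  // Hypothetical demo function: returns the collected data and the output path.
  def demo(sc: SparkContext): (Array[String], String) = {
    val words = sc.parallelize(Seq("a", "b", "a", "c"), numSlices = 2)

    // 1. View data in the console: take pulls a few elements to the driver.
    words.take(2).foreach(println)

    // 2. Collect data to language-native objects (here a plain Scala Array).
    val all: Array[String] = words.collect()

    // 3. Write to an output data source. saveAsTextFile refuses to overwrite
    //    an existing directory, so a fresh child path is used.
    val out = Files.createTempDirectory("spark-actions").resolve("words").toString
    words.saveAsTextFile(out)

    (all, out)
  }

  def main(args: Array[String]): Unit = {
    // Assumed configuration: two local worker threads.
    val conf = new SparkConf().setMaster("local[2]").setAppName("action-kinds")
    val sc = new SparkContext(conf)
    try {
      val (all, out) = demo(sc)
      println(s"collected ${all.length} elements, wrote part files under $out")
    } finally sc.stop()
  }
}
```

Only the first two kinds bring data back to the driver, so they are the ones to be careful with on large RDDs.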


.aggregate(zeroValue)(seqOp, combOp)
.collect
.collect(partialFunction)
.collectAsMap
.count
.countApprox(timeout, confidence)
.countApproxDistinct(relativeSD)
.countApproxDistinct(p, sp)
.countApproxDistinctByKey(relativeSD[, numPartitions | , partitioner])
.countApproxDistinctByKey(p, sp, partitioner)
.countByKey
.countByKeyApprox(timeout, confidence)
.countByValue
.countByValueApprox
.first
.fold(zeroValue)(op)
.foreach(func)
.foreachPartition(func)
.histogram
.keys
.max
.mean
.min
.reduce
.sampleByKeyExact
.sampleVariance
.saveAsTextFile(path)
.saveAsHadoopDataset(conf)
.saveAsHadoopFile(path, keyClass, valueClass, outputFormatClass [, conf])
.saveAsSequenceFile(path)
.saveAsObjectFile(path)
.stdev
.sum
.take(n)
.takeOrdered(n[, ordering])
.takeSample(withReplacement, num[, seed])
.top
.treeAggregate
.treeReduce
.values
.variance
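
A few of the reducing and sampling actions from the list can be sketched as follows. This is a minimal example, not a reference implementation: the `withLocalContext` helper, the local master setting and the sample data are assumptions introduced here for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ActionSamples {
  // Hypothetical helper: runs f against a throwaway local SparkContext.
  def withLocalContext[A](f: SparkContext => A): A = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("action-samples")
    val sc = new SparkContext(conf)
    try f(sc) finally sc.stop()
  }

  // reduce: combine all elements with an associative, commutative operator.
  def total(sc: SparkContext, xs: Seq[Int]): Int =
    sc.parallelize(xs, numSlices = 2).reduce(_ + _)

  // fold: like reduce, but starts from a zero value that should be neutral for op.
  def foldedTotal(sc: SparkContext, xs: Seq[Int]): Int =
    sc.parallelize(xs, numSlices = 2).fold(0)(_ + _)

  // aggregate: the result type (sum, count) may differ from the element type.
  def sumAndCount(sc: SparkContext, xs: Seq[Int]): (Int, Int) =
    sc.parallelize(xs, numSlices = 2).aggregate((0, 0))(
      (acc, x) => (acc._1 + x, acc._2 + 1), // seqOp: add one element within a partition
      (a, b) => (a._1 + b._1, a._2 + b._2)  // combOp: merge per-partition results
    )

  // takeOrdered returns the n smallest elements, top the n largest.
  def extremes(sc: SparkContext, xs: Seq[Int], n: Int): (Array[Int], Array[Int]) = {
    val rdd = sc.parallelize(xs, numSlices = 2)
    (rdd.takeOrdered(n), rdd.top(n))
  }

  def main(args: Array[String]): Unit = withLocalContext { sc =>
    val xs = Seq(5, 1, 4, 2, 3)
    println(total(sc, xs))
    println(foldedTotal(sc, xs))
    println(sumAndCount(sc, xs))
    val (smallest, largest) = extremes(sc, xs, 2)
    println(smallest.mkString(",") + " / " + largest.mkString(","))
  }
}
```

`aggregate` is the most general of the three reducers: `reduce` and `fold` can both be expressed with it, at the cost of supplying two functions instead of one.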

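Most of these are defined on RDD itself. The key-based ones (such as countByKey and collectAsMap) come from PairRDDFunctions and the numeric ones (such as mean, stdev and sum) from DoubleRDDFunctions, so they are also available on concrete RDD implementations like JdbcRDD when the element type fits. Note that .keys and .values return a new RDD rather than a result, so they behave as lazy transformations.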
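
Which class supplies a given action depends on the element type of the RDD. The sketch below shows one action from each of three places; the object and method names and the local master setting are illustrative assumptions, not something the original notes define.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ActionOwners {
  // Defined on RDD itself: available for any element type.
  def size(rdd: RDD[String]): Long = rdd.count()

  // Supplied by PairRDDFunctions through an implicit conversion on RDD[(K, V)].
  def perKey(rdd: RDD[(String, Int)]): scala.collection.Map[String, Long] =
    rdd.countByKey()

  // Supplied by DoubleRDDFunctions through an implicit conversion on RDD[Double].
  def average(rdd: RDD[Double]): Double = rdd.mean()

  def main(args: Array[String]): Unit = {
    // Assumed configuration: two local worker threads.
    val conf = new SparkConf().setMaster("local[2]").setAppName("action-owners")
    val sc = new SparkContext(conf)
    try {
      println(size(sc.parallelize(Seq("x", "y", "z"))))
      println(perKey(sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))))
      println(average(sc.parallelize(Seq(1.0, 2.0, 3.0))))
    } finally sc.stop()
  }
}
```

Calling `countByKey` on an RDD whose elements are not pairs, or `mean` on one whose elements are not numeric, is a compile-time error, because the corresponding implicit conversion does not apply.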