
@jasonnerothin
Last active April 21, 2019 22:34
Spark Action Summary

Spark Actions (as of 2.4.1)

There are three kinds of actions:

  • view data in the console
  • collect data to language-native objects
  • write to output data sources
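
As a rough illustration of the three kinds, here is a minimal sketch that runs one action of each type. The local master, the app name, the sample data and the output path `/tmp/spark-actions-demo` are all assumptions made for this example, not part of the original notes.

```scala
import org.apache.spark.sql.SparkSession

object ActionKinds {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("action-kinds")
      .getOrCreate()
    val sc = spark.sparkContext

    val words = sc.parallelize(Seq("a", "b", "a", "c"))

    // 1. View data in the console: take(n) brings a few elements to the driver.
    words.take(2).foreach(println)

    // 2. Collect data to language-native objects on the driver.
    val all: Array[String] = words.collect()
    val counts: scala.collection.Map[String, Long] = words.countByValue()
    println(s"${all.length} elements, ${counts.size} distinct values")

    // 3. Write to an output data source.
    // Hypothetical path; the call fails if the directory already exists.
    words.saveAsTextFile("/tmp/spark-actions-demo")

    spark.stop()
  }
}
```

Nothing is computed until one of these calls runs; each action triggers its own job over the lineage of `words`.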

.aggregate(zeroValue)(seqOp, combOp)
.collect
.collect(partialFunction)
.collectAsMap
.count
.countApprox(timeout, confidence)
.countApproxDistinct(relativeSD)
.countApproxDistinct(p, sp)
.countApproxDistinctByKey(relativeSD[, numPartitions | , partitioner])
.countApproxDistinctByKey(p, sp, partitioner)
.countByKey
.countByKeyApprox(timeout, confidence)
.countByValue
.countByValueApprox(timeout, confidence)
.first
.fold(zeroValue)(op)
.foreach(func)
.foreachPartition(func)
.histogram
.keys
.max
.mean
.min
.reduce
.sampleByKeyExact
.sampleVariance
.saveAsTextFile(path)
.saveAsHadoopDataset(conf)
.saveAsHadoopFile(path, keyClass, valueClass, outputFormatClass [, conf])
.saveAsSequenceFile(path)
.saveAsObjectFile(path)
.stdev
.sum
.take(n)
.takeOrdered(n[, ordering])
.takeSample(withReplacement, num[, seed])
.top
.treeAggregate
.treeReduce
.values
.variance
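
The curried signatures of aggregate and fold are easier to read with an example. The following is a sketch only, assuming a local SparkContext and a small made-up dataset; the object name and app name are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AggregateSketch {
  def main(args: Array[String]): Unit = {
    // Local context for illustration only; master and app name are assumptions.
    val conf = new SparkConf().setMaster("local[2]").setAppName("aggregate-sketch")
    val sc = new SparkContext(conf)

    val nums = sc.parallelize(Seq(1, 2, 3, 4, 5), numSlices = 2)

    // aggregate: the result type (sum, count) may differ from the element type Int.
    val (sum, count) = nums.aggregate((0, 0))(
      (acc, n) => (acc._1 + n, acc._2 + 1),   // seqOp: fold one element into a partition's accumulator
      (a, b) => (a._1 + b._1, a._2 + b._2)    // combOp: merge accumulators from two partitions
    )

    // fold: same type in and out; the zero value is applied in every partition
    // and again when merging, so it should be neutral for the operation.
    val total = nums.fold(0)(_ + _)

    // reduce: no zero value, so it throws on an empty RDD.
    val largest = nums.reduce((a, b) => math.max(a, b))

    // takeOrdered returns the smallest n elements, top the largest n.
    val smallestTwo = nums.takeOrdered(2)
    val largestTwo = nums.top(2)

    println(s"mean = ${sum.toDouble / count}, total = $total, max = $largest")
    println(s"smallest = ${smallestTwo.mkString(",")}, largest = ${largestTwo.mkString(",")}")

    sc.stop()
  }
}
```

aggregate is the one to reach for when the result type differs from the element type; fold and reduce both require the result to be of the element type.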

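Most of these are defined directly on RDD. The key-based ones (collectAsMap, countByKey, countByKeyApprox, countApproxDistinctByKey, saveAsHadoopFile, saveAsHadoopDataset) come from PairRDDFunctions, the numeric ones (sum, mean, stdev, variance, sampleVariance, histogram) from DoubleRDDFunctions, and saveAsSequenceFile from SequenceFileRDDFunctions. Concrete subclasses such as JdbcRDD inherit them from RDD. Note that keys and values return new RDDs, so they are lazy transformations rather than actions.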
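
The key-based and numeric actions become available on an RDD through implicit conversions that depend on its element type. A minimal sketch, assuming a local SparkContext and invented sample data (the object and app names are hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WhereActionsLive {
  def main(args: Array[String]): Unit = {
    // Local context for illustration only; master and app name are assumptions.
    val conf = new SparkConf().setMaster("local[2]").setAppName("where-actions-live")
    val sc = new SparkContext(conf)

    // An RDD[(K, V)] implicitly gains the PairRDDFunctions methods.
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    val perKey = pairs.countByKey()     // number of elements per key
    val asMap = pairs.collectAsMap()    // keeps only one value for a repeated key

    // An RDD[Double] implicitly gains the DoubleRDDFunctions methods.
    val doubles = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0))
    val mean = doubles.mean()
    val sd = doubles.stdev()
    // histogram(n) returns the bucket boundaries and the count per bucket.
    val (buckets, counts) = doubles.histogram(2)

    println(s"perKey = $perKey, asMap = $asMap")
    println(s"mean = $mean, stdev = $sd")
    println(s"buckets = ${buckets.mkString(",")}, counts = ${counts.mkString(",")}")

    sc.stop()
  }
}
```

Calling `mean()` on an RDD of strings, or `countByKey()` on an RDD that is not made of pairs, does not compile, because no implicit conversion applies.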
