@hugolu
Created December 14, 2016 12:30
accumulator practice
Python 3.4.2 (default, Oct 8 2014, 10:45:20)
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
16/12/14 12:27:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.0.1
      /_/
Using Python version 3.4.2 (default, Oct 8 2014 10:45:20)
SparkSession available as 'spark'.
>>> rdd = sc.parallelize([1,2,3])
>>>
>>> from pyspark.accumulators import AccumulatorParam
>>> class VectorAccumulatorParam(AccumulatorParam):
...     def zero(self, value):
...         return [0.0] * len(value)
...     def addInPlace(self, val1, val2):
...         for i in range(len(val1)):
...             val1[i] += val2[i]
...         return val1
...
>>> va = sc.accumulator([1.0, 2.0, 3.0], VectorAccumulatorParam())
>>>
>>> def g(x):
...     global va
...     va += [x] * 3
...
>>> rdd.foreach(g)
>>> va.value
[7.0, 8.0, 9.0]
>>>
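
The result [7.0, 8.0, 9.0] in the session above can be checked without a cluster. The following is a rough plain-Python sketch of the bookkeeping, not Spark's actual implementation: each task starts from zero(), adds its own elements, and the driver then folds every task's partial sum into the initial value [1.0, 2.0, 3.0]. The names VectorParam and simulate, and the split of [1, 2, 3] into two partitions, are assumptions made only for illustration.

```python
class VectorParam:
    """Stand-in for the AccumulatorParam subclass above (hypothetical, no Spark needed)."""

    def zero(self, value):
        # Same shape as the initial value, filled with zeros.
        return [0.0] * len(value)

    def add_in_place(self, val1, val2):
        # Element-wise addition, mutating and returning val1.
        for i in range(len(val1)):
            val1[i] += val2[i]
        return val1


def simulate(initial, partitions, param):
    """Mimic per-task accumulation followed by a merge on the driver."""
    total = list(initial)                # driver-side value starts at the initial value
    for part in partitions:
        local = param.zero(initial)      # each task starts from the zero value
        for x in part:
            local = param.add_in_place(local, [x] * 3)   # same update as g(x)
        total = param.add_in_place(total, local)         # driver merges the task's partial sum
    return total


# Assumed partitioning: [1] and [2, 3]. Any other split gives the same total.
result = simulate([1.0, 2.0, 3.0], [[1], [2, 3]], VectorParam())
print(result)  # [7.0, 8.0, 9.0]
```

Because the merge is element-wise addition, the total does not depend on how the elements are partitioned: 1 + 2 + 3 = 6 is added to each component of the initial vector.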