Created April 10, 2015 20:59
koverholt/a2cc2a0ab51acb13ae57
Simple Numpy example in Spark
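import numpy as np
from pyspark import SparkConf, SparkContext

# Connect to a standalone Spark master; replace <HOSTNAME> with the master's host name.
conf = SparkConf()
conf.setMaster("spark://<HOSTNAME>:7077")
conf.setAppName("NumpyMult")
sc = SparkContext(conf=conf)


def mult(x):
    # Multiply one element by a one-element NumPy array, so each result is itself an array.
    y = np.array([2])
    return x * y


# Distribute 0..9999 across the cluster, apply mult to every element, and gather the results.
x = np.arange(10000)
distData = sc.parallelize(x)
results = distData.map(mult).collect()
print(results)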
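The script above calls mult once per element, so NumPy only ever sees one number at a time. A possible variation, sketched here as an assumption and not part of the original gist, processes a whole partition at once with mapPartitions, so the multiplication is a single vectorized NumPy operation per partition. The helper name mult_partition is hypothetical, and unlike mult it returns plain integers instead of one-element arrays.

```python
import numpy as np


def mult_partition(values):
    """Double every element of one partition with a single NumPy operation."""
    # `values` is the iterator Spark passes to mapPartitions; materialize it once.
    chunk = np.fromiter(values, dtype=np.int64)
    # Vectorized multiply, then hand plain Python ints back to Spark.
    return iter((chunk * 2).tolist())


# With a live SparkContext `sc` (created as in the script above), usage would look like:
#   results = sc.parallelize(np.arange(10000)).mapPartitions(mult_partition).collect()
```

Because mult_partition only needs an iterator, it can be tried locally without a cluster, for example with list(mult_partition(iter(range(5)))).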
Hi,
Thanks for the code snippet. I was wondering whether, just like distData, we can create a second RDD, distData2, and run operations on both of them together.
To be more precise:
x = np.arange(10000)
distData = sc.parallelize(x)
y = np.arange(10000)
distData2 = sc.parallelize(y)
Now do array operations on both distData and distData2. Is this possible?
Thanks
Venkata D.
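One way this might be done is to pair the two RDDs element by element with RDD.zip and then map over the pairs. The sketch below is an assumption, not something from the original gist. The names multiply_pair, elementwise_product, elementwise_product_local, and num_slices are hypothetical, and sc is expected to be an already created SparkContext. RDD.zip requires both RDDs to have the same number of partitions and the same number of elements in each partition, which is why both inputs are parallelized with the same slice count.

```python
def multiply_pair(pair):
    """Multiply the two values of one (x_i, y_i) pair."""
    a, b = pair
    return a * b


def elementwise_product(sc, x, y, num_slices=4):
    """Element-wise product of two equal-length sequences, computed on Spark.

    `sc` is an existing SparkContext. Both inputs use the same slice count
    so that the partitions line up for zip.
    """
    rdd_x = sc.parallelize(x, num_slices)
    rdd_y = sc.parallelize(y, num_slices)
    return rdd_x.zip(rdd_y).map(multiply_pair).collect()


def elementwise_product_local(x, y):
    """Plain-Python reference for what the Spark version is meant to return."""
    return [multiply_pair(p) for p in zip(x, y)]
```

With the arrays from the question, the call would be elementwise_product(sc, np.arange(10000), np.arange(10000)). The local helper is only there to check results on small inputs without a cluster.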