@smartkiwi
Created April 23, 2015 19:01
# Point the environment at the Spark installation before importing pyspark
import os
import sys
os.environ['SPARK_HOME'] = '/root/spark/'
# Add the PySpark libraries to the Python path
sys.path.insert(0, '/root/spark/python')
# Read the Spark master URL written by the spark-ec2 launch scripts
CLUSTER_URL = open('/root/spark-ec2/cluster-url').read().strip()
print(CLUSTER_URL)
# <codecell>
from pyspark import SparkContext
sc = SparkContext(CLUSTER_URL, 'pyspark')
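As a quick sanity check (not part of the original gist), a tiny distributed job can confirm the context is actually talking to the cluster master rather than a local context; the numbers here are arbitrary:
# Hypothetical smoke test: a small job that should fan out across the executors
rdd = sc.parallelize(range(1000), numSlices=8)
print(rdd.count())  # expect 1000 if the cluster is reachable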
@srykanth

Hi, I have been trying to implement PySpark (I actually took a Python program and customized it to fit into the Spark realm by using HDFS to read the input data). The program runs locally and works fine, but somehow it is not getting distributed across the cluster. Any clue is appreciated.
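One common cause of this is loading the data on the driver with plain Python file APIs, which keeps all the work local; data read through the SparkContext arrives as a partitioned RDD whose transformations run on the executors. A minimal sketch of that pattern, with a hypothetical HDFS path and word-count logic standing in for the real program:
# Hypothetical example: sc.textFile returns a partitioned RDD, so the
# flatMap/map/reduceByKey work below executes on the cluster's workers.
lines = sc.textFile('hdfs:///user/hadoop/input/data.txt')  # path is illustrative
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
print(counts.take(10))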
