To enable IDE (PyCharm) syntax support for Apache Spark, adapted from
import os
import sys

# Set the path for the Spark installation
# (this is the path where you built Spark using sbt/sbt assembly)
os.environ['SPARK_HOME'] = "/public/spark-0.9.1"
# os.environ['SPARK_HOME'] = "/home/jie/d2/spark-0.9.1"

# Append to sys.path so that pyspark can be found
sys.path.append(os.path.join(os.environ['SPARK_HOME'], "python"))
# sys.path.append("/home/jie/d2/spark-0.9.1/python")

# Now we are ready to import the Spark modules
try:
    from pyspark import SparkContext
    from pyspark import SparkConf
except ImportError as e:
    print ("Error importing Spark Modules", e)
    sys.exit(1)
import numpy as np
from sklearn.cross_validation import train_test_split, Bootstrap
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn import datasets, svm, pipeline
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier
if __name__ =='__main__':
# conf.setMaster("local")
conf.set("spark.executor.memory", "12g")
sc = SparkContext(conf=conf)
X, y = make_classification(n_samples=10000, n_features=30, n_classes=2)
X_train, X_test, y_train, y_test = train_test_split(X, y)
samples = sc.parallelize(Bootstrap(y.size))
feature_map_fourier = RBFSampler(gamma=.2, random_state=1)
fourier_approx_svm = pipeline.Pipeline([("feature_map", feature_map_fourier),
("svm", SGDClassifier())])
results = (index, _):[index], y[index]).score(X_test, y_test)) \
.reduce(lambda x,y: x+y)
final_results = results/ len(Bootstrap(y.size))
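The map/reduce pattern above (fit one model per bootstrap sample, score it on the held-out test set, average the accuracies) can be sketched without Spark in plain Python. This is a minimal local sketch of the control flow only; `bootstrap_indices` and `fit_and_score` are hypothetical stand-ins for sklearn's `Bootstrap` and the pipeline's `fit(...).score(...)`:

```python
import random

def bootstrap_indices(n, n_iter=3, seed=1):
    """Yield n_iter bootstrap samples: n indices drawn with replacement."""
    rng = random.Random(seed)
    for _ in range(n_iter):
        yield [rng.randrange(n) for _ in range(n)]

def fit_and_score(index):
    """Hypothetical stand-in for fourier_approx_svm.fit(...).score(...)."""
    return 0.9  # pretend every bootstrap model reaches the same accuracy

n_iter = 3
scores = [fit_and_score(idx) for idx in bootstrap_indices(100, n_iter)]
final_result = sum(scores) / float(n_iter)  # same as the reduce + divide above
```

Spark's `map`/`reduce` replaces the list comprehension and `sum`, distributing the per-sample fits across executors while the averaging logic stays identical.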

attilacsordas commented Jan 4, 2015

I get child_exception = [Errno 2] No such file or directory, and setting shell = True does not solve the problem.


blbradley commented Jul 10, 2015

@attilacsordas I'm getting this error also in my own scripts. Have you tried with newer versions of Spark? I get this in 1.4.0.


riturajtiwari commented Jul 14, 2015

I have a better way:

  • Create a new Python virtual environment:
  • Go to PyCharm -> Preferences -> Project:
  • On the “Project Interpreter” Line, create a new virtual environment (Click on the gear icon on the right)
  • Once the virtual environment is created, go to the same menu, click “More” and see a list of all project interpreters. Make sure your virtual environment is selected
  • Click “Show paths for the selected interpreter” button at the bottom
  • On the next dialog click on the “+” icon to add paths. You will need to add SPARK_HOME/python and SPARK_HOME/python/lib/

This should set you up to run and debug.
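If you would rather not click through the IDE dialogs, the same two paths can be prepended to sys.path in code before importing pyspark. A sketch, assuming SPARK_HOME is already exported (the fallback path and the glob over the py4j zip name are my assumptions, since the zip's version suffix varies by Spark release):

```python
import glob
import os
import sys

# Fallback path is an assumption; export SPARK_HOME to override it
spark_home = os.environ.get("SPARK_HOME", "/public/spark-0.9.1")

# SPARK_HOME/python holds pyspark; SPARK_HOME/python/lib holds the py4j zip
sys.path.insert(0, os.path.join(spark_home, "python"))
for zip_path in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")):
    sys.path.insert(0, zip_path)
```

This keeps the script runnable from a plain terminal as well as from PyCharm, at the cost of hard-wiring knowledge of the Spark layout into the script.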


ghost commented Jul 31, 2015

@riturajtiwari, your solution is the best so far. Thanks for sharing.
By the way, I am not able to connect to my remote server.
The code is
import os
import sys
try:
    from pyspark import SparkContext
    from pyspark import SparkConf
    print ("Pyspark sucess")
except ImportError as e:
    print ("Error importing Spark Modules", e)

if __name__ == '__main__':
    try:
        sc = SparkContext()
        print ("connection suceeded with Master")
    except Exception:
        print ("Connection not established")
When I run this code, I get the following error:
Pyspark sucess
The system cannot find the path specified.
Connection not established

Process finished with exit code 1
Any pointers to resolve this would be appreciated.
Thank you


nuTTeLLo commented Sep 2, 2015

@matsya: The "path specified" error usually occurs when the 'SPARK_HOME' environment variable is not set properly and you then try to instantiate SparkConf(). Set it to the location of the Spark installation on your remote server.
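A quick way to rule this out is to validate SPARK_HOME before building any SparkConf. A minimal sketch; the helper name and message text are my own:

```python
import os

def spark_home_ok(path):
    """True if path is set and points at an existing directory."""
    return bool(path) and os.path.isdir(path)

# Typical guard before constructing a SparkConf:
home = os.environ.get("SPARK_HOME")
if not spark_home_ok(home):
    print("SPARK_HOME is not set or does not point at a Spark installation: %r" % home)
```

Failing fast here gives a readable message instead of the cryptic "cannot find the path specified" raised later from inside pyspark's launcher.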



statchaitya commented Jan 18, 2017

Hi folks.

I am able to import SparkContext, but when I try to create the context as 'sc', I get the following error:

WindowsError: [Error 2] The system cannot find the file specified. It seems to have something to do with a file in the new environment's lib directory.

Sorry if what I am asking is too trivial, I am not a computer scientist.
