Tested with
Python 2.7, OS X 10.11.3 El Capitan, Apache Spark 1.6.0 & Hadoop 2.6
Download Apache Spark and build it, or download the pre-built version. I suggest downloading the pre-built version with Hadoop 2.6.
Download and install Anaconda.
Once you have installed Anaconda, open your terminal and type
conda install jupyter
conda update jupyter
Open the terminal and type
echo "export PATH=$PATH:/path_to_downloaded_spark/spark-1.6.0/bin" >> .profile
echo "export PYSPARK_DRIVER_PYTHON=ipython" >> .profile
echo "export PYSPARK_DRIVER_PYTHON_OPTS='notebook' pyspark" >> .profile
Now source it to make the changes available in this terminal
source ~/.profile
or quit your terminal (Cmd+Q) and reopen it.
Now, using your terminal, go to whatever folder you want and type pyspark. For example
cd Documents/my_spark_folder
pyspark
Now the IPython notebook should open in your browser.
To check whether Spark is linked correctly, create a new Python 2 notebook in IPython Notebook, type sc in a cell, and run it.
You should see something like this
In [1]: sc
Out[1]: <pyspark.context.SparkContext at 0x1049bdf90>
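As an extra sanity check (not part of the original walkthrough), you can run a tiny job against sc in the next cell; this assumes the default local SparkContext that pyspark creates for you
In [2]: sc.parallelize(range(100)).filter(lambda x: x % 2 == 0).count()  # count the even numbers in 0..99
Out[2]: 50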
Hello sir, I set both
PYSPARK_DRIVER_PYTHON=ipython
and PYSPARK_DRIVER_PYTHON_OPTS=notebook
as environment variables, but when I ran this command to start the notebook
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS=notebook spark-1.6.1-bin-hadoop2.6\bin\pyspark --packages com.databricks:spark-csv_2.11:1.4.0 --master local[*]
it gave the error that it "is not recognized as an internal or external command". What should I do to make it work? I want to load spark-csv into my notebook.
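For reference, once pyspark does launch with the spark-csv package loaded, reading a CSV inside the notebook would look roughly like the sketch below; the file name is a placeholder, and sqlContext is the SQLContext that pyspark creates automatically in Spark 1.6
# Read a CSV through the spark-csv data source (Spark 1.6 with spark-csv 1.4)
df = sqlContext.read.format('com.databricks.spark.csv') \
    .options(header='true', inferSchema='true') \
    .load('my_data.csv')  # placeholder path
df.printSchema()  # check that the columns and types were inferred as expected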