@codspire
Last active December 2, 2021 03:54

Zeppelin, Spark, PySpark Setup on Windows (10)

I wish running Zeppelin on Windows wasn't as hard as it is. Things go haywire if you already have Spark installed on your computer. Zeppelin's embedded Spark interpreter does not play nicely with an existing Spark installation, and you may need to perform the steps (hacks!) below to make it work. I am hoping these will be fixed in newer Zeppelin versions.

If you try to run Zeppelin right after extracting the package, you might encounter the error "The filename, directory name, or volume label syntax is incorrect."

A Google search landed me on https://issues.apache.org/jira/browse/ZEPPELIN-1584; the link was helpful but wasn't enough to get Zeppelin working.

Below is what I had to do to make it work on my Windows 10 computer.

Existing software & configurations:

  • Spark version 2.1.0
  • Python version 3.6.1
  • Zeppelin version zeppelin-0.7.2-bin-all
  • My SPARK_HOME is "C:\Applications\spark-2.1.1-bin-hadoop2.7"

Steps

  • Extract Zeppelin package to a folder (mine was "C:\Applications\zeppelin-0.7.2-bin-all")

  • Copy jars from existing Spark installation into Zeppelin

$ copy %SPARK_HOME%\jars\*.jar  %ZEPPELIN_HOME%\interpreter\spark
$ del %ZEPPELIN_HOME%\interpreter\spark\datanucleus*.jar
  • Copy pyspark from existing Spark installation
$ copy %SPARK_HOME%\python\lib\*.zip  %ZEPPELIN_HOME%\interpreter\spark
  • Rename %ZEPPELIN_HOME%\conf\zeppelin-env.cmd.template to %ZEPPELIN_HOME%\conf\zeppelin-env.cmd
  • Update zeppelin-env.cmd
set PYTHONPATH=%SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-0.10.4-src.zip;%SPARK_HOME%\python\lib\pyspark.zip
  • You have to suppress the existing Spark installation so that Zeppelin uses its embedded one. Add the line below at the top of the %ZEPPELIN_HOME%\bin\zeppelin.cmd file:
set SPARK_HOME=
  • Start Zeppelin and validate Spark & pyspark
$ cd %ZEPPELIN_HOME%
$ bin\zeppelin.cmd
%pyspark
a=5*4
print("value = %i" % (a))
sc.version

It should print:

value = 20
'2.1.0'

%spark
sc.version

It should print: '2.1.0'
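If %pyspark fails to start, one thing worth checking is that the three PYTHONPATH entries from zeppelin-env.cmd actually exist on disk. A minimal sketch of such a check (the helper name and the hard-coded default path are mine; adjust for your install):

```python
import os

def missing_pythonpath_entries(spark_home, entries):
    """Return the PYTHONPATH entries under spark_home that do not exist on disk."""
    return [e for e in entries
            if not os.path.exists(os.path.join(spark_home, e))]

# Relative paths from the zeppelin-env.cmd PYTHONPATH above
entries = [
    "python",
    os.path.join("python", "lib", "py4j-0.10.4-src.zip"),
    os.path.join("python", "lib", "pyspark.zip"),
]

# Default path is the SPARK_HOME from this gist; yours will differ
spark_home = os.environ.get("SPARK_HOME", r"C:\Applications\spark-2.1.1-bin-hadoop2.7")
print(missing_pythonpath_entries(spark_home, entries))  # [] means all entries exist
```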

@zeinramadan

If anyone encounters the namedtuple error when running Spark 2.1.0 with Python 3.6, replace line 393 in pyspark's serializers.py with this line:
cls = _old_namedtuple(*args, **kwargs, verbose=False, rename=False, module=None)
See this Stack Overflow answer for details: https://stackoverflow.com/a/42615678/9691413
