@codspire
Last active December 2, 2021 03:54

Zeppelin, Spark, PySpark Setup on Windows (10)

I wish running Zeppelin on Windows wasn't as hard as it is. Things go haywire if you already have Spark installed on your computer. Zeppelin's embedded Spark interpreter does not play nicely with an existing Spark installation, and you may need to perform the steps (hacks!) below to make it work. I am hoping these will be fixed in newer Zeppelin versions.

If you try to run Zeppelin right after extracting the package, you might encounter the error "The filename, directory name, or volume label syntax is incorrect."

A Google search landed me on https://issues.apache.org/jira/browse/ZEPPELIN-1584; the link was helpful but wasn't enough to get Zeppelin working.

Below is what I had to do to make it work on my Windows 10 computer.

Existing software & configurations:

  • Spark version 2.1.0
  • Python version 3.6.1
  • Zeppelin version zeppelin-0.7.2-bin-all
  • My SPARK_HOME is "C:\Applications\spark-2.1.1-bin-hadoop2.7"

Steps

  • Extract Zeppelin package to a folder (mine was "C:\Applications\zeppelin-0.7.2-bin-all")

  • Copy jars from existing Spark installation into Zeppelin

$ copy %SPARK_HOME%\jars\*.jar  %ZEPPELIN_HOME%\interpreter\spark
$ del %ZEPPELIN_HOME%\interpreter\spark\datanucleus*.jar
  • Copy pyspark from existing Spark installation
$ copy %SPARK_HOME%\python\lib\*.zip  %ZEPPELIN_HOME%\interpreter\spark
  • Rename %ZEPPELIN_HOME%\conf\zeppelin-env.cmd.template to %ZEPPELIN_HOME%\conf\zeppelin-env.cmd
  • Update zeppelin-env.cmd
set PYTHONPATH=%SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-0.10.4-src.zip;%SPARK_HOME%\python\lib\pyspark.zip
  • You have to suppress the existing Spark installation so that Zeppelin uses its embedded one. Add the line below at the top of the %ZEPPELIN_HOME%\bin\zeppelin.cmd file:
set SPARK_HOME=
  • Start Zeppelin and validate Spark & pyspark
$ cd %ZEPPELIN_HOME%
$ bin\zeppelin.cmd
%pyspark
a=5*4
print("value = %i" % (a))
sc.version

It should print:

value = 20
'2.1.0'

%spark
sc.version

It should print: '2.1.0'
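If %pyspark fails to start, one thing worth checking is that the three PYTHONPATH entries from zeppelin-env.cmd actually exist on disk. A minimal sketch of such a check (the helper name and the hard-coded default path are mine; adjust for your install):

```python
import os

def missing_pythonpath_entries(spark_home, entries):
    """Return the PYTHONPATH entries under spark_home that do not exist on disk."""
    return [e for e in entries
            if not os.path.exists(os.path.join(spark_home, e))]

# Relative paths from the zeppelin-env.cmd PYTHONPATH above
entries = [
    "python",
    os.path.join("python", "lib", "py4j-0.10.4-src.zip"),
    os.path.join("python", "lib", "pyspark.zip"),
]

# Default path is the SPARK_HOME from this gist; yours will differ
spark_home = os.environ.get("SPARK_HOME", r"C:\Applications\spark-2.1.1-bin-hadoop2.7")
print(missing_pythonpath_entries(spark_home, entries))  # [] means all entries exist
```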

@zeinramadan

If anyone encounters the namedtuple error when running Spark 2.1.0 with Python 3.6, replace line 393 in pyspark's serializers.py with this line:
cls = _old_namedtuple(*args, **kwargs, verbose=False, rename=False, module=None)
See this Stack Overflow answer for details: https://stackoverflow.com/a/42615678/9691413
