@codspire
Last active December 2, 2021 03:54
Making Zeppelin, Spark, pyspark work on Windows

Zeppelin, Spark, PySpark Setup on Windows (10)

I wish running Zeppelin on Windows wasn't as hard as it is. Things go haywire if you already have Spark installed on your computer. Zeppelin's embedded Spark interpreter does not work nicely with an existing Spark installation, and you may need to perform the steps below (hacks!) to make it work. I am hoping that these will be fixed in newer Zeppelin versions.

If you try to run Zeppelin right after extracting the package, you might encounter the error "The filename, directory name, or volume label syntax is incorrect."

A Google search led me to https://issues.apache.org/jira/browse/ZEPPELIN-1584. That link was helpful but wasn't enough to get Zeppelin working.

Below is what I had to do to make it work on my Windows 10 computer.

Existing software & configurations:

  • Spark version 2.1.0
  • Python version 3.6.1
  • Zeppelin version zeppelin-0.7.2-bin-all
  • My SPARK_HOME is "C:\Applications\spark-2.1.1-bin-hadoop2.7"

Steps

  • Extract Zeppelin package to a folder (mine was "C:\Applications\zeppelin-0.7.2-bin-all")

  • Copy jars from existing Spark installation into Zeppelin

$ copy %SPARK_HOME%\jars\*.jar  %ZEPPELIN_HOME%\interpreter\spark
$ del %ZEPPELIN_HOME%\interpreter\spark\datanucleus*.jar
  • Copy pyspark from existing Spark installation
$ copy %SPARK_HOME%\python\lib\*.zip  %ZEPPELIN_HOME%\interpreter\spark
  • Rename %ZEPPELIN_HOME%\conf\zeppelin-env.cmd.template to %ZEPPELIN_HOME%\conf\zeppelin-env.cmd
  • Update zeppelin-env.cmd
set PYTHONPATH=%SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-0.10.4-src.zip;%SPARK_HOME%\python\lib\pyspark.zip
  • You have to suppress the existing Spark installation to make it work nicely with Zeppelin. Add the line below at the top of the %ZEPPELIN_HOME%\bin\zeppelin.cmd file
set SPARK_HOME=
  • Start Zeppelin and validate Spark & pyspark
$ cd %ZEPPELIN_HOME%
$ bin\zeppelin.cmd
%pyspark
a=5*4
print("value = %i" % (a))
sc.version

It should print: value = 20

'2.1.0'

%spark
sc.version

It should print: '2.1.0'
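
The file-copy steps above can also be scripted. What follows is only a minimal Python sketch, not something from the original setup. The function name prepare_zeppelin_spark is hypothetical, and the sketch assumes the layout described in the steps: jars under SPARK_HOME\jars, zips under SPARK_HOME\python\lib, and the Spark interpreter folder at ZEPPELIN_HOME\interpreter\spark.

```python
import shutil
from pathlib import Path


def prepare_zeppelin_spark(spark_home, zeppelin_home):
    """Mirror the manual copy steps (hypothetical helper, not part of Zeppelin).

    Returns the names of the files that remain copied in the interpreter folder.
    """
    spark = Path(spark_home)
    target = Path(zeppelin_home) / "interpreter" / "spark"
    target.mkdir(parents=True, exist_ok=True)

    copied = []

    # Step 1: copy every jar from the existing Spark installation.
    for jar in sorted((spark / "jars").glob("*.jar")):
        shutil.copy2(jar, target / jar.name)
        copied.append(jar.name)

    # Step 2: delete the datanucleus jars from the interpreter folder.
    for stale in target.glob("datanucleus*.jar"):
        stale.unlink()
    copied = [name for name in copied if not name.startswith("datanucleus")]

    # Step 3: copy the pyspark and py4j zip archives.
    for archive in sorted((spark / "python" / "lib").glob("*.zip")):
        shutil.copy2(archive, target / archive.name)
        copied.append(archive.name)

    return copied
```

It could be called as prepare_zeppelin_spark(r"C:\Applications\spark-2.1.1-bin-hadoop2.7", r"C:\Applications\zeppelin-0.7.2-bin-all"). The edits to zeppelin-env.cmd and zeppelin.cmd are left as manual steps.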

@reflog

reflog commented Sep 9, 2017

Thanks! This was very helpful. I was racking my brain trying to get Zeppelin to start, and your approach worked!

@mravikrishna

Thanks for sharing. Worked for me

@jflight1

Worked - thanks!!!

@IshwarBhat

This is a lifesaver!

@allixender

+1 very strange behaviour, but now works 😄

@evandroc

It works here too, but I had to make one change to the process.
After the command

$ copy %SPARK_HOME%\jars\*.jar %ZEPPELIN_HOME%\interpreter\spark

I also had to remove the folder %ZEPPELIN_HOME%\interpreter\spark\pyspark

@skd93

skd93 commented Feb 14, 2018

Hi,

When you say "Rename %ZEPPELIN_HOME%\conf\zeppelin-env.cmd.template to %ZEPPELIN_HOME%\conf\zeppelin-env.cmd",

do you mean converting the template file to a .cmd file? If so, how can I modify it and add additional text to it?

@mcdoyaji

@skd93 You don't need to convert anything. Just remove the .template part from the filename so the .cmd file can be run from the Windows console.

@mcdoyaji

mcdoyaji commented Jun 12, 2018

@ar1272

ar1272 commented Jul 24, 2018

Thanks a bunch!!! This helped a lot!

@juliovg

juliovg commented Sep 4, 2018

Hello, I followed all your steps and I get this error:

error_zeppelin

Can you help me?

@kiran2531159

Try installing Zeppelin version zeppelin-0.7.2-bin-all, which includes all interpreters.

@vishrtd

vishrtd commented Sep 13, 2018

Hello, I followed all your steps and I get this error:

error_zeppelin

Can you help me?

Where have you added the line
set SPARK_HOME=

I faced a similar issue, :-p you have to add it below:

:MAIN
call "%bin%\common.cmd"

See if this helps!
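
For anyone who prefers to script that placement, here is a minimal Python sketch. It is an illustration only: the function name add_spark_home_reset is hypothetical, and it assumes zeppelin.cmd contains the line call "%bin%\common.cmd" exactly as quoted above.

```python
def add_spark_home_reset(script_text,
                         marker='call "%bin%\\common.cmd"',
                         new_line="set SPARK_HOME="):
    """Insert `set SPARK_HOME=` right after the common.cmd call (hypothetical helper).

    Returns the text unchanged if the line is already present.
    """
    lines = script_text.splitlines()

    # Do nothing if the script already clears SPARK_HOME.
    if any(line.strip().lower() == new_line.lower() for line in lines):
        return script_text

    patched = []
    inserted = False
    for line in lines:
        patched.append(line)
        # Insert only once, directly below the first matching marker line.
        if not inserted and line.strip() == marker:
            patched.append(new_line)
            inserted = True

    if not inserted:
        raise ValueError("marker line not found; edit the file by hand instead")

    return "\n".join(patched) + "\n"
```

Read bin\zeppelin.cmd into a string, pass it through the function, and write the result back, keeping a backup of the original file.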

@lbcoker

lbcoker commented Oct 15, 2018

I set it up as above but am still getting an issue with Zeppelin. Has anyone seen this?

c:\tools\zeppelin-0.7.3>bin\zeppelin.cmd start
The system cannot find the path specified.

No idea what it is not finding.

@nimmmalarohit

I set it up as above but am still getting an issue with Zeppelin. Has anyone seen this?

c:\tools\zeppelin-0.7.3>bin\zeppelin.cmd start
The system cannot find the path specified.

No idea what it is not finding.

You must have set the environment variables incorrectly. Remove ZEPPELIN_HOME from the environment variables and check again.

@shatestest

@baram204 I am trying to install zeppelin-0.8.0 on windows 8 r2 and am getting the error below. Any help, please?
https://stackoverflow.com/questions/54312233/zeppeling-throwing-nullpointerexception-while-configuring

I am working in Scala with Spark 2.3.1. Do I still need pyspark? Why would I need it?

@samsebk

samsebk commented Feb 3, 2019

Thank you. Your method works!!!
Also, the recommended Zeppelin version zeppelin-0.7.2-bin-all works, but the versions after 0.7.2 do not. I think there is an error in common.cmd.

Really appreciate your help.

@mxku

mxku commented Mar 13, 2019

Yeah, in 0.8.1 common.cmd is broken.

You can fix this by removing the curly brace '{' from the part below:

if defined ZEPPELIN_JMX_ENABLE (
    if not defined ZEPPELIN_JMX_PORT (
        set ZEPPELIN_JMX_PORT="9996"
    )
    set JMX_JAVA_OPTS=" -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=%ZEPPELIN_JMX_PORT% -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
    set ZEPPELIN_JAVA_OPTS=%JMX_JAVA_OPTS% %ZEPPELIN_JAVA_OPTS%
)

In another place a variable is referenced in sh format, ${variable.name}. Change that to cmd format, %variable.name%.

@darealagung

Hi,
I have already successfully installed and run Hadoop, Spark, pyspark, and Jupyter Notebook, but with Zeppelin 0.8 I am stuck. Typing zeppelin.cmd does nothing: it returns immediately to the prompt with no error. I tried your solution here, with the same result. Can somebody help me? Thanks,

@msid

msid commented Aug 27, 2019

Hi,
I have already successfully installed and run Hadoop, Spark, pyspark, and Jupyter Notebook, but with Zeppelin 0.8 I am stuck. Typing zeppelin.cmd does nothing: it returns immediately to the prompt with no error. I tried your solution here, with the same result. Can somebody help me? Thanks,

I had the same issue with 0.8.1. I fixed it by replacing the curly brace '}' with ')' in bin/common.cmd #77 (a little bit different from what manishonline wrote):

...
if defined ZEPPELIN_JMX_ENABLE (
    if not defined ZEPPELIN_JMX_PORT (
        set ZEPPELIN_JMX_PORT="9996"
    }
...
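
The two fixes described in the comments above (the stray '}' and the sh-style ${variable.name} reference) could be applied with a small script. This is a minimal Python sketch under stated assumptions, not a verified patch for any Zeppelin release: the function name patch_common_cmd is hypothetical, it assumes the stray brace sits alone on its line as in the snippet above, and it assumes every ${...} in the file should become %...%.

```python
import re


def patch_common_cmd(text):
    """Apply the two common.cmd fixes from the thread (hypothetical helper)."""
    fixed_lines = []
    for line in text.splitlines():
        # A line holding only '}' is assumed to be the mistyped closing parenthesis.
        if line.strip() == "}":
            line = line.replace("}", ")")
        fixed_lines.append(line)
    patched = "\n".join(fixed_lines)

    # Rewrite sh-style ${name} references into cmd-style %name% references.
    patched = re.sub(r"\$\{([^{}]+)\}", r"%\1%", patched)

    # Keep the trailing newline if the input had one.
    if text.endswith("\n"):
        patched += "\n"
    return patched
```

Review the result against the original bin\common.cmd before overwriting it, since other Zeppelin versions may differ.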

@darealagung

darealagung commented Aug 27, 2019 via email

@tejaswishetty17

tejaswishetty17 commented Sep 12, 2019

zeppelin_error2
I followed all the installation steps, but the zeppelin server fails to start. Please check the attached image.

@darealagung

darealagung commented Sep 12, 2019 via email

@tejaswishetty17

tejaswishetty17 commented Sep 12, 2019 via email

@gitgmontoya

You are the savior!!!

@truthelectron

Nice hack. I tried it on Zeppelin 0.8.2 and could only get Spark to work; Python did not work. This is a great notebook, and I don't know why it is so difficult to get it up and running. Adding to the difficulty, there are no proper tutorials either.

@zhongxur

Much appreciated for making this notebook! I successfully got Spark to work, but pyspark/python is not working. It reports:
code = compile(mod, '', 'exec')
TypeError: required field "type_ignores" missing from Module
Thanks again for sharing this notebook!

@zeinramadan

I'm getting this error when executing the pyspark code in Zeppelin:
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
Any ideas?

@zeinramadan

If anyone encounters this issue, replace line 393 in serializers.py with this line:
cls = _old_namedtuple(*args, **kwargs, verbose=False, rename=False, module=None)
Please check this stackoverflow post about this: https://stackoverflow.com/a/42615678/9691413
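
A likely explanation for why that change helps: a function that is copied without its keyword-only defaults starts treating those arguments as required. The sketch below reproduces the effect in plain Python. It is an illustration under assumptions, not pyspark's actual code: make_record is a hypothetical stand-in for namedtuple, and copy_func is a hypothetical naive copy helper.

```python
import types


def make_record(name, *, verbose=False, rename=False, module=None):
    """Stand-in for namedtuple: keyword-only parameters that have defaults."""
    return (name, verbose, rename, module)


def copy_func(f):
    """Copy a function object without carrying over its __kwdefaults__."""
    return types.FunctionType(f.__code__, f.__globals__, f.__name__,
                              f.__defaults__, f.__closure__)


copied = copy_func(make_record)

# The copy has lost the keyword-only defaults, so a plain call fails.
try:
    copied("Point")
    error = None
except TypeError as exc:
    error = str(exc)

# Workaround 1: pass the keyword-only arguments explicitly,
# which is what the replacement line above does.
explicit = copied("Point", verbose=False, rename=False, module=None)

# Workaround 2: restore the defaults on the copied function.
copied.__kwdefaults__ = make_record.__kwdefaults__
restored = copied("Point")
```

Here error holds a "missing 3 required keyword-only arguments" message, while both workarounds return the same tuple.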
