@zhukovgreen
Last active January 25, 2024 17:36
Pointing multiple spark versions in the same project

  • databricks-connect:
# create an isolated environment with the databricks-connect build of pyspark
conda create --name databricks-connect python=3.10 -y
conda activate databricks-connect
pip install databricks-connect==13.*

# locate the environment's site-packages and point SPARK_HOME at the bundled pyspark
export _SITE="$(python -c 'import sys;print(sys.path[-1])')"
export SPARK_HOME="$_SITE/pyspark"
export PYSPARK_PYTHON="$(which python)"
export PYSPARK_DRIVER_PYTHON=$PYSPARK_PYTHON
export PATH=$SPARK_HOME/bin:$PATH
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
# globs are not expanded inside PYTHONPATH, so resolve the zip paths explicitly
# (assumes no spaces in the paths)
export PYTHONPATH=$_SITE:$(echo $SPARK_HOME/python/lib/*.zip | tr ' ' ':'):$PYTHONPATH

echo "SPARK_HOME=${SPARK_HOME};PYTHONPATH=${PYTHONPATH};PATH=${PATH}"
# paste into the PyCharm run configuration environment
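
A quick sanity check (a sketch, assuming the environment above is still active in the same shell) that the interpreter resolves pyspark from the databricks-connect install:

# should print a 13.x version and a path inside the conda environment
python -c "import pyspark; print(pyspark.__version__, pyspark.__file__)"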
  • local Spark:
# leave the databricks-connect environment first
conda deactivate
# point SPARK_HOME at the unpacked local Spark distribution
export SPARK_HOME=$HOME/spark-3.4/spark-3.4.1-bin-hadoop3
export PYSPARK_PYTHON=$(which python)
export PYSPARK_DRIVER_PYTHON=$PYSPARK_PYTHON

export PATH=$SPARK_HOME/bin:$PATH
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
# globs are not expanded inside PYTHONPATH, so resolve the zip paths explicitly
export PYTHONPATH=$(echo $SPARK_HOME/python/lib/*.zip | tr ' ' ':'):$PYTHONPATH

echo "SPARK_HOME=${SPARK_HOME};PYTHONPATH=${PYTHONPATH};PATH=${PATH}"
# paste into the PyCharm run configuration environment
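
To confirm the local distribution is now the one being resolved (again a sketch, run in the same shell):

# should report Spark 3.4.1 from the local distribution
spark-submit --version
# the import should resolve under $SPARK_HOME/python
python -c "import pyspark; print(pyspark.__file__)"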
@pyvel26 commented Jan 17, 2024

This is what I'm attempting to do. I'm using a Windows 10 laptop.

@zhukovgreen (Author)

Hey @pyvel26, here is the same setup as a Windows batch script:

:: Setting up Databricks Connect with a specific version of Spark
conda create --name databricks-connect python=3.10 -y
conda activate databricks-connect
pip install databricks-connect==13.*

:: Using Python to get the site-packages directory
for /f "delims=" %%a in ('python -c "import site; print(site.getsitepackages()[-1])"') do set "_SITE=%%a"

:: Setting up environment variables
set "SPARK_HOME=%_SITE%\pyspark"
set "PYSPARK_PYTHON=python"
set "PYSPARK_DRIVER_PYTHON=%PYSPARK_PYTHON%"
set "PATH=%SPARK_HOME%\bin;%PATH%"
set "PYTHONPATH=%SPARK_HOME%\python;%PYTHONPATH%"
:: wildcards are not expanded inside PYTHONPATH, so resolve the py4j zip explicitly
for %%j in ("%SPARK_HOME%\python\lib\py4j-*.zip") do set "PYTHONPATH=%%~j;%PYTHONPATH%"
set "PYTHONPATH=%_SITE%;%PYTHONPATH%"

echo SPARK_HOME=%SPARK_HOME%;PYTHONPATH=%PYTHONPATH%;PATH=%PATH%
:: Paste into the PyCharm run configuration environment
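
As on Linux/macOS, a quick check (a sketch, run in the same cmd session) that the right pyspark resolves:

:: should print a 13.x version and a path inside the conda environment
python -c "import pyspark; print(pyspark.__version__, pyspark.__file__)"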

For the local Spark setup:

:: Assuming you have deactivated the previous environment (conda deactivate)
:: Define the installation directory of your Spark version (modify according to your actual path)
set "SPARK_HOME=%USERPROFILE%\spark-3.4\spark-3.4.1-bin-hadoop3"
set "PYSPARK_PYTHON=python"
set "PYSPARK_DRIVER_PYTHON=%PYSPARK_PYTHON%"

set "PATH=%SPARK_HOME%\bin;%PATH%"
set "PYTHONPATH=%SPARK_HOME%\python;%PYTHONPATH%"
:: wildcards are not expanded inside PYTHONPATH, so resolve the py4j zip explicitly
for %%j in ("%SPARK_HOME%\python\lib\py4j-*.zip") do set "PYTHONPATH=%%~j;%PYTHONPATH%"

echo SPARK_HOME=%SPARK_HOME%;PYTHONPATH=%PYTHONPATH%;PATH=%PATH%
:: Paste into the PyCharm run configuration environment
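
And to confirm the local distribution wins (a sketch, in the same cmd session):

:: should report Spark 3.4.1 from %SPARK_HOME%
spark-submit --version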

Please note:

  • This script assumes that you've installed Spark in the directory %USERPROFILE%\spark-3.4\spark-3.4.1-bin-hadoop3. You may need to adjust the path to match your Spark installation directory.
  • The %USERPROFILE% environment variable corresponds to the home directory of the current user. This is analogous to $HOME on Unix-like systems.

@pyvel26 commented Jan 25, 2024

I realize that our team is using Databricks Runtime 10.4. I don't think it's supported by Databricks Connect any longer. I'm assuming you're using 13.

@zhukovgreen (Author)

Yes, we use the 13 LTS version. I'm not sure how the steps will differ due to the runtime change. Technically it might work, but you'll need to try.
