@tfrancisl
Last active May 15, 2025 14:00
#!/usr/bin/env bash
cat << "END"
A small demo of the packaging conflict between the pyspark package and Databricks' databricks-connect package.
You will need:
- [poetry](https://python-poetry.org/docs/#installation) 2.0 or newer, since the generated pyproject.toml uses the [project] table
- A Python 3.10 interpreter that poetry can find
- Internet access: the packages are pulled from PyPI
- Bash, or a shell that can run bash scripts
Run the demo with bash ps_db_connect_issues.sh and compare the two import checks: one with pyspark installed alone, one with pyspark and databricks-connect installed together.
END
PYPROJECT='[project]
name = "ps-db-connect-issues"
version = "0.1.0"
description = ""
authors = [
{name = "Your Name",email = "you@example.com"}
]
readme = "README.md"
requires-python = ">=3.10,<3.11"
[build-system]
requires = ["poetry-core>=2.0.0,<3.0.0"]
build-backend = "poetry.core.masonry.api"
'
echo "$PYPROJECT" > pyproject.toml
poetry lock > /dev/null
echo ">>> pyspark installed alone <<<"
poetry env use 3.10 > /dev/null
# remove the virtualenv so the install below starts from a clean environment
rm -r "$(poetry env info --path)"
poetry add pyspark
poetry run python -c "import pyspark"
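# the import succeeds when pyspark is the only package providing the pyspark module
sleep 10 # pause so the output above can be read
# reset for the second run: delete the virtualenv before the pyproject.toml is restored below
rm -r "$(poetry env info --path)"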
echo "$PYPROJECT" > pyproject.toml
poetry lock > /dev/null
echo ">>> pyspark installed with databricks-connect <<<"
poetry add pyspark "databricks-connect@>=15,<16"
# observe the overwrite warnings during install: both packages ship files under the same pyspark/ directory
poetry run python -c "import pyspark"
# fails with: ImportError: cannot import name 'AnalyzeArgument' from 'pyspark.sql.udtf'
sleep 10
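The script above leaves it to the reader to spot the traceback among the install output. As a minimal sketch, a small helper could wrap each import check and print a one-line verdict instead. The name `report_import` is hypothetical and not part of the original gist; it only looks at the exit status of whatever command it is given.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not part of the original gist):
# run a command quietly and print a one-line verdict based on its exit status.
report_import() {
  local label="$1"
  shift
  if "$@" > /dev/null 2>&1; then
    echo "$label: import OK"
  else
    echo "$label: import FAILED"
  fi
}

# Intended usage inside the demo, after each `poetry add` step:
#   report_import "pyspark alone" poetry run python -c "import pyspark"
#   report_import "pyspark + databricks-connect" poetry run python -c "import pyspark"
```

Because the helper discards the command's output, it hides the `ImportError` text. Run the plain `poetry run python -c "import pyspark"` line as well when the traceback itself is of interest.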