Last active
February 18, 2016 05:09
A script that runs Python-based Spark jobs by bundling the dependencies declared in a requirements.txt file. It builds wheel archives, which are zip files, and passes them to spark-submit via --py-files. I haven't done extensive testing on this yet, but it seems to work.
#!/usr/bin/env bash
set -e

# Stage the job script and its requirements in a temporary directory
TMP_DIR=$(mktemp -d)
cp spark_test.py "${TMP_DIR}"
cp requirements.txt "${TMP_DIR}"
cd "${TMP_DIR}"

# Build wheel archives for every requirement into the current directory
pip wheel -r requirements.txt

# Join the wheel filenames into the comma-separated list --py-files expects
PY_FILES=$(ls -m *.whl | tr -d ' \n')

spark-submit --py-files "${PY_FILES}" spark_test.py
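The reason shipping wheels with --py-files works at all is that a wheel is a zip archive, and Python can import pure-Python modules directly from a zip on sys.path. A minimal sketch of that mechanism (the package name and contents here are hypothetical, purely for illustration):

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny zip archive that stands in for a wheel
# (hypothetical name; real wheels follow the PEP 427 naming scheme).
tmp = tempfile.mkdtemp()
archive = os.path.join(tmp, "mypkg-0.1-py3-none-any.whl")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("mypkg/__init__.py", "VALUE = 42\n")

# Putting the archive itself on sys.path is roughly what spark-submit
# does for each --py-files entry on the driver and executors.
sys.path.insert(0, archive)
import mypkg

print(mypkg.VALUE)  # -> 42
```

This is why the trick needs no unpacking step: zipimport resolves the module straight out of the archive.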
Doesn't seem to play well with NumPy and SciPy. Going to do some testing on pure Python packages.
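The NumPy/SciPy failure is consistent with a known limitation: zipimport can load only pure-Python modules, not compiled extension modules (.so/.pyd), and NumPy and SciPy wheels are full of those. One way to spot the difference up front is the wheel filename itself, which encodes ABI and platform tags per PEP 427. A rough helper sketch (this function is my own, not part of the gist):

```python
def is_pure_python_wheel(filename):
    """Guess from PEP 427 filename tags whether a wheel is pure Python.

    Wheel names look like: name-version[-build]-python_tag-abi_tag-platform_tag.whl
    A pure-Python wheel has abi tag 'none' and platform tag 'any'.
    """
    stem = filename[: -len(".whl")]
    parts = stem.split("-")
    abi_tag, platform_tag = parts[-2], parts[-1]
    return abi_tag == "none" and platform_tag == "any"


# Pure-Python wheel: safe to ship via --py-files
print(is_pure_python_wheel("six-1.10.0-py2.py3-none-any.whl"))            # -> True
# Platform wheel with compiled extensions: zipimport can't load these
print(is_pure_python_wheel("numpy-1.10.4-cp27-cp27mu-linux_x86_64.whl"))  # -> False
```

Filtering the --py-files list this way would at least fail loudly instead of at import time on the executors.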