echo "Installing base packages"
# Note: the dask-yarn pin is quoted so the shell doesn't parse ">=" as a redirect.
conda install \
    -c conda-forge \
    -y \
    -q \
    "dask-yarn>=0.7.0" \
    pyarrow \
    s3fs \
    conda-pack \
    tornado=5 \
    $EXTRA_CONDA_PACKAGES
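
# $EXTRA_CONDA_PACKAGES is expected to be supplied by the caller. For
# illustration only (an assumed invocation, not something this script defines),
# extra conda-forge packages could be passed like so:
#
#   EXTRA_CONDA_PACKAGES="numpy pandas" bash bootstrap-dask.sh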
# -----------------------------------------------------------------------------
# 4. Package the environment to be distributed to worker nodes
# -----------------------------------------------------------------------------
echo "Packaging environment"
conda pack -q -o $HOME/environment.tar.gz
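
# Optional sanity check (an assumption, not part of the original script):
# confirm the archive exists and spot-check its contents before it is
# distributed to the worker nodes.
ls -lh $HOME/environment.tar.gz
tar -tzf $HOME/environment.tar.gz | head -n 5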
# -----------------------------------------------------------------------------
# 6. Configure Dask
#
# This isn't necessary, but for this particular bootstrap script it will make a
# few things easier:
#
# - Configure the cluster's dashboard link to show the proxied version through
#   jupyter-server-proxy. This allows access to the dashboard with only an ssh
#   tunnel to the notebook.
#
# - Specify the pre-packaged python environment, so users don't have to
#   specify it themselves.
#
# - Set the default deploy-mode to local, so the dashboard proxying works.
#
# - Specify the location of the native libhdfs library so pyarrow can find it
#   on the workers and the client (if submitting applications).
# -----------------------------------------------------------------------------
echo "Configuring Dask"
mkdir -p $HOME/.config/dask
cat <<EOT >> $HOME/.config/dask/config.yaml
distributed:
  dashboard:
    link: "/proxy/{port}/status"
yarn:
  environment: /home/hadoop/environment.tar.gz
  deploy-mode: local
  worker:
    env:
      ARROW_LIBHDFS_DIR: /usr/lib/hadoop/lib/native/
  client:
    env:
      ARROW_LIBHDFS_DIR: /usr/lib/hadoop/lib/native/
EOT
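
# With this config in place, a notebook user can start a cluster without
# specifying the environment or deploy-mode explicitly. A minimal usage
# sketch (Python, shown as a comment since this file is a shell script):
#
#   from dask_yarn import YarnCluster
#   from dask.distributed import Client
#
#   cluster = YarnCluster()   # reads yarn.environment / deploy-mode from config.yaml
#   client = Client(cluster)  # dashboard link resolves to /proxy/{port}/status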