
@Jorricks
Last active November 30, 2023 15:31
Downloading files from HDFS by zipping them and using JupyterHub
# Check the files are as expected
!hdfs dfs -ls /user/jorrick/my_file_path/
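# Setup: create a working directory on the local file system
!mkdir -p /tmp/jorrick
# Create the subdirectory and make sure it has the correct permissions
!mkdir -m 700 /tmp/jorrick/my_file_path
# -d lists the directory itself, so its permissions are visible
!ls -ld /tmp/jorrick/my_file_path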
# Copy the files over to the local file system
!hdfs dfs -copyToLocal /user/jorrick/my_file_path/ /tmp/jorrick/my_file_path
# Zip everything; -r recurses into subdirectories, so one command covers every nesting level
!zip -r /tmp/jorrick/my_file_path.zip /tmp/jorrick/my_file_path
# Verify the zip includes the files by checking its size
!ls -lh /tmp/jorrick/my_file_path.zip
# Upload the zip back to HDFS so it can be downloaded through the JupyterHub interface
!hdfs dfs -put /tmp/jorrick/my_file_path.zip /user/jorrick/notebooks/
# Delete the tmp folder
!rm -rf /tmp/jorrick
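
The zip and verify steps above can also be done from a Python cell, which may help when the `zip` binary is not installed on the JupyterHub host. The following is only a minimal sketch using the standard library; `zip_directory` is a hypothetical helper name, not part of the original notebook, and it assumes the zip file is written outside the directory being archived.

```python
import os
import zipfile


def zip_directory(src_dir, zip_path):
    """Zip src_dir recursively and return the file names stored in the archive.

    Hypothetical helper for illustration; assumes zip_path is not inside src_dir.
    """
    src_dir = os.path.abspath(src_dir)
    parent = os.path.dirname(src_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # os.walk visits every nesting level, so no per-depth globbing is needed
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Store paths relative to the parent so the archive unpacks into one folder
                archive.write(full_path, os.path.relpath(full_path, parent))
    # Re-open the archive and list its contents instead of relying on file size alone
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()
```

In this notebook it could be called as `zip_directory("/tmp/jorrick/my_file_path", "/tmp/jorrick/my_file_path.zip")` before the `hdfs dfs -put` step. The returned list shows exactly which files ended up in the archive.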