@shawnweisfeld
Created May 1, 2020 13:52
# Run this from a VM in the same region as the Azure storage account.
# The VM needs an attached data disk large enough to hold both the tar.gz archive and the extracted images;
# we attached two 1 TB Premium storage disks.
# https://docs.microsoft.com/en-us/azure/virtual-machines/linux/attach-disk-portal
# sudo chown shawn /datadrive
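# A minimal sketch (not part of the original steps) for checking that the data disk
# has enough free space before downloading. The function name require_free_gb and the
# threshold in the example are illustrative assumptions; size them to your archive.
require_free_gb() {
  local dir="$1" need_gb="$2" avail_kb
  # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is the available space
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  # succeed only if the available kilobytes cover the requested gigabytes
  [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}
# example: require_free_gb /datadrive 1000 || echo "not enough free space on /datadrive" >&2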
# Create a target storage account where all the files will eventually end up
# (I called mine sweisfelopenimgstg).
# Give your AAD account the 'Storage Blob Data Owner' role on this account.
# Create the target container.
# Install the latest build of AzCopy v10
mkdir bin
cd bin
wget -O azcopy_v10.tar.gz https://aka.ms/downloadazcopy-v10-linux && tar -xf azcopy_v10.tar.gz --strip-components=1
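# reload ~/.profile so PATH picks up ~/bin (the default Ubuntu ~/.profile adds it when the directory exists;
# this assumes the commands above were run from the home directory)
source ~/.profile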
rm azcopy_v10.tar.gz
cd ..
# log in to Azure storage
azcopy login
# let's set the current file we are processing in a variable
export TARFILE="train_6"
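# download the file from AWS to the local data disk
# (-o overwrites any partial file from an earlier run instead of appending to it; -f fails on HTTP errors)
curl -f -o "/datadrive/$TARFILE.tar.gz" "https://open-images-dataset.s3.amazonaws.com/tar/$TARFILE.tar.gz"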
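# A hedged sketch, not part of the original script: check the archive before extracting it.
# gzip -t tests the integrity of the compressed stream and exits non-zero on a truncated
# or corrupt file. The function name verify_archive is an illustrative assumption.
verify_archive() {
  gzip -t "$1" 2>/dev/null
}
# example: verify_archive "/datadrive/$TARFILE.tar.gz" || echo "archive is corrupt; download it again" >&2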
# decompress the files
tar xvzf "/datadrive/$TARFILE.tar.gz" -C "/datadrive/"
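# upload them to our Azure storage account
# NOTE: the first path segment of the destination URL is the container name. Azure container names
# allow only lowercase letters, digits, and hyphens, so adjust the destination if $TARFILE contains an underscore.
azcopy cp "/datadrive/$TARFILE" "https://sweisfelopenimgstg.blob.core.windows.net/$TARFILE" --recursive
# clean up the extracted images and the archive to free the disk for the next file
rm -rf "/datadrive/$TARFILE"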
rm "/datadrive/$TARFILE.tar.gz"
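# A minimal sketch of wrapping the steps above so they can be repeated for several archives.
# The function name process_archive is an illustrative assumption; the commands follow the
# same download, extract, upload, clean-up sequence as above and stop at the first failure.
process_archive() {
  local name="$1"
  # download the archive, overwriting any earlier partial file
  curl -f -o "/datadrive/$name.tar.gz" "https://open-images-dataset.s3.amazonaws.com/tar/$name.tar.gz" || return 1
  # extract it next to the archive
  tar xzf "/datadrive/$name.tar.gz" -C "/datadrive/" || return 1
  # upload the extracted directory
  azcopy cp "/datadrive/$name" "https://sweisfelopenimgstg.blob.core.windows.net/$name" --recursive || return 1
  # free the disk only after a successful upload
  rm -rf "/datadrive/$name"
  rm "/datadrive/$name.tar.gz"
}
# example (add further archive names to the list as needed):
# for name in train_6; do process_archive "$name" || break; done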