
@zomglings
Last active September 23, 2019 10:08
Upload Stanford Dogs dataset to S3 and register against Simiotics Data Registry
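#!/usr/bin/env bash
# Uploads every Stanford Dogs image to S3 and registers it against a Simiotics data source, tagged with its breed.

# Dataset root: one subdirectory per breed, named <WordNet ID>-<breed>, e.g. n02085620-Chihuahua
STANFORD_DOG_IMAGES_DIR=${STANFORD_DOG_IMAGES_DIR:-$HOME/data/stanford-dogs/Images}
# Number of image paths passed to each simiotics_s3 invocation
BATCH_SIZE=${BATCH_SIZE:-100}
# Number of concurrent simiotics_s3 invocations (0 lets xargs run as many as possible)
PARALLELISM=${PARALLELISM:-0}

if [ -z "$SIMIOTICS_SOURCE" ]; then
    echo "ERROR: SIMIOTICS_SOURCE environment variable must be defined" >&2
    exit 1
fi

for dog_path in "$STANFORD_DOG_IMAGES_DIR"/*/; do
    dog_dir=$(basename "$dog_path")
    # Drop the WordNet ID prefix but keep hyphens inside the breed name
    # (n02095314-wire-haired_fox_terrier -> wire-haired_fox_terrier)
    breed_tag=${dog_dir#*-}
    # NUL-delimited paths stay intact even if a filename contains spaces
    printf '%s\0' "$dog_path"* \
        | xargs -0 -n "$BATCH_SIZE" -P "$PARALLELISM" \
            simiotics_s3 data register -s "$SIMIOTICS_SOURCE" -t "breed=$breed_tag"
done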
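Before pointing the loop at the real dataset, it can help to see exactly which commands it would issue. The following is a minimal dry-run sketch, not part of the original gist. It builds a throwaway directory tree in the Stanford Dogs naming style, with empty placeholder images `a.jpg` through `d.jpg`. It substitutes `echo` for `simiotics_s3`, so nothing is uploaded or registered. The function name `dry_run` and the source name `example-source` are hypothetical, and `-P 1` is used only to keep the printed order deterministic.

```shell
#!/usr/bin/env bash
# Dry run of the registration loop: `echo` stands in for simiotics_s3,
# so nothing is uploaded; each printed line is a command that would run.
set -eu

# Hypothetical helper: prints one command per batch of images found
# under the breed subdirectories of the current directory.
dry_run() {
    local batch_size=$1 source=$2 dog_path dog_dir breed_tag
    for dog_path in */; do
        dog_dir=${dog_path%/}
        # Drop the WordNet ID prefix, keep hyphens inside the breed name
        breed_tag=${dog_dir#*-}
        # -P 1 keeps the output order deterministic for this demonstration
        printf '%s\0' "$dog_path"* \
            | xargs -0 -n "$batch_size" -P 1 \
                echo simiotics_s3 data register -s "$source" -t "breed=$breed_tag"
    done
}

# Throwaway tree mirroring the dataset layout (empty placeholder images)
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
cd "$root"
mkdir n02085620-Chihuahua n02095314-wire-haired_fox_terrier
touch n02085620-Chihuahua/{a,b,c}.jpg n02095314-wire-haired_fox_terrier/d.jpg

# "example-source" is a hypothetical SIMIOTICS_SOURCE value
dry_run 2 example-source
```

With a batch size of 2, the three Chihuahua images are split into two invocations, one with two paths and one with a single path. The hyphenated breed keeps its full name in the `breed=` tag.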