Download Google Conceptual Captions Data
#!/usr/bin/env bash
# Download the split TSV files from https://ai.google.com/research/ConceptualCaptions/download
# Create the split folders val/ and trn/
# Run as follows:
# cd val/; bash ../download_gcc.sh ../val.tsv
# cd trn/; bash ../download_gcc.sh ../trn.tsv
rm -f .img_file
split_file=$1
idx=0
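# Column 2 of the split TSV holds the image URL.
cut -f2 "$split_file" > .img_file
while IFS= read -r datum
do
    idx=$((idx + 1))
    echo "wget $datum -O ${idx}.gcc --tries=2" # the .gcc file extension is arbitrary
    # The URLs end in many different file extensions; see:
    # cut -f2 ../Train_GCC-training.tsv | grep -o '....$' | sort | uniq -c | sort -nk1
    # Quote the URL so characters such as ? and * are not expanded by the shell.
    wget "$datum" -O "${idx}.gcc" --tries=2
done < .img_file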
done < .img_file
rm -f .img_file
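
The loop above downloads every URL again on each run. If a run is interrupted, a resumable variant might be useful. The following is only a sketch under assumptions: `download_missing` and `DRY_RUN` are hypothetical names that are not part of the original script, and a non-empty `N.gcc` file is assumed to mean that image N was already fetched (a partially written file would also be skipped).

```shell
# Sketch: resumable variant of the download loop (hypothetical helper).
# The function body runs in a subshell, so idx and url stay local to it.
download_missing() (
    split_file=$1
    idx=0
    # Column 2 of the split TSV holds the image URL.
    cut -f2 "$split_file" | while IFS= read -r url
    do
        idx=$((idx + 1))
        # Assumption: a non-empty file from an earlier run is a finished download.
        if [ -s "${idx}.gcc" ]; then
            continue
        fi
        if [ "${DRY_RUN:-0}" = 1 ]; then
            # Hypothetical dry-run switch: print the command instead of running it.
            echo "wget $url -O ${idx}.gcc --tries=2"
        else
            wget "$url" -O "${idx}.gcc" --tries=2
        fi
    done
)
```

It would be called the same way as the script, for example `cd val/; download_missing ../val.tsv`, and `DRY_RUN=1 download_missing ../val.tsv` lists the downloads that are still pending without touching the network.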