@mkocabas
Created April 9, 2018 09:41
Download COCO dataset. Run under 'datasets' directory.
mkdir coco
cd coco
mkdir images
cd images
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/zips/test2017.zip
wget http://images.cocodataset.org/zips/unlabeled2017.zip
unzip train2017.zip
unzip val2017.zip
unzip test2017.zip
unzip unlabeled2017.zip
rm train2017.zip
rm val2017.zip
rm test2017.zip
rm unlabeled2017.zip
cd ../
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
wget http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip
wget http://images.cocodataset.org/annotations/image_info_test2017.zip
wget http://images.cocodataset.org/annotations/image_info_unlabeled2017.zip
unzip annotations_trainval2017.zip
unzip stuff_annotations_trainval2017.zip
unzip image_info_test2017.zip
unzip image_info_unlabeled2017.zip
rm annotations_trainval2017.zip
rm stuff_annotations_trainval2017.zip
rm image_info_test2017.zip
rm image_info_unlabeled2017.zip
@ben-xD

ben-xD commented Aug 31, 2020

Although this script is convenient, when using a cloud VM to download these files you can potentially save some time by running the downloads in separate shells. I found that running 1 download on its own gave me 22.4 MB/s, while running 4 at once gave me about 22.4 MB/s each.
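
The same idea can be tried from a single shell. The following is only a minimal sketch, not part of the original gist: `fetch_parallel` and `FETCH_CMD` are hypothetical names, and it assumes an `xargs` that supports `-P` (GNU and BSD versions do, POSIX does not require it).

```shell
#!/bin/sh
# Sketch only: fetch_parallel and FETCH_CMD are hypothetical names.
# Assumes xargs supports -P (GNU and BSD xargs do; POSIX does not require it).

# fetch_parallel JOBS URL...: run up to JOBS downloads at the same time.
fetch_parallel() {
    jobs_max=$1
    shift
    # One URL per line goes in; xargs starts one downloader process per URL.
    # FETCH_CMD is left unquoted on purpose so "wget -c" splits into two words.
    printf '%s\n' "$@" | xargs -n 1 -P "$jobs_max" ${FETCH_CMD:-wget -c}
}

# Example call (commented out so that sourcing this file downloads nothing):
# fetch_parallel 4 \
#     http://images.cocodataset.org/zips/train2017.zip \
#     http://images.cocodataset.org/zips/val2017.zip \
#     http://images.cocodataset.org/zips/test2017.zip \
#     http://images.cocodataset.org/zips/unlabeled2017.zip
```

Whether this actually helps depends on where the bottleneck is; the numbers above suggest a per-connection limit, but that may differ on other networks.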

@buttercutter

Just a small recommendation: use wget -c instead of plain wget, so that a partially downloaded file can be resumed.
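
To illustrate why resuming matters, here is a rough sketch of a retry wrapper. `retry_cmd` and `RETRY_DELAY` are hypothetical names introduced for this example; the only wget feature it relies on is `-c`, which continues a partial file instead of starting over.

```shell
#!/bin/sh
# Sketch only: retry_cmd and RETRY_DELAY are hypothetical names.

# retry_cmd MAX CMD...: run CMD up to MAX times, stopping at the first success.
retry_cmd() {
    max=$1
    shift
    attempt=1
    while :; do
        "$@" && return 0
        echo "attempt $attempt of $max failed: $*" >&2
        # Give up once the allowed number of attempts has been used.
        [ "$attempt" -ge "$max" ] && return 1
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-5}"
    done
}

# Example (commented out): with -c, each retry resumes the partial file.
# retry_cmd 5 wget -c http://images.cocodataset.org/zips/train2017.zip
```

Without `-c`, every retry of an interrupted multi-gigabyte download would begin again from the first byte.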

@CyprienRicque

CyprienRicque commented Nov 16, 2021

The exact same script but with the modification proposed by @buttercutter

mkdir coco
cd coco
mkdir images
cd images

wget -c http://images.cocodataset.org/zips/train2017.zip
wget -c http://images.cocodataset.org/zips/val2017.zip
wget -c http://images.cocodataset.org/zips/test2017.zip
wget -c http://images.cocodataset.org/zips/unlabeled2017.zip

unzip train2017.zip
unzip val2017.zip
unzip test2017.zip
unzip unlabeled2017.zip

rm train2017.zip
rm val2017.zip
rm test2017.zip
rm unlabeled2017.zip 

cd ../
wget -c http://images.cocodataset.org/annotations/annotations_trainval2017.zip
wget -c http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip
wget -c http://images.cocodataset.org/annotations/image_info_test2017.zip
wget -c http://images.cocodataset.org/annotations/image_info_unlabeled2017.zip

unzip annotations_trainval2017.zip
unzip stuff_annotations_trainval2017.zip
unzip image_info_test2017.zip
unzip image_info_unlabeled2017.zip

rm annotations_trainval2017.zip
rm stuff_annotations_trainval2017.zip
rm image_info_test2017.zip
rm image_info_unlabeled2017.zip

@viethoang303

How do I download the COCO dataset for segmentation? Please help me.

@bit-scientist

bit-scientist commented May 20, 2022

The exact same script but with the modification proposed by @buttercutter and @ben-xD

First, open three separate shells and lay them out for convenient use.

On the first one, do the following (you may need to use sudo if a permission denied error appears):

mkdir coco
cd coco
mkdir images
cd images
wget -c http://images.cocodataset.org/zips/train2017.zip

On the second:

cd coco/images/
wget -c http://images.cocodataset.org/zips/val2017.zip
wget -c http://images.cocodataset.org/zips/test2017.zip

Note that you will need to press Enter on shell 2 to download test2017.zip after it finishes val2017.zip

On the third:

cd coco/images/
wget -c http://images.cocodataset.org/zips/unlabeled2017.zip

Wait a little while (or do some five-minute stretching 😄 ) until processes on shells one and two finish.

Back on the first shell, issue the following:

unzip train2017.zip
unzip val2017.zip
unzip test2017.zip
unzip unlabeled2017.zip

rm train2017.zip
rm val2017.zip
rm test2017.zip
rm unlabeled2017.zip 

On the second shell:

cd ../
wget -c http://images.cocodataset.org/annotations/annotations_trainval2017.zip
wget -c http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip
wget -c http://images.cocodataset.org/annotations/image_info_test2017.zip
wget -c http://images.cocodataset.org/annotations/image_info_unlabeled2017.zip

unzip annotations_trainval2017.zip
unzip stuff_annotations_trainval2017.zip
unzip image_info_test2017.zip
unzip image_info_unlabeled2017.zip

rm annotations_trainval2017.zip
rm stuff_annotations_trainval2017.zip
rm image_info_test2017.zip
rm image_info_unlabeled2017.zip

By the time shells one and two finish, shell three will have finished its job. Hope it helps save some time. 🤝

@PushpakBhoge

Same thing, just with ! added in front of each command in case you want to run it in Jupyter. The cd steps use %cd instead, because each ! line runs in its own subshell, so !cd would not change the notebook's working directory.

!mkdir coco
%cd coco
!mkdir images
%cd images

!wget -c http://images.cocodataset.org/zips/train2017.zip
!wget -c http://images.cocodataset.org/zips/val2017.zip
!wget -c http://images.cocodataset.org/zips/test2017.zip
!wget -c http://images.cocodataset.org/zips/unlabeled2017.zip

!unzip train2017.zip
!unzip val2017.zip
!unzip test2017.zip
!unzip unlabeled2017.zip

!rm train2017.zip
!rm val2017.zip
!rm test2017.zip
!rm unlabeled2017.zip

%cd ../
!wget -c http://images.cocodataset.org/annotations/annotations_trainval2017.zip
!wget -c http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip
!wget -c http://images.cocodataset.org/annotations/image_info_test2017.zip
!wget -c http://images.cocodataset.org/annotations/image_info_unlabeled2017.zip

!unzip annotations_trainval2017.zip
!unzip stuff_annotations_trainval2017.zip
!unzip image_info_test2017.zip
!unzip image_info_unlabeled2017.zip

!rm annotations_trainval2017.zip
!rm stuff_annotations_trainval2017.zip
!rm image_info_test2017.zip
!rm image_info_unlabeled2017.zip

@tbwxmu

tbwxmu commented Jul 31, 2022

Aha, the same thing, just with %%bash added on the first line of the cell in case you want to run it in Jupyter.

%%bash

mkdir coco
cd coco
mkdir images
cd images

wget -c http://images.cocodataset.org/zips/train2017.zip
wget -c http://images.cocodataset.org/zips/val2017.zip
wget -c http://images.cocodataset.org/zips/test2017.zip
wget -c http://images.cocodataset.org/zips/unlabeled2017.zip

unzip train2017.zip
unzip val2017.zip
unzip test2017.zip
unzip unlabeled2017.zip

rm train2017.zip
rm val2017.zip
rm test2017.zip
rm unlabeled2017.zip 

cd ../
wget -c http://images.cocodataset.org/annotations/annotations_trainval2017.zip
wget -c http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip
wget -c http://images.cocodataset.org/annotations/image_info_test2017.zip
wget -c http://images.cocodataset.org/annotations/image_info_unlabeled2017.zip

unzip annotations_trainval2017.zip
unzip stuff_annotations_trainval2017.zip
unzip image_info_test2017.zip
unzip image_info_unlabeled2017.zip

rm annotations_trainval2017.zip
rm stuff_annotations_trainval2017.zip
rm image_info_test2017.zip
rm image_info_unlabeled2017.zip

@AlessandroMondin

The exact same script but with the modification proposed by @buttercutter and @ben-xD

First, open three separate shells and lay them out for convenient use.

On the first one, do the following (you may need to use sudo if a permission denied error appears):

mkdir coco
cd coco
mkdir images
cd images
wget -c http://images.cocodataset.org/zips/train2017.zip

On the second:

cd coco/images/
wget -c http://images.cocodataset.org/zips/val2017.zip
wget -c http://images.cocodataset.org/zips/test2017.zip

Note that you will need to press Enter on shell 2 to download test2017.zip after it finishes val2017.zip

On the third:

cd coco/images/
wget -c http://images.cocodataset.org/zips/unlabeled2017.zip

Wait a little while (or do some five-minute stretching 😄 ) until processes on shells one and two finish.

Back on the first shell, issue the following:

unzip train2017.zip
unzip val2017.zip
unzip test2017.zip
unzip unlabeled2017.zip

rm train2017.zip
rm val2017.zip
rm test2017.zip
rm unlabeled2017.zip 

On the second shell:

cd ../
wget -c http://images.cocodataset.org/annotations/annotations_trainval2017.zip
wget -c http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip
wget -c http://images.cocodataset.org/annotations/image_info_test2017.zip
wget -c http://images.cocodataset.org/annotations/image_info_unlabeled2017.zip

unzip annotations_trainval2017.zip
unzip stuff_annotations_trainval2017.zip
unzip image_info_test2017.zip
unzip image_info_unlabeled2017.zip

rm annotations_trainval2017.zip
rm stuff_annotations_trainval2017.zip
rm image_info_test2017.zip
rm image_info_unlabeled2017.zip

By the time shells one and two finish, shell three will have finished its job. Hope it helps save some time. 🤝

Great, thanks! The only thing that might further improve it would be to rm each zip file right after unzipping it. That might be useful for people working on VMs with limited disk space.

@bit-scientist

The exact same script, but with the modifications proposed by @buttercutter, @ben-xD and @AlessandroMondin (disk space constraint)

First, open three separate shells and lay them out for convenient use.

On the first one, do the following (you may need to use sudo if a permission denied error appears):

mkdir coco
cd coco
mkdir images
cd images
wget -c http://images.cocodataset.org/zips/train2017.zip

On the second:

cd coco/images/
wget -c http://images.cocodataset.org/zips/val2017.zip
wget -c http://images.cocodataset.org/zips/test2017.zip

Note that you will need to press Enter on shell 2 to download test2017.zip after it finishes val2017.zip

On the third:

cd coco/images/
wget -c http://images.cocodataset.org/zips/unlabeled2017.zip

Wait a little while (or do some five-minute stretching 😄 ) until processes on shells one and two finish.

Back on the first shell, issue the following:

unzip train2017.zip
rm train2017.zip

unzip val2017.zip
rm val2017.zip

unzip test2017.zip
rm test2017.zip

unzip unlabeled2017.zip
rm unlabeled2017.zip

On the second shell:

cd ../
wget -c http://images.cocodataset.org/annotations/annotations_trainval2017.zip
wget -c http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip
wget -c http://images.cocodataset.org/annotations/image_info_test2017.zip
wget -c http://images.cocodataset.org/annotations/image_info_unlabeled2017.zip

unzip annotations_trainval2017.zip
rm annotations_trainval2017.zip

unzip stuff_annotations_trainval2017.zip
rm stuff_annotations_trainval2017.zip

unzip image_info_test2017.zip
rm image_info_test2017.zip

unzip image_info_unlabeled2017.zip
rm image_info_unlabeled2017.zip

By the time shells one and two finish, shell three will have finished its job. Hope it helps save some time. 🤝
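
The unzip-then-rm pairs can also be written as a loop that deletes an archive only when its extraction succeeded. This is a sketch under assumptions, not the script from the thread: `extract_and_remove` and `UNZIP_CMD` are hypothetical names, and `unzip -q` is used only to keep the output quiet.

```shell
#!/bin/sh
# Sketch only: extract_and_remove and UNZIP_CMD are hypothetical names.

# extract_and_remove ZIP...: unzip each archive in turn and delete it
# only if the extraction reported success, so disk usage never holds
# more than one extracted-but-not-yet-deleted archive at a time.
extract_and_remove() {
    for archive in "$@"; do
        # UNZIP_CMD is unquoted on purpose so "unzip -q" splits into two words.
        if ${UNZIP_CMD:-unzip -q} "$archive"; then
            rm -- "$archive"
        else
            echo "extraction failed, keeping $archive" >&2
            return 1
        fi
    done
}

# Example (commented out), run from coco/images:
# extract_and_remove train2017.zip val2017.zip test2017.zip unlabeled2017.zip
```

Stopping at the first failure keeps the remaining archives on disk, so nothing has to be downloaded again.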

@M0E313

M0E313 commented Mar 18, 2024

same script but instead of 5 min stretch, try doing squats.

@shwu-nyunai

Everything but Claude Haiku-fied with <script> + separate and run them in parallel (maybe background processes)

# Create the coco directory and cd into it
mkdir coco
cd coco

# Create the images directory and cd into it
mkdir images
cd images

# Download the dataset zip files in parallel
wget http://images.cocodataset.org/zips/train2017.zip &
wget http://images.cocodataset.org/zips/val2017.zip &
wget http://images.cocodataset.org/zips/test2017.zip &
wget http://images.cocodataset.org/zips/unlabeled2017.zip &
wait

# Unzip the dataset zip files in parallel
unzip train2017.zip &
unzip val2017.zip &
unzip test2017.zip &
unzip unlabeled2017.zip &
wait

# Remove the zip files
rm train2017.zip
rm val2017.zip
rm test2017.zip
rm unlabeled2017.zip

# Go back to the coco directory
cd ../

# Download the annotation zip files in parallel
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip &
wget http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip &
wget http://images.cocodataset.org/annotations/image_info_test2017.zip &
wget http://images.cocodataset.org/annotations/image_info_unlabeled2017.zip &
wait

# Unzip the annotation zip files in parallel
unzip annotations_trainval2017.zip &
unzip stuff_annotations_trainval2017.zip &
unzip image_info_test2017.zip &
unzip image_info_unlabeled2017.zip &
wait

# Remove the zip files
rm annotations_trainval2017.zip
rm stuff_annotations_trainval2017.zip
rm image_info_test2017.zip
rm image_info_unlabeled2017.zip
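
One caveat with the script above: a bare `wait` returns a zero status even if one of the background jobs failed, so the following steps run regardless. A possible variation, sketched here with the hypothetical names `fetch_all_checked` and `FETCH_CMD`, records each job's PID and waits on them one by one so a failure can be detected.

```shell
#!/bin/sh
# Sketch only: fetch_all_checked and FETCH_CMD are hypothetical names.

# fetch_all_checked URL...: start one background download per URL and
# return non-zero if any of them failed.
fetch_all_checked() {
    pids=""
    for url in "$@"; do
        # FETCH_CMD is unquoted on purpose so "wget -c" splits into two words.
        ${FETCH_CMD:-wget -c} "$url" &
        pids="$pids $!"
    done
    status=0
    for pid in $pids; do
        # wait PID returns that job's exit status, unlike a bare wait.
        wait "$pid" || status=1
    done
    return "$status"
}

# Example (commented out): only continue when every download succeeded.
# fetch_all_checked \
#     http://images.cocodataset.org/zips/train2017.zip \
#     http://images.cocodataset.org/zips/val2017.zip || exit 1
```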

@sorenwacker

Lol, the script downloaded the files and then deleted them right away, because unzip was not found.
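
A check at the top of the script would avoid that. The following is a minimal sketch; `require_tools` is a hypothetical helper name, and it relies only on the standard `command -v` lookup.

```shell
#!/bin/sh
# Sketch only: require_tools is a hypothetical helper name.

# require_tools NAME...: report every tool that is not on PATH and
# return non-zero if at least one is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing required tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (commented out): stop before downloading anything.
# require_tools wget unzip || exit 1
```

Pairing this with `unzip file.zip && rm file.zip` would also keep an archive on disk whenever its extraction fails.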
