@serihiro
Last active September 9, 2023 06:06
A shell script to count the total size and the number of files in the ImageNet training dataset. For busy people: 1,281,167 images / 146,999,143,316 bytes.
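The two headline figures can be turned into a per-image average and a human-readable total. The following is a minimal sketch that hard-codes the numbers from the description. The variable names are illustrative and not part of the original gist. awk handles the fractional divisions because bash arithmetic is integer-only.

```shell
#!/bin/bash
# Derive summary figures from the totals quoted in the description.
set -euo pipefail

TOTAL_BYTES=146999143316   # total size of the training set, in bytes
TOTAL_IMAGES=1281167       # total number of training images

# Average bytes per image (bash integer division, rounds down).
avg_bytes=$((TOTAL_BYTES / TOTAL_IMAGES))

# Total size in GiB (2^30 bytes) and GB (10^9 bytes), two decimals.
size_gib=$(awk -v b="${TOTAL_BYTES}" 'BEGIN { printf "%.2f", b / 1073741824 }')
size_gb=$(awk -v b="${TOTAL_BYTES}" 'BEGIN { printf "%.2f", b / 1000000000 }')

echo "average per image: ${avg_bytes} bytes"
echo "total: ${size_gib} GiB (${size_gb} GB)"
```

Run as-is, the script reports an average of 114738 bytes per image (about 112 KiB). It also reports a total of 136.90 GiB (147.00 GB).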
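#!/bin/bash
# Sum the size (in bytes) and the number of files across all class
# directories of the ImageNet training set. Requires GNU du (--bytes).
set -euo pipefail
shopt -s nullglob  # skip the loop entirely if no directory matches

BASE_DIR="/path/to/imagenet/train"
total_size=0
total_count=0

# The trailing slash makes the glob match directories only, so files such
# as .tar archives and .sh scripts in BASE_DIR are skipped.
for target_dir in "${BASE_DIR}"/*/
do
  # Apparent size of the class directory and its contents, in bytes.
  tmp_size=$(du -s --bytes "${target_dir}" | awk '{print $1}')
  total_size=$((total_size + tmp_size))
  # Number of regular files (images) directly inside the directory.
  tmp_count=$(find "${target_dir}" -maxdepth 1 -type f | wc -l)
  total_count=$((total_count + tmp_count))
done

echo "${total_size}"
echo "${total_count}"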
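For comparison, the following is a minimal sketch of a single-pass variant. It assumes GNU find, whose `-printf '%s\n'` directive prints each file's size in bytes. It also assumes the same `train/<class>/<image>` layout that the loop above walks. `count_train_set` is a hypothetical helper name. The sketch sums only regular-file sizes, whereas `du` also counts each directory's own entry size, so its byte total may come out slightly lower than the loop's.

```shell
#!/bin/bash
# Single-pass sketch (assumes GNU find for -printf): list every image file
# once and let awk sum the sizes and count the lines.
set -euo pipefail

# Prints "<total bytes> <file count>" for regular files exactly two levels
# below $1 (i.e. base/<class>/<file>). Directory entry sizes are not included.
count_train_set() {
  local base_dir="$1"
  find "${base_dir}" -mindepth 2 -maxdepth 2 -type f -printf '%s\n' |
    # %.0f instead of %d avoids integer-width limits in some awk builds.
    awk '{ bytes += $1; files += 1 } END { printf "%.0f %d\n", bytes, files }'
}
```

Source the file and call, e.g., `count_train_set /path/to/imagenet/train`. It prints the byte total followed by the file count. Limiting `find` to depth 2 keeps top-level `.tar` archives and scripts out of the totals, mirroring the filtering in the original loop.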