@daysm
Last active November 5, 2020 09:53
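#!/bin/bash
# Move a random sample of files from a source directory to a target directory.
# Based on https://unix.stackexchange.com/questions/466593/moving-random-files-using-shuf-and-mv-argument-list-too-long
# Based on https://www.gnu.org/software/coreutils/manual/html_node/Random-sources.html#Random-sources
if [ "$#" -ne 4 ]; then
    echo "Usage: $0 <source_dir> <target_dir> <test_size> <random_seed>" >&2
    exit 1
fi
SOURCE_DIR=$1
TARGET_DIR=$2
TEST_SIZE=$3
RANDOM_SEED=$4
# Emit a deterministic byte stream derived from the seed, for use with shuf --random-source.
get_seeded_random()
{
    seed="$1"
    # This also writes the seed itself at the start of the stream; the GNU coreutils manual example omits this line.
    echo "$seed"
    openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null
}
mkdir -p "$TARGET_DIR"
# Count the non-hidden entries directly inside SOURCE_DIR (assumes no newlines in file names).
NUM_SOURCE_FILES=$(find "$SOURCE_DIR" -mindepth 1 -maxdepth 1 ! -name '.*' | grep -c /)
# Multiply by the test fraction and truncate to an integer.
NUM_SAMPLES=$(echo "x = $NUM_SOURCE_FILES*$TEST_SIZE; scale = 0; x / 1" | bc)
echo "Num source files: $NUM_SOURCE_FILES"
echo "Num sample files: $NUM_SAMPLES"
# Shuffle the NUL-delimited paths with the seeded source and move the first NUM_SAMPLES.
find "$SOURCE_DIR" -mindepth 1 -maxdepth 1 ! -name '.*' -print0 | shuf --random-source=<(get_seeded_random "$RANDOM_SEED") -n "$NUM_SAMPLES" -z | xargs -0 -I{} mv {} "$TARGET_DIR"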

daysm commented Oct 20, 2020

Dependencies: bc, shuf (GNU coreutils), openssl
Tested on macOS 10.15.7

This script moves a random sample of the non-hidden entries directly inside the source directory to the target directory. If the target directory does not exist yet, it will be created.

This can be helpful for splitting a dataset into training and test sets.
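
For a train/test split it usually matters that the split is reproducible. The script draws its sample with shuf and a seeded random source, so the same seed should select the same items as long as the input order is also the same. Below is a minimal sketch to check that property on its own. It assumes bash, GNU shuf and openssl are available; sample_with_seed is a hypothetical helper name, and get_seeded_random here follows the version in the GNU coreutils manual.

```shell
#!/bin/bash
# Deterministic byte stream derived from the seed (as in the GNU coreutils manual).
get_seeded_random() {
    seed="$1"
    openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null
}

# Hypothetical helper: read lines from stdin and print COUNT of them,
# chosen with a random source that depends only on SEED.
sample_with_seed() {
    local seed=$1 count=$2
    shuf --random-source=<(get_seeded_random "$seed") -n "$count"
}

# Example: both calls should print the same three numbers.
# seq 1 10 | sample_with_seed 42 3
# seq 1 10 | sample_with_seed 42 3
```

Note that find does not promise a particular output order, so this guarantee only holds for the script if the directory listing comes back in the same order on each run.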

Usage:

./split_dataset.sh <source_dir> <target_dir> <test_size> <random_seed>

Example:

./split_dataset.sh ~/projects/proj1/images ~/projects/proj1/images_test 0.2 42

This will move a random 20% of the files (rounded down to a whole number of files) from ~/projects/proj1/images to ~/projects/proj1/images_test.
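
The number of files moved comes from multiplying the file count by test_size and truncating the result with bc. As a rough illustration, the sketch below wraps the same bc expression the script uses in a hypothetical helper named num_samples, so the rounding can be tried without touching any files.

```shell
# Hypothetical helper mirroring the script's bc expression.
# $1 = number of source files, $2 = test_size as a fraction
num_samples() {
    echo "x = $1*$2; scale = 0; x / 1" | bc
}

num_samples 1000 0.2   # 1000 * 0.2 = 200.0, prints 200
num_samples 7 0.2      # 7 * 0.2 = 1.4, truncated, prints 1
num_samples 10 0.25    # 10 * 0.25 = 2.50, truncated, prints 2
```

Because the result is truncated rather than rounded, very small directories can end up with zero files selected.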


daysm commented Nov 5, 2020

A command to split a dataset with multiple classes could look like this:
for dir in class1 class2 class3 class4 class5; do ../scripts/split_dataset.sh train/$dir test/$dir 0.2 42; done
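
If the class names should not be hard-coded, the loop could instead iterate over whatever subdirectories exist. The following is only a sketch under that assumption: split_all_classes and its parameter names are hypothetical, and it expects one subdirectory per class directly inside the training directory.

```shell
#!/bin/bash
# Hypothetical helper: run the split script once per class subdirectory.
# $1 = path to split_dataset.sh, $2 = train dir, $3 = test dir,
# $4 = test_size fraction, $5 = random seed
split_all_classes() {
    local split_script=$1 train_dir=$2 test_dir=$3 test_size=$4 seed=$5
    local class_dir class_name
    for class_dir in "$train_dir"/*/; do
        [ -d "$class_dir" ] || continue      # glob matched nothing
        class_dir=${class_dir%/}             # drop the trailing slash
        class_name=$(basename "$class_dir")
        "$split_script" "$class_dir" "$test_dir/$class_name" "$test_size" "$seed"
    done
}

# Example call, equivalent to the loop above for class1..class5:
# split_all_classes ../scripts/split_dataset.sh train test 0.2 42
```

Using the same seed for every class mirrors the one-liner above; each class is still sampled independently because the file lists differ.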