@j16r
Created December 18, 2010 07:59
Quick script to shuffle a directory full of files into a test and training set
require 'fileutils'
# Given a directory of files, shuffle them randomly, then split them into a
# training set and a test set
TEST_PERCENT = 20
if ARGV.size < 1
  puts "Usage: #{$0} <path> [outdir]"
  # A missing argument is an error, so exit with a non-zero status
  exit 1
end
srcdir, outdir = *ARGV
outdir ||= '.'
# Get all the files in the specified src dir
files = Dir.entries(srcdir).collect do |file|
  path = File.join(srcdir, file)
  path if File.file?(path)
end
# Compact, uniq and shuffle!
files = files.compact.uniq.shuffle
# Now split into two sets
test_files = files.take((files.size * TEST_PERCENT / 100.0).floor)
training_files = files - test_files
puts "Test files: #{test_files.inspect}"
puts "Training files: #{training_files.inspect}"
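# Copy (not move) each file into <outdir>/<category>/, creating the directory
# if it does not exist yet
def migrate_files(files, category, outdir)
  files.each do |old_file|
    new_file = File.join(outdir, category, File.basename(old_file))
    FileUtils.makedirs(File.dirname(new_file))
    FileUtils.cp(old_file, new_file)
  end
end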
migrate_files test_files, 'testing', outdir
migrate_files training_files, 'training', outdir
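
The script above produces a different split on every run because the shuffle is unseeded. If a reproducible split is wanted, one option is to pass a seeded random number generator to the shuffle. The following is a minimal sketch, not part of the original gist: `split_files` and its `seed` parameter are hypothetical names introduced here for illustration, and the input is sorted first on the assumption that directory listing order may vary between runs.

```ruby
# Hypothetical helper (not in the original script): shuffle with a fixed seed
# so the same set of paths always yields the same test/training split.
def split_files(files, test_percent, seed)
  # Sort first so the result does not depend on directory listing order
  shuffled = files.sort.shuffle(random: Random.new(seed))
  test_count = (shuffled.size * test_percent / 100.0).floor
  # Returns [test_files, training_files]
  [shuffled.take(test_count), shuffled.drop(test_count)]
end

test_set, training_set = split_files(%w[a.txt b.txt c.txt d.txt e.txt], 20, 42)
puts "Test: #{test_set.size}, training: #{training_set.size}"
```

Using `take` and `drop` on the same shuffled array also avoids relying on array subtraction to compute the training set.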