@ikegami-yukino
Last active July 31, 2018 03:01
Randomly shuffle the lines of a file, then split the result into a given number of files.
#!/bin/bash
# usage:
# ./randomsplit.sh [FILE] [division number]
#
# Check the number of parameters
if [ $# -ne 2 ]; then
    echo "usage: ./randomsplit.sh [FILE] [division number]" 1>&2
    exit 1
fi
file="$1" # filename
fold="$2" # division number
# Randomsort for given file
# (srand() seeds from the clock, so two runs within the same second give the same order)
awk 'BEGIN { srand() } { print rand() "\t" $0 }' "$file" | sort -n | cut -f 2- > tempfile
# Count lines of given file
line_count=$(wc -l < "$file" | tr -d ' ')
# Split randomized file
# (assumes the file has at least as many lines as the division number)
split -l $((line_count / fold)) tempfile splited
# Append the leftover lines, one each, to the first $remainder files
remainder=$((line_count % fold))
if [ "$remainder" -ne 0 ]; then
    for i in $(seq 1 "$remainder")
    do
        # i-th output file in sorted order
        writefile=$(ls | grep '^splited' | sed -n "${i}p")
        # i-th line counted from the end of the shuffled file
        tail -n "$i" tempfile | head -n 1 >> "$writefile"
    done
fi
# Keep the first $fold files and delete the surplus ones created by split
filecount=0
for i in splited*
do
    if [ "$filecount" -lt "$fold" ]; then
        filecount=$((filecount + 1))
    else
        rm "$i"
    fi
done
rm tempfile
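
The same result (shuffle, then split into N files whose sizes differ by at most one line) can also be reached without the separate remainder and cleanup steps. The following is only a sketch of an alternative, not the script above: it deals the shuffled lines round-robin into the output files. The function name `random_split`, the default prefix `part_`, and the two-digit numeric suffix are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): shuffle the lines
# of FILE and deal them round-robin into N output files.
# Usage: random_split FILE N [PREFIX]
random_split() {
    local file=$1 fold=$2 prefix=${3:-part_}
    # Stage 1: prefix every line with a random key.
    # Stage 2: sort numerically on that key (LC_ALL=C keeps "." as the decimal point).
    # Stage 3: drop the key again.
    # Stage 4: shuffled line k goes to file number (k - 1) mod N, so the
    #          output files differ in size by at most one line.
    awk 'BEGIN { srand() } { printf "%.10f\t%s\n", rand(), $0 }' "$file" |
        LC_ALL=C sort -k1,1n |
        cut -f 2- |
        awk -v n="$fold" -v p="$prefix" \
            '{ out = sprintf("%s%02d", p, (NR - 1) % n); print > out }'
}
```

For example, `random_split data.txt 4` would write `part_00` through `part_03` to the current directory. Because every line is assigned directly to its final file, no temporary file or surplus-file cleanup is needed. The trade-off is that awk keeps N files open at once, which is fine for small division numbers.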