@pavlov99
Created February 20, 2016 13:43
sample random lines from file in bash, benchmark
#!/bin/bash
FILENAME="/tmp/random-lines.$$.tmp"
NUMLINES=10000000
seq -f 'line %.0f' $NUMLINES > $FILENAME;
echo "10 random lines with nl:"
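# sort -R shuffles; nl numbers the lines first so duplicate lines are not grouped together.
# bash -c lets time measure the whole pipeline rather than nl alone.
$(which time) -v bash -c "nl -ba $FILENAME | sort -R | sed 's/.*[0-9]\t//' | head" > /dev/null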
echo "10 random lines with shuf:"
$(which time) -v shuf $FILENAME -n10 | head > /dev/null
echo "10 random lines with rl:"
$(which time) -v rl $FILENAME | head > /dev/null
echo "10 random lines with perl:"
# Read the file on stdin so that time measures perl rather than cat.
$(which time) -v perl -MList::Util=shuffle -e 'print shuffle(<STDIN>);' < $FILENAME | head > /dev/null
echo "10 random lines with python:"
# sys.stdout.write works under both Python 2 and Python 3, unlike the print statement.
$(which time) -v python -c "import random, sys; lines = open(sys.argv[1]).readlines(); random.shuffle(lines); sys.stdout.write(''.join(lines[:10]))" $FILENAME > /dev/null
rm -rf $FILENAME
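
An external `time` binary only measures the single command it launches, so in a pipeline it reports on the first stage alone. As a rough sketch of an alternative, the bash `time` keyword covers every stage of the pipeline that follows it. The helper name `time_pipeline` and the smaller input size below are assumptions for illustration and are not part of the gist.

```shell
#!/bin/bash
# Hypothetical helper (not in the gist): run a pipeline given as a string and
# print its elapsed wall-clock time in seconds on stdout.
time_pipeline() {
    local TIMEFORMAT='%R'   # %R = elapsed real time in seconds
    # The keyword reports on the shell's stderr, so group the command and
    # redirect the group's stderr to stdout; the pipeline's own output is dropped.
    { time eval "$1" > /dev/null 2>&1; } 2>&1
}

# Small input so the sketch finishes quickly; the gist uses 10000000 lines.
FILENAME=$(mktemp)
seq -f 'line %.0f' 10000 > "$FILENAME"

echo "shuf: $(time_pipeline "shuf -n 10 '$FILENAME'") s"
echo "nl:   $(time_pipeline "nl -ba '$FILENAME' | sort -R | sed 's/.*[0-9]\t//' | head") s"

rm -f "$FILENAME"
```

The keyword reports only real, user and system time, so it does not replace the memory figures that GNU `time -v` prints.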
@thapakazi

Thanks 🙇‍♂️
I ran some benchmarks on my own files. It turns out that shuf > python > perl > nl.

[image: screenshot-20170721-00 16 32]

You have a variable name mistake in line 7 ($filename should be $FILENAME):

$(which time) -v nl -ba $filename | sort -r | sed 's/.*[0-9]\t//' | head > /dev/null

Also, I could not get time -v to work.
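
One possible cause, offered as a guess since the comment does not say which system was used: `-v` is a GNU `time` option. The BSD/macOS `/usr/bin/time` takes `-l` instead, and the bash `time` keyword accepts neither. If no external `time` binary is installed at all, `$(which time)` expands to nothing and the shell tries to run `-v` as a command. The sketch below probes for a working invocation; the helper name `verbose_time_cmd` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not in the gist): print a verbose `time` invocation
# that works on this machine, or fail if no external time binary exists.
verbose_time_cmd() {
    local bin
    bin=$(type -P time) || return 1       # type -P skips the bash keyword
    if "$bin" -v true > /dev/null 2>&1; then
        echo "$bin -v"                    # GNU time: detailed report
    elif "$bin" -l true > /dev/null 2>&1; then
        echo "$bin -l"                    # BSD/macOS time: rusage report
    else
        echo "$bin -p"                    # POSIX fallback: real/user/sys only
    fi
}

if TIME_CMD=$(verbose_time_cmd); then
    # Left unquoted on purpose so the flag is split off as its own word.
    $TIME_CMD seq 1000 > /dev/null
else
    echo "no external time binary found; using the bash keyword" >&2
    time seq 1000 > /dev/null
fi
```

On Debian and Ubuntu, GNU time typically comes from a separate `time` package, so installing it may be enough for the original script to run unchanged.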
