@tixxit
Created February 1, 2012 22:23
Bash script to output a fixed number of random lines from a file.
#!/bin/bash
#
# pluck - Pluck some random lines from a file.
#
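# This simple tool chooses up to k random lines (default 100) from a file and
# outputs them to stdout (in order of their line number). Lines are sampled
# with replacement, so duplicate picks collapse and fewer than k lines may be
# printed.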
#
function usage {
    # printf expands \t portably; bash's builtin echo prints it literally.
    printf 'usage:\t%s filename [k=100]\n' "$(basename "$0")"
}
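#
# Samples from the line numbers 1..n, with replacement, k times, and prints the
# samples in ascending order. Requires 2 arguments:
#   n  The total number of lines.
#   k  The number of samples to take.
# $RANDOM never exceeds 32767, so only the first 32768 lines can be chosen.
#
function sample {
    for (( COUNTER=1; COUNTER<=$2; COUNTER+=1 )); do
        # sed = numbers lines from 1, so shift the 0..n-1 remainder up by one.
        echo $(( RANDOM % $1 + 1 ))
    done | sort -b -n
}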
if [ -z "$1" ]; then
    usage
    exit 1
fi
file=$1
# Count the lines; quoting keeps filenames with spaces intact.
n=$(wc -l < "$file")
k=100
if [ -n "$2" ]; then
    k="$2"
fi
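# Prefix each line with its number, keep the sampled numbers, then strip the
# prefix again. The trailing "x" only absorbs the final "|" left by tr.
sed = "$file" | sed "N;s/\n/ /" | grep -E "^($(sample "$n" "$k" | tr '\n' '|')x) " | cut -d ' ' -f 2-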
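
The script above samples line numbers with replacement, so repeated picks mean it can print fewer than k lines. As a rough sketch of one alternative (not the author's method), the function below uses selection sampling: it reads the file once and keeps each line with probability (lines still needed) / (lines still unread), which yields min(k, n) distinct lines in their original order. The name `pluck_unique` is hypothetical, and the sketch assumes a POSIX `awk` is available.

```shell
#!/bin/bash
# pluck_unique FILE [K] -- hypothetical helper, not part of the original gist.
# Prints min(K, n) distinct lines of FILE, in file order, where n is the
# number of lines reported by wc -l.
pluck_unique() {
    local file="$1"
    local k="${2:-100}"
    local n
    n=$(wc -l < "$file")
    # Seed awk from the shell's $RANDOM so repeated runs differ.
    awk -v n="$n" -v k="$k" -v seed="$RANDOM" '
        BEGIN { srand(seed) }
        # n - NR + 1 lines are still unread (including this one) and k are
        # still needed, so keep this line with probability k / (n - NR + 1).
        k > 0 && rand() * (n - NR + 1) < k { print; k-- }
    ' "$file"
}
```

Calling `pluck_unique access.log 20` (with a hypothetical `access.log`) would print 20 distinct lines if the file has at least that many. Because every line is considered in order, no sorting step is needed, and the 32767 ceiling of `$RANDOM` only affects the seed, not which lines can be reached.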