Skip to content

Instantly share code, notes, and snippets.

@brentp
Last active August 29, 2015 14:10
Show Gist options
  • Save brentp/59a0811936bde4476bd4 to your computer and use it in GitHub Desktop.
Save brentp/59a0811936bde4476bd4 to your computer and use it in GitHub Desktop.
automatic running bedtools shuffle

usage:

shuf.sh "bedtools cmd" "metric" $genome

where metric must accept the output from bedtools cmd and return a single numeric value.

bash shuf.sh "bedtools intersect -a a.bed -b b.bed" "awk '{s+=\$3-\$2}END{print 1}'" hg19.genome

or

bash shuf.sh "bedtools intersect -a a.bed -b b.bed" "wc -l" contigs.genome

output

> observed cmd: bedtools intersect -a a.bed -b b.bed | awk '{s+=$3-$2}END{print s}'
> observed value: 425

> sim command: bedtools intersect -a a.bed -b <(bedtools shuffle -allowBeyondChromEnd -i b.bed -g contigs.genome) \
                             | awk '{s+=$3-$2}END{print s}'
> running 60 shufflings and comparing to observed
p-value: 0.131148
#!/bin/bash
set -eo pipefail
cmd="$1"
metric="$2"
G="$3"
N=60
echo "> observed cmd: $cmd | $metric"
M="$metric"
obs=$(eval "$cmd | $metric")
echo "> observed value:" $obs
# extract out the name of the -b file.
B=$(echo $cmd | perl -ne '/\s+-b(?:\s*)?([^\s]+)/;print $1')
SHUF="bedtools shuffle -allowBeyondChromEnd -i $B -g $G"
echo
simcmd=$(echo $cmd | perl -pe "s/-b\s*([^\s]+)/-b <($SHUF)/")
echo "> sim command: $simcmd \\
| $metric"
CMD="$simcmd | $metric"
echo "> running $N shufflings and comparing to observed"
cpus=$(grep -c ^processor /proc/cpuinfo)
p=$(seq $N | xargs -P $cpus -n 1 bash -c "($CMD)" | awk -vobs=$obs '{s += ($1 >= obs)}END{print (1 + s) / (1 + NR)}')
echo "p-value: $p"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment