Skip to content

Instantly share code, notes, and snippets.

@arq5x
Last active August 29, 2015 14:01
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save arq5x/b67196a46db5b63bee06 to your computer and use it in GitHub Desktop.
Save arq5x/b67196a46db5b63bee06 to your computer and use it in GitHub Desktop.
Merging adjacent BEDGRAPH intervals with coverage >= mincov
# assumes that the BEDGRAPH is the output of the bedtools genomecov tool using the -bga option.
$ cat foo.bedg
chr1 0 10 5
chr1 10 20 5
chr1 20 30 10
chr1 30 40 11
chr1 40 50 9
chr1 50 60 10
chr1 60 70 11
chr1 70 80 15
chr1 80 90 5
chr1 90 100 10
# 1. add an extra column with the MINCOV (e.g., 10) if the depth meets MINCOV
$ cat foo.bedg | awk -v mincov=10 '{ if ($4 >= mincov) {print $0"\t"mincov} else {print $0"\t"$4}}'
chr1 0 10 5 5
chr1 10 20 5 5
chr1 20 30 10 10
chr1 30 40 11 10
chr1 40 50 9 9
chr1 50 60 10 10
chr1 60 70 11 10
chr1 70 80 15 10
chr1 80 90 5 5
chr1 90 100 10 10
# 2. we can now use the extra column as a grouping criterion (assumes BEDGRAPH)
# 3. group on the surrogate column and report a new interval with the min start and max end in the "block"
# 4. Yank out the relevant columns.
# 5. Profit.
$ cat foo.bedg \
| awk -v mincov=10 '{ if ($4 >= mincov) {print $0"\t"mincov} else {print $0"\t"$4}}' \
| bedtools groupby -g 1,5 -c 1,2,3,4 -o first,first,last,first \
| cut -f 3-6
chr1 0 20 5
chr1 20 40 10
chr1 40 50 9
chr1 50 80 10
chr1 80 90 5
chr1 90 100 10
# You could also use mean as the new column.
$ cat foo.bedg \
| awk -v mincov=10 '{ if ($4 >= mincov) {print $0"\t"mincov} else {print $0"\t"$4}}' \
| bedtools groupby -g 1,5 -c 1,2,3,4 -o first,first,last,mean \
| cut -f 3-6
chr1 0 20 5
chr1 20 40 10.5
chr1 40 50 9
chr1 50 80 12
chr1 80 90 5
chr1 90 100 10
$ cat foo.bedg \
| awk -v mincov=10 '{ if ($4 >= mincov) {print $0"\t"mincov} else {print $0"\t"$4}}' \
| bedtools groupby -g 1,5 -c 1,2,3,4 -o first,first,last,max \
| cut -f 3-6
chr1 0 20 5
chr1 20 40 11
chr1 40 50 9
chr1 50 80 15
chr1 80 90 5
chr1 90 100 10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment