Last active
August 29, 2015 14:01
-
-
Save arq5x/b67196a46db5b63bee06 to your computer and use it in GitHub Desktop.
Merging adjacent BEDGRAPH intervals with coverage >= mincov
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# assumes that the BEDGRAPH is the output of the bedtools genomecov tool using the -bga option. | |
$ cat foo.bedg | |
chr1 0 10 5 | |
chr1 10 20 5 | |
chr1 20 30 10 | |
chr1 30 40 11 | |
chr1 40 50 9 | |
chr1 50 60 10 | |
chr1 60 70 11 | |
chr1 70 80 15 | |
chr1 80 90 5 | |
chr1 90 100 10 | |
# 1. add an extra column with the MINCOV (e.g., 10) if the depth meets MINCOV | |
$ cat foo.bedg | awk -v mincov=10 '{ if ($4 >= mincov) {print $0"\t"mincov} else {print $0"\t"$4}}' | |
chr1 0 10 5 5 | |
chr1 10 20 5 5 | |
chr1 20 30 10 10 | |
chr1 30 40 11 10 | |
chr1 40 50 9 9 | |
chr1 50 60 10 10 | |
chr1 60 70 11 10 | |
chr1 70 80 15 10 | |
chr1 80 90 5 5 | |
chr1 90 100 10 10 | |
# 2. we can now use the extra column as a grouping criterion (assumes BEDGRAPH) | |
# 3. group on the surrogate column and report a new interval with the min start and max end in the "block" | |
# 4. Yank out the relevant columns. | |
# 5. Profit. | |
$ cat foo.bedg \ | |
| awk -v mincov=10 '{ if ($4 >= mincov) {print $0"\t"mincov} else {print $0"\t"$4}}' \ | |
| bedtools groupby -g 1,5 -c 1,2,3,4 -o first,first,last,first \ | |
| cut -f 3-6 | |
chr1 0 20 5 | |
chr1 20 40 10 | |
chr1 40 50 9 | |
chr1 50 80 10 | |
chr1 80 90 5 | |
chr1 90 100 10 | |
# You could also use mean as the new column. | |
$ cat foo.bedg \ | |
| awk -v mincov=10 '{ if ($4 >= mincov) {print $0"\t"mincov} else {print $0"\t"$4}}' \ | |
| bedtools groupby -g 1,5 -c 1,2,3,4 -o first,first,last,mean \ | |
| cut -f 3-6 | |
chr1 0 20 5 | |
chr1 20 40 10.5 | |
chr1 40 50 9 | |
chr1 50 80 12 | |
chr1 80 90 5 | |
chr1 90 100 10 | |
$ cat foo.bedg \ | |
| awk -v mincov=10 '{ if ($4 >= mincov) {print $0"\t"mincov} else {print $0"\t"$4}}' \ | |
| bedtools groupby -g 1,5 -c 1,2,3,4 -o first,first,last,max \ | |
| cut -f 3-6 | |
chr1 0 20 5 | |
chr1 20 40 11 | |
chr1 40 50 9 | |
chr1 50 80 15 | |
chr1 80 90 5 | |
chr1 90 100 10 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment