# I needed to take hg19.fa.out.gz, the RepeatMasker output file for hg19, and make a .bed file for calling HERVs with RetroSeq.
# First I renamed the .gz file to .Z so that zcat could read it (only needed where zcat insists on a .Z suffix, e.g. on macOS).
mv hg19.fa.out.gz hg19.fa.out.gz.Z
# Once you know which columns the .bed file needs, use awk to skip the header (the first 3 lines), print those columns separated by tabs ("\t"),
# and pipe to head to make sure it's working right. If it is, replace | head with > yourfilename.bed
# Final code:
awk 'NR > 3 {print $5"\t"$6"\t"$7"\t"$10"\t"$1"\t"$9"\t"$6"\t"$7}' <(zcat hg19.fa.out.gz.Z) | sed $'s/\tC\t/\t-\t/g' > repeatmasker.hg19.bed
# Note the sed command: this is where I changed the C's (RepeatMasker's marker for the complement strand) in the +/- column to -'s.
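# A possible alternative, sketched here rather than taken from the original workflow: do the strand
# conversion inside awk, so that only the strand column is rewritten and no separate sed pass is needed.
# The function name rmsk_to_bed is hypothetical. It assumes the usual RepeatMasker .out layout
# (3 header lines; $5 = sequence, $6/$7 = begin/end, $9 = strand, $10 = repeat name, $1 = score)
# and keeps the same 8-column order as the command above.

```shell
# Hypothetical helper: read RepeatMasker .out text on stdin, write tab-separated columns on stdout.
rmsk_to_bed() {
    awk 'BEGIN { OFS = "\t" }
         NR > 3 {
             # RepeatMasker marks the complement strand with C; BED expects -
             strand = ($9 == "C") ? "-" : $9
             print $5, $6, $7, $10, $1, strand, $6, $7
         }'
}

# Usage (assumes the original, un-renamed .gz file; gzip -dc reads it directly):
# gzip -dc hg19.fa.out.gz | rmsk_to_bed > repeatmasker.hg19.bed
```

# As with the one-liner above, pipe to head first to check the columns before writing the full file.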
#
# Now to grep out the elements we need.