lokeyCEU / RepMask2bed.txt
Last active April 12, 2019 13:22
How to take RepeatMasker output (fa.out.gz) and make a .bed file
# I needed to take the .fa.out.gz file, which is the RepeatMasker output for hg19, and make a .bed file for calling HERVs with RetroSeq.
# First I renamed the .gz file so that it ends in .Z, which let zcat read it (some zcat builds, for example on macOS, look for a .Z suffix).
mv hg19.fa.out.gz hg19.fa.out.gz.Z
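# If renaming the file is not wanted, gzip -dc should read a .gz directly and write the
# decompressed text to stdout. A minimal sketch, using a small made-up file (demo.out.gz is
# hypothetical and only stands in for hg19.fa.out.gz):

```shell
# Build a tiny gzipped stand-in file in a temporary directory.
tmpdir=$(mktemp -d)
printf 'line1\nline2\n' > "$tmpdir/demo.out"
gzip "$tmpdir/demo.out"    # produces demo.out.gz

# gzip -dc decompresses to stdout; the .gz file keeps its name and contents.
first_line=$(gzip -dc "$tmpdir/demo.out.gz" | head -n 1)
echo "$first_line"

rm -rf "$tmpdir"
```

# With the real file this would be: gzip -dc hg19.fa.out.gz | head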
# Once you know which columns the .bed file needs, use awk to skip the three header lines and print those columns, separated by tabs ("\t").
# Pipe the result to head first to check that the output looks right. Once it does, replace | head with > yourfilename.bed
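# Final Code (the start is $6-1 because RepeatMasker coordinates are 1-based, while BED starts are 0-based):
awk 'NR > 3 {print $5"\t"($6-1)"\t"$7"\t"$10"\t"$1"\t"$9"\t"($6-1)"\t"$7}' <(zcat hg19.fa.out.gz.Z) | sed $'s/\tC\t/\t-\t/g' > repeatmasker.hg19.bed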
# Notice the sed command: RepeatMasker marks complement-strand matches with C, so this is where I changed the C's in the strand column to -'s, as BED expects + or -.
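# To see what the conversion does, here is a minimal sketch on a made-up sample with the usual
# three header lines. The two records and the temp file are invented for illustration and are not
# real hg19 annotations. The sketch maps the strand inside awk instead of with sed, an equivalent
# substitute that also runs in plain sh, and it writes the start as $6 - 1 since BED starts are 0-based.

```shell
# Made-up RepeatMasker-style sample: 3 header lines (the third is blank), then 2 invented records.
sample=$(mktemp)
cat > "$sample" <<'EOF'
   SW   perc perc perc  query     position in query    matching  repeat        position in repeat
score   div. del. ins.  sequence  begin end   (left)   repeat    class/family  begin end (left) ID

  100   1.0  0.0  0.0   chr1      101   200   (1000)   +  L1ME1     LINE/L1      1   100  (0)  1
  200   2.0  0.0  0.0   chr1      301   400   (800)    C  HERVK-int LTR/ERVK     (0) 100  1    2
EOF

# Skip the header (NR > 3), then print chrom, start, end, name, score, strand, thickStart, thickEnd.
# A strand of C (complement) becomes -, anything else is passed through unchanged.
bed_out=$(awk 'BEGIN { OFS = "\t" }
  NR > 3 {
    strand = ($9 == "C") ? "-" : $9
    print $5, $6 - 1, $7, $10, $1, strand, $6 - 1, $7
  }' "$sample")
rm -f "$sample"

# Preview the result, as with | head in the steps above.
printf '%s\n' "$bed_out" | head
# prints (tab-separated):
# chr1  100  200  L1ME1      100  +  100  200
# chr1  300  400  HERVK-int  200  -  300  400
```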
#
# Now to grep out the elements we need.