@danielecook
Last active March 7, 2017 08:24
Generate a BED file of low complexity regions (LCR) from UCSC RepeatMasker data, along with its complement, for use with bcftools
#!/bin/bash
wget 'http://hgdownload.soe.ucsc.edu/goldenPath/ce10/database/rmsk.txt.gz' -O LCR_rmsk.txt.gz
gunzip -kfc LCR_rmsk.txt.gz | grep 'Low_complexity' | cut -f 6,7,8 > LCR_ce10_rmsk.bed
rm LCR_rmsk.txt.gz
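# Generate the complementary set of regions (i.e. NOT low complexity)
# Download C. elegans chromosome sizes; -N omits the column-name header row so ce10.genome holds only chrom/size pairs
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e "select chrom, size from ce10.chromInfo" | sort -k1,1 > ce10.genome
# bedtools complement expects input sorted by chromosome, then start, in the same chromosome order as the genome file
sort -k1,1 -k2,2n LCR_ce10_rmsk.bed | bedtools complement -i stdin -g ce10.genome > LCR_complement_ce10.bed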
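
A quick sanity check on the two output files can catch a bad genome file or a sorting problem. The sketch below is not part of the original script: the function names `bed_total_bp`, `genome_total_bp`, and `check_complement` are hypothetical helpers, and the check assumes the LCR intervals do not overlap one another (overlapping intervals would be counted twice and make the check fail even when the complement is right).

```shell
#!/bin/bash
# Sum interval lengths (end - start) over a 3-column BED file.
bed_total_bp() {
    awk -F'\t' '{ total += $3 - $2 } END { printf "%d\n", total }' "$1"
}

# Sum chromosome sizes over a two-column genome file (chrom<TAB>size).
genome_total_bp() {
    awk -F'\t' '{ total += $2 } END { printf "%d\n", total }' "$1"
}

# Hypothetical helper: print the three totals and return success only if
# masked bases + complement bases add up to the genome size.
# Assumes the intervals in the first file do not overlap.
check_complement() {
    local lcr complement genome
    lcr=$(bed_total_bp "$1")
    complement=$(bed_total_bp "$2")
    genome=$(genome_total_bp "$3")
    echo "LCR=${lcr} complement=${complement} genome=${genome}"
    [ $((lcr + complement)) -eq "$genome" ]
}
```

With the files produced above, it could be called as `check_complement LCR_ce10_rmsk.bed LCR_complement_ce10.bed ce10.genome`; a non-zero exit status means the totals do not add up.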