@inodb
Last active June 3, 2016 17:49
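
Generate reads with deletions of arbitrary length in hg19 (GRCh37) using the Ensembl REST API

Each simulated read is read_length bases long. Its first half is the reference sequence starting at chr:start_pos, and its second half resumes del_size bases further downstream, so the read spans a deletion of del_size bp relative to the reference. The script writes one FASTQ record per deletion size to standard output.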

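chr=17
start_pos=4126000
read_length=100
ens_url="https://grch37.rest.ensembl.org/sequence/region/human/"
# Dummy base qualities: read_length copies of 'A' (Q32 in Phred+33 encoding)
qual=$(printf '%*s' "${read_length}" '' | tr ' ' 'A')

for del_size in 10 30 50 70 90 150 300 600
do
  # Fetch the left flank (first read_length/2 bases) and the right flank, which
  # starts del_size bases after the left flank ends. paste joins the four FASTA
  # lines (header, sequence, header, sequence) into one tab-separated row, so
  # each flank must come back on a single sequence line.
  (
    curl -s "${ens_url}${chr}:${start_pos}:$((start_pos+read_length/2-1)):1" -H 'Content-type:text/x-fasta'
    curl -s "${ens_url}${chr}:$((start_pos+read_length/2+del_size)):$((start_pos+read_length+del_size-1)):1" -H 'Content-type:text/x-fasta'
  ) | paste - - - - | \
    awk -v del_size="${del_size}" -v qual="${qual}" -v FS='\t' \
    '{printf "@del_size=%s-%s-%s\n%s%s\n+\n%s\n", del_size, $1, $3, $2, $4, qual}'
done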
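
The regions requested from Ensembl follow entirely from the shell arithmetic in the loop. The sketch below reproduces only that arithmetic, without any API calls, so the two flanks and the skipped (deleted) interval can be checked offline. The helper name flank_coords and the output layout are hypothetical, not part of the original gist; the variables mirror the script above, and coordinates are 1-based and inclusive.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reproduce the script's coordinate arithmetic offline.
# For one deletion size, print the left flank, the deleted interval and the
# right flank (1-based, inclusive, GRCh37 chromosome naming).
chr=17
start_pos=4126000
read_length=100

flank_coords() {
  local del_size=$1
  local half=$((read_length / 2))
  local left_end=$((start_pos + half - 1))              # last base of the left flank
  local right_start=$((start_pos + half + del_size))    # first base after the deletion
  local right_end=$((start_pos + read_length + del_size - 1))
  printf 'del_size=%d left=%s:%d-%d deleted=%s:%d-%d right=%s:%d-%d\n' \
    "${del_size}" \
    "${chr}" "${start_pos}" "${left_end}" \
    "${chr}" "$((left_end + 1))" "$((right_start - 1))" \
    "${chr}" "${right_start}" "${right_end}"
}

for del_size in 10 30 50 70 90 150 300 600
do
  flank_coords "${del_size}"
done
```

For del_size=10 this prints `left=17:4126000-4126049 deleted=17:4126050-4126059 right=17:4126060-4126109`. Both flanks are always read_length/2 = 50 bp, so every read is 100 bp whatever the deletion size.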

Example of a generated read:

@del_size=10->chromosome:GRCh37:17:4126000:4126049:1->chromosome:GRCh37:17:4126055:4126104:1
GTTTTATATTAGGGTAGGGTTTCTTGTGCAGTAATATTTCTCGTAGCAATTAGCTGTTTTGCACCTTCATAGTATGAAAAGGGTTGAACTGGATGACAGC
+
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
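
Because the pipeline relies on each flank arriving as exactly one FASTA header and one sequence line, a structural check on the output can catch wrapped sequences or an error response from the API. The sketch below is a hypothetical helper, check_fastq, not part of the original gist. It reads FASTQ on standard input and verifies that every record has an @ header, a + separator, and sequence and quality strings of the expected length.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for the generated FASTQ.
# Usage: check_fastq EXPECTED_LEN < reads.fastq
# Exits non-zero (with messages on stderr) if any 4-line record is malformed.
check_fastq() {
  local expected_len=$1
  awk -v len="${expected_len}" '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { print "bad header at line " NR > "/dev/stderr"; bad = 1 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 && $0 != "+" { print "bad separator at line " NR > "/dev/stderr"; bad = 1 }
    NR % 4 == 0 {
      if (length(seq) != len || length($0) != len) {
        print "length mismatch in record ending at line " NR > "/dev/stderr"; bad = 1
      }
      n++
    }
    END {
      if (NR % 4 != 0) { print "truncated record" > "/dev/stderr"; bad = 1 }
      if (!bad) print (n + 0) " records OK"
      exit bad
    }'
}
```

Assuming the loop's output was redirected to a file such as deletions.fastq (a hypothetical name), `check_fastq 100 < deletions.fastq` should report `8 records OK`, one record per deletion size.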