@arraytools
Last active March 17, 2017 18:50
$ perl reads_simulator.pl # Help
usage: reads_simulator.pl <num_reads> <name> [options]
<name> string is used in the names of the output files.
Use only alphanumeric, underscores and dashes in this string.
This program outputs a fasta file of reads and a .cig file and .bed file representing
the truth of where these reads map.
- bed file is in one-based coords and contains both endpoints of each span.
- cig file has a cigar string representation of the mapping coordinates, and a more human
readable representation of the coordinates. All coords are one-based.
Also output are: a file of indels, a file of substitutions, a file of splice forms, and a file of junctions
and gaps crossed by the simulated reads.
options:
-readlength n : Set readlength to n>0 bases (default n = 100).
-numgenes n : Choose n>=1 genes at random from a master pool of gene models (default n = 30000).
This only works with the four config files that have no stem (see below for more info).
-strandspecific : Generate strand specific data
-error x : Set the error rate that any given base is sequenced wrong to 0<=x<=1 (default x = 0.005).
-subfreq x : Set substitution rate to 0<=x<1 (default x = 0.001).
-indelfreq x : Set indel rate to 0<=x<1 (default x = 0.0005).
-nointron : Set this to have no signal coming from introns.
-tlen n : Set the length of the low quality tail to n bases (default n = 10).
-tpercent x : Set the percent of tails that are low quality to 0<=x<=1
(default x = 0).
-tqual x : Set quality of the low quality tail to 0<=x<=1 (default x = 0.8).
-nalt n : Set the number of novel splice forms per gene to n>1 (default n = 2).
-palt x : Set the percentage of signal coming from novel splice
forms to 0<=x<=1 (default x = 0.2).
-sn : Add sequence number to the first column of the bed file.
-configstem x : The stem x will be added to the four default filenames for the four
required files. Use this if you have made your own config files (see
below about config files and custom config files).
E.g. "simulator_config_geneinfo" becomes "simulator_config_geneinfo_x".
-cntstart n : Start the read counter at n (default n = 1).
-outdir x : x is a path to a directory to write to. Default is the current directory.
-mastercfgdir x : x is a path to a directory where the master config files are. Default is the
current directory.
-customcfgdir x : If you are using -configstem option, then x is a path to a directory where the
custom config files are. Default is the directory specified by -outdir, which
itself defaults to the current directory.
-usesubs x : x is a file of substitutions in the format output by this program in case you want
to reuse them for another run with the same gene models.
-useindels x : x is a file of indels in the format output by this program in case you want
to reuse them for another run with the same gene models.
-usealts x : x is a file of transcripts in the format output by this program in case you want
to reuse them for another run with the same gene models. This must be used in
conjunction with -configstem for the same four files used the first time (those
will be the files ending in 'temp' if you used the master pool the first time).
This program depends on four (master config) files:
1) simulator_config_geneinfo
2) simulator_config_geneseq
3) simulator_config_intronseq
4) simulator_config_featurequantifications
With these files the program chooses -numgenes of them at random.
If you create custom config files with a suffix and use the -configstem mode
it then uses all genes in the file.
To create custom config files for a subset of genes in the master config, use the script:
- make_config_files_for_subset_of_gene_ids.pl
Run it with no parameters for the usage
To use the custom config files with this program put them in the same directory as the script, or
in the directory specified by -outdir (not -mastercfgdir which specifies the master config files)
and use the option -configstem
See http://www.cbil.upenn.edu/BEERS for more information.
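
The options in the help text above can also be assembled from a script. The following is a minimal Python sketch, assuming the simulator is invoked as `perl reads_simulator.pl` as in this transcript. `build_command` is a hypothetical helper (not part of BEERS) and covers only a handful of the flags listed above. Rejecting `-numgenes` together with `-configstem` is my reading of the help text, not documented behaviour; the script itself may simply ignore the option.

```python
import re


def build_command(num_reads, name, readlength=None, numgenes=None,
                  configstem=None, outdir=None, strandspecific=False):
    """Return an argv list for reads_simulator.pl (hypothetical helper)."""
    # The help text allows only alphanumerics, underscores and dashes in <name>.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", name):
        raise ValueError("name may contain only letters, digits, '_' and '-'")
    # Assumption: the help says -numgenes only works with the stem-less
    # master config files, so the combination is treated as an error here.
    if numgenes is not None and configstem is not None:
        raise ValueError("-numgenes applies only to the stem-less config files")

    argv = ["perl", "reads_simulator.pl", str(num_reads), name]
    if readlength is not None:
        argv += ["-readlength", str(readlength)]
    if numgenes is not None:
        argv += ["-numgenes", str(numgenes)]
    if configstem is not None:
        argv += ["-configstem", configstem]
    if outdir is not None:
        argv += ["-outdir", outdir]
    if strandspecific:
        argv.append("-strandspecific")
    return argv


# Example values are illustrative only.
cmd = build_command(1000, "test1", readlength=75, configstem="refseq")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that holds the script and config files.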
$ ls -lh
total 1.4G
-rw-r--r-- 1 brb brb 44M Sep 16 2010 simulator_config_featurequantifications_refseq
-rw-r--r-- 1 brb brb 7.7M Sep 15 2010 simulator_config_geneinfo_refseq
-rw-r--r-- 1 brb brb 106M Sep 15 2010 simulator_config_geneseq_refseq
-rw-r--r-- 1 brb brb 1.3G Sep 15 2010 simulator_config_intronseq_refseq
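
The four files in this listing follow the `-configstem` naming rule from the help text with the stem `refseq`, so they would presumably be used with `-configstem refseq` (an inference from the file names, not something the transcript shows). A small Python sketch of that rule, where `config_filenames` is a hypothetical helper:

```python
# The four base names are taken from the help text.
BASE_NAMES = [
    "simulator_config_geneinfo",
    "simulator_config_geneseq",
    "simulator_config_intronseq",
    "simulator_config_featurequantifications",
]


def config_filenames(stem=None):
    """Return the four config file names, with '_<stem>' appended if given."""
    if stem is None:
        return list(BASE_NAMES)
    return [f"{base}_{stem}" for base in BASE_NAMES]


# File names copied from the `ls -lh` output.
listing = {
    "simulator_config_featurequantifications_refseq",
    "simulator_config_geneinfo_refseq",
    "simulator_config_geneseq_refseq",
    "simulator_config_intronseq_refseq",
}
print(set(config_filenames("refseq")) == listing)
```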
$ more simulator_config_featurequantifications_refseq
basereads_total= 3896500899
--------------------------------------------------------------------
GENE.20704 -
Type Location Count Ave_Cnt Ave_Nrm Length
gene chr1_gl000191_random:36275-50281 11873 5.3797 1.3806 2207
exon 1 chr1_gl000191_random:50009-50281 2139 7.83516 2.0108 273
intron 1 chr1_gl000191_random:48652-50008 866 0.6381 0.1637 1357
exon 2 chr1_gl000191_random:48545-48651 796 7.43925 1.9092 107
intron 2 chr1_gl000191_random:44911-48544 6809 1.8736 0.4808 3634
exon 3 chr1_gl000191_random:44809-44910 439 4.30392 1.1045 102
intron 3 chr1_gl000191_random:41846-44808 3290 1.1103 0.2849 2963
exon 4 chr1_gl000191_random:41683-41845 1519 9.31901 2.3916 163
intron 4 chr1_gl000191_random:41460-41682 80 0.3587 0.092 223
exon 5 chr1_gl000191_random:41406-41459 503 9.31481 2.3905 54
intron 5 chr1_gl000191_random:37783-41405 18832 5.1979 1.3339 3623
exon 6 chr1_gl000191_random:36275-37782 6477 4.29509 1.1022 1508
--------------------------------------------------------------------
GENE.20822 -
Type Location Count Ave_Cnt Ave_Nrm Length
gene chr1_gl000191_random:36275-50281 11097 4.6803 1.2011 2371
exon 1 chr1_gl000191_random:50009-50281 2139 7.83516 2.0108 273
intron 1 chr1_gl000191_random:48652-50008 866 0.6381 0.1637 1357
exon 2 chr1_gl000191_random:48487-48651 952 5.76969 1.4807 165
intron 2 chr1_gl000191_random:48108-48486 28 0.0738 0.0189 379
exon 3 chr1_gl000191_random:47745-48107 574 1.58126 0.4058 363
intron 3 chr1_gl000191_random:44887-47744 6100 2.1343 0.5477 2858
exon 4 chr1_gl000191_random:44809-44886 390 5 1.2832 78
intron 4 chr1_gl000191_random:41846-44808 3290 1.1103 0.2849 2963
exon 5 chr1_gl000191_random:41683-41845 1519 9.31901 2.3916 163
intron 5 chr1_gl000191_random:41460-41682 80 0.3587 0.092 223
exon 6 chr1_gl000191_random:41406-41459 503 9.31481 2.3905 54
intron 6 chr1_gl000191_random:37550-41405 20289 5.2616 1.3503 3856
exon 7 chr1_gl000191_random:36275-37549 5020 3.93725 1.0104 1275
--------------------------------------------------------------------
...
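
The record layout of the feature quantification file can be read with a few lines of Python. This is a sketch under assumptions inferred from the sample alone: fields are whitespace-separated, each record starts with a `GENE.<n> <strand>` line, and the last four fields of every feature row are Count, Ave_Cnt, Ave_Nrm and Length. `parse_featurequant` is a hypothetical helper. In this sample the gene row's Length and Count equal the sums over its exons, Ave_Cnt matches Count / Length, and Ave_Nrm looks like Ave_Cnt scaled by 1e9 / basereads_total; the last relation is only an observation on these numbers.

```python
SAMPLE = """\
basereads_total= 3896500899
--------------------------------------------------------------------
GENE.20704 -
Type Location Count Ave_Cnt Ave_Nrm Length
gene chr1_gl000191_random:36275-50281 11873 5.3797 1.3806 2207
exon 1 chr1_gl000191_random:50009-50281 2139 7.83516 2.0108 273
intron 1 chr1_gl000191_random:48652-50008 866 0.6381 0.1637 1357
exon 2 chr1_gl000191_random:48545-48651 796 7.43925 1.9092 107
intron 2 chr1_gl000191_random:44911-48544 6809 1.8736 0.4808 3634
exon 3 chr1_gl000191_random:44809-44910 439 4.30392 1.1045 102
intron 3 chr1_gl000191_random:41846-44808 3290 1.1103 0.2849 2963
exon 4 chr1_gl000191_random:41683-41845 1519 9.31901 2.3916 163
intron 4 chr1_gl000191_random:41460-41682 80 0.3587 0.092 223
exon 5 chr1_gl000191_random:41406-41459 503 9.31481 2.3905 54
intron 5 chr1_gl000191_random:37783-41405 18832 5.1979 1.3339 3623
exon 6 chr1_gl000191_random:36275-37782 6477 4.29509 1.1022 1508
--------------------------------------------------------------------
"""


def parse_featurequant(text):
    """Return (basereads_total, {gene_id: {"strand": ..., "features": [...]}})."""
    total = None
    genes = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or line.startswith("---") or fields[0] == "Type":
            continue  # blank line, separator, or column header
        if line.startswith("basereads_total"):
            total = int(line.split("=", 1)[1])
        elif fields[0] in ("gene", "exon", "intron") and current is not None:
            count, ave_cnt, ave_nrm, length = fields[-4:]
            current["features"].append({
                "type": fields[0],
                "location": fields[-5],
                "count": int(count),
                "ave_cnt": float(ave_cnt),
                "ave_nrm": float(ave_nrm),
                "length": int(length),
            })
        elif len(fields) == 2 and fields[1] in ("+", "-"):
            current = {"strand": fields[1], "features": []}
            genes[fields[0]] = current
    return total, genes


total, genes = parse_featurequant(SAMPLE)
features = genes["GENE.20704"]["features"]
gene_row = features[0]
exons = [f for f in features if f["type"] == "exon"]

# Consistency checks on the sample record.
print(sum(f["length"] for f in exons) == gene_row["length"])
print(sum(f["count"] for f in exons) == gene_row["count"])
print(all(abs(f["count"] / f["length"] - f["ave_cnt"]) < 1e-3 for f in features))
print(all(abs(f["ave_cnt"] * 1e9 / total - f["ave_nrm"]) < 1e-3 for f in features))
```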
$ head simulator_config_geneinfo_refseq
chr1 - 14362 29370 11 14362,14969,15795,16606,16857,17232,17605,17914,18267,24737,29320 14829,15038,15947,16765,17055,17368,17742,18061,18366,24891,29370 GENE.14211
chr1 - 34611 36081 3 34611,35276,35720 35174,35481,36081 GENE.16185
chr1 + 69090 70008 1 69090 70008 GENE.4391
chr1 + 323891 328580 4 323891,324287,324438,327035 324060,324345,326938,328580 GENE.1388
chr1 + 323891 328580 3 323891,324287,324438 324060,324345,328580 GENE.27751
chr1 + 367658 368595 1 367658 368595 GENE.31890
chr1 - 566188 566265 1 566188 566265 GENE.5863
chr1 - 621097 622034 1 621097 622034 GENE.27342
chr1 - 661139 665731 3 661139,665277,665562 665184,665335,665731 GENE.14244
chr1 - 700244 714068 7 700244,701708,703927,704876,708355,709550,713663 700627,701767,703993,705092,708487,709660,714068 GENE.30215
$ grep GENE.20704 simulator_config_geneinfo_refseq
chr1_gl000191_random - 36274 50281 6 36274,41405,41682,44808,48544,50008 37782,41459,41845,44910,48651,50281 GENE.20704
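
Each geneinfo line appears to hold eight whitespace-separated columns: chromosome, strand, gene start, gene end, exon count, comma-separated exon starts, comma-separated exon ends, and gene ID. These column meanings are inferred from the sample, not from documentation. A Python sketch of a parser follows; `GeneModel` and `parse_geneinfo_line` are hypothetical names. For GENE.20704 the differences end - start reproduce the exon lengths in the feature quantification file, and each start here is one less than the corresponding start there, which suggests (but does not prove) that starts in this file are zero-based.

```python
from typing import List, NamedTuple


class GeneModel(NamedTuple):
    chrom: str
    strand: str
    start: int
    end: int
    exon_starts: List[int]
    exon_ends: List[int]
    gene_id: str


def parse_geneinfo_line(line):
    """Parse one geneinfo line (column meanings are inferred, see above)."""
    chrom, strand, start, end, n_exons, starts, ends, gene_id = line.split()
    exon_starts = [int(x) for x in starts.split(",") if x]
    exon_ends = [int(x) for x in ends.split(",") if x]
    if not (len(exon_starts) == len(exon_ends) == int(n_exons)):
        raise ValueError(f"exon count mismatch for {gene_id}")
    return GeneModel(chrom, strand, int(start), int(end),
                     exon_starts, exon_ends, gene_id)


# Line copied from the grep output.
line = ("chr1_gl000191_random - 36274 50281 6 "
        "36274,41405,41682,44808,48544,50008 "
        "37782,41459,41845,44910,48651,50281 GENE.20704")
gene = parse_geneinfo_line(line)
lengths = [e - s for s, e in zip(gene.exon_starts, gene.exon_ends)]
print(lengths)       # [1508, 54, 163, 102, 107, 273]
print(sum(lengths))  # 2207, the gene Length in the feature quantification file
```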
$ head -6 simulator_config_geneseq_refseq
>GENE.14211:chr1:14362-29370_-
CCTGCACAGCTAGAGATCCTTTATTAAAAGCACACTGTTGGTTTCTGCTCAGTTCTTTATTGATTGGTGTGCCGTTTTCTCTGGAAGCCTCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAGCACAGGCAGACAGAAGTCCCCGCCCCAGCTGTGTGGCCTCAAGCCAGCCTTCCGCTCCTTGAAGCTGGTCTCCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCTGGGGGGGAAGGTGTCATGGAGCCCCCTACGATTCCCAGTCGTCCTCGTCCTCCTCTGCCTGTGGCTGCTGCGGTGGCGGCAGAGGAGGGATGGAGTCTGACACGCGGGCAAAGGCTCCTCCGGGCCCCTCACCAGCCCCAGGTCCTTTCCCAGAGATGCCCTTGCGCCTCATGACCAGCTTGTTGAAGAGATCCGACATCAAGTGCCCACCTTGGCTCGTGGCTCTCACTTGCTCCTGCTCCTTCTGCTGCTGCTTCTCCAGCTTTCGCTCCTTCATGCTGCGCAGCTTGGCCTTGCCGATGCCCCCAGCTTGGCGGATGGACTCTAGCAGAGTGGCCAGCCACCGGAGGGGTCAACCACTTCCCTGGGAGCTCCCTGGACTGAAGGAGACGCGCTGCTGCTGCTGTCGTCCTGCCTGGCGCCTTGGCCTACAGGGGCCGCGGTTGAGGGTGGGAGTGGGGGTGCACTGGCCAGCACCTCAGGAGCTGGGGGTGGTGGTGGGGGCGGTGGGGGTGGTGTTAGTACCCCATCTTGTAGGTCTTGAGAGGCTCGGCTACCTCAGTGTGGAAGGTGGGCAGTTCTGGAATGGTGCCAGGGGCAGAGGGGGCAATGCCGGGGCCCAGGTCGGCAATGTACATGAGGTCGTTGGCAATGCCGGGCAGGTCAGGCAGGTAGGATGGAACATCAATCTCAGGCACCTGGCCCAGGTCTGGCACATAGAAGTAGTTCTCTGGGACCTGCTGTTCCAGCTGCTCTCTCTTGCTGATGGACAAGGGGGCATCAAACAGCTTCTCCTCTGTCTCTGCCCCCAGCATCACATGGGTCTTTGTTACAGCACCAGCCAGGGGGTCCAGGAAGACATACTTCTTCTACAGGTTCTCGGTGGTGTTGAAGAGCAGCAAGGAGCTGACAGAGCTGATGTTGCTGGGAAGACCCCCAAGTCCCTCTTCTGCATCGTCCTCGGGCTCCGGCTTGGTGCTCACGCACACAGGAAAGTCCTTCAGCTTCTCCTGCAGGGCCCGCTCGTCCAGGGGGCGGTGCTTGCTCTGGATCCTGTGGCGGGGGCGTCTCTGCAGGCCAGGGTCCTGGGCGCCCGTGAAGATGGAGCCATATTCCTGCAGGCGCCCTGGAGCAGGGTACTTGGCACTGGAGAACACCTTGATGGCCTTCTTGCTGCCCTTGATCTTCTCAATCTTGGCCTGGGCCAAGGAGACCTTCTCTCCAATGGCCTGCACCTGGCTCCGGCTCTGCTCTACCTGCTGAAGATGTCTCCAGAGACCTTCTGCAGGTACTGCAGGGCATCCGCCATCTGCTGGACGGCCTCCTCTCGCCGCAGGTCTGGCTGGATGAAGGGCACGGCATAGGTCTGACCTGCCAGGGAGTGCTGCATCCTCACAGGAGTCATGGTGCCTGCGAGCCGCCCTCCCGGAAGCTCCCGCCGCCGCTTCCGCTCTGCCGGA
>GENE.16185:chr1:34611-36081_-
AAAGGCTTAAACACAATGGAAGTTTATTTCTCACTAAGGGAACATCCAAATCCATGATACTTTAAGTCAGGGACCCAGGTTCCTCCCATCTATGGTTCTGCCATCACTAATCTGGGTCTTCCACAATTGCCGTGCTCCTTGGAGGTGGGAAGAGCAGGCGGAGGACACGTGGGAGGTTTTAGGGACAAGCCTGGAGGCAGCATGCGTCACTCCCATGCAGAGTCCATTGGCCAATGCTGGCTCCGATGGCCACATCTCACTGCAGGGGCAGCTGGGAAATACAGTCTGGCTGTCTACCCAGGAGGAAGAGCAGCCAGTTTCTGCTGCTGATGATCAGGAGGTGGAGAAAATGTTCAGTCAGGCAGGGAGTGGGAATAGACAAGACCACAAGCAGCTTGGTGCCTCTGAAAGGGAGAGGGGTGGAGGGGAGACTAGAGAGGTGGGTAGGAATACTGGATTCCACTGACCACGTGCTGGATGTCACGCTTAGCCCTCCTGCTCTGTGCCGGGTTAGGCACCTGGTGTTTTACGTACATAATCTCAATTCTGTGAGGGCATCCGACAAGAATTTGGTGGGGAAAATATTACCATCTTTCCCTTTTGTGATTGGAGAAAAATGAGGCTTTGAAGGGTTTAAGAACTTGCCCAAGGTCGGCCAGGTGCAGTGGCTCATGTCTATAATCCCAACACTTTGGGAGGCTGAGGTGGGAGGATCGCTTGAGGCCAGGAGTTCAAGACCAGCCTGAGCAACATAGTGAGACTTTGTCTCTATAGTCAGCAGCATCGGGGGTCAGGAAAGACTTCACGAAGCCATAAATGCATCCTTCTCGGGGCAGCACCTGGCTCTCCCAGGTGAGAGAGGACTCCATTTTCACAGGCAGGCGTGGGAGCTTCAGCACCCATCTCTGGGCCCAGAATGACCCACTGGAGACCTTACAGCTCTCCTGTCACCCCCAATTCCTGCCCCCTCTGCAGCCTTGGAGGAGAATGGAGCTGAAGGGCCTGCCCTCTGTAGGGTGAGAAAGGGAGGCTAAAGCCTGGTGCCCACTGCCCTGGCTGCTCCGCATTGCAGGAGCTGCGCCCTTCCTTTCCTGGCACAGGGTCCACAGCCCCGAAACCCCGTTGTGTG
>GENE.4391:chr1:69090-70008_+
ATGGTGACTGAATTCATTTTTCTGGGTCTCTCTGATTCTCAGGAACTCCAGACCTTCCTATTTATGTTGTTTTTTGTATTCTATGGAGGAATCGTGTTTGGAAACCTTCTTATTGTCATAACAGTGGTATCTGACTCCCACCTTCACTCTCCCATGTACTTCCTGCTAGCCAACCTCTCACTCATTGATCTGTCTCTGTCTTCAGTCACAGCCCCCAAGATGATTACTGACTTTTTCAGCCAGCGCAAAGTCATCTCTTTCAAGGGCTGCCTTGTTCAGATATTTCTCCTTCACTTCTTTGGTGGGAGTGAGATGGTGATCCTCATAGCCATGGGCTTTGACAGATATATAGCAATATGCAAGCCCCTACACTACACTACAATTATGTGTGGCAACGCATGTGTCGGCATTATGGCTGTCACATGGGGAATTGGCTTTCTCCATTCGGTGAGCCAGTTGGCGTTTGCCGTGCACTTACTCTTCTGTGGTCCCAATGAGGTCGATAGTTTTTATTGTGACCTTCCTAGGGTAATCAAACTTGCCTGTACAGATACCTACAGGCTAGATATTATGGTCATTGCTAACAGTGGTGTGCTCACTGTGTGTTCTTTTGTTCTTCTAATCATCTCATACACTATCATCCTAATGACCATCCAGCATCGCCCTTTAGATAAGTCGTCCAAAGCTCTGTCCACTTTGACTGCTCACATTACAGTAGTTCTTTTGTTCTTTGGACCATGTGTCTTTATTTATGCCTGGCCATTCCCCATCAAGTCATTAGATAAATTCCTTGCTGTATTTTATTCTGTGATCACCCCTCTCTTGAACCCAATTATATACACACTGAGGAACAAAGACATGAAGACGGCAATAAGACAGCTGAGAAAATGGGATGCACATTCTAGTGTAAAGTTTTAG
$ grep GENE.20704 simulator_config_geneseq_refseq
>GENE.20704:chr1_gl000191_random:36274-50281_-
$ wc -l simulator_config_geneseq_refseq
72934 simulator_config_geneseq_refseq
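
The geneseq headers shown above have the form `>GENE_ID:CHROM:START-END_STRAND`. Chromosome names such as `chr1_gl000191_random` contain underscores, so the strand has to be taken from the end of the header. A Python sketch with a regular expression; `parse_geneseq_header` is a hypothetical helper and the pattern is derived only from the headers in this transcript.

```python
import re

HEADER_RE = re.compile(
    r"^>(?P<gene_id>[^:]+):(?P<chrom>[^:]+):"
    r"(?P<start>\d+)-(?P<end>\d+)_(?P<strand>[+-])$"
)


def parse_geneseq_header(header):
    """Split a geneseq FASTA header into its parts."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"unrecognised header: {header!r}")
    parts = match.groupdict()
    parts["start"] = int(parts["start"])
    parts["end"] = int(parts["end"])
    return parts


info = parse_geneseq_header(">GENE.20704:chr1_gl000191_random:36274-50281_-")
print(info["gene_id"], info["chrom"], info["start"], info["end"], info["strand"])
```

If every record occupies exactly two lines (header plus one sequence line), as the `head -6` output suggests, the 72934 lines would correspond to 36467 gene sequences.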
$ head -6 simulator_config_intronseq_refseq
>chr1:43870222-43871953
GTATTGCATCATCTCTCCAAGTTTGTACCCTCAGACCAAATTTCTATTAGTCCTCTGACCAAGTCCTTATCCTGTCTCTGCTGTTTGTCCCCAAAGTCCCGAGCTCTGCTGGCTTCTTGAACCTGTTTTCTAGTCATCCTCATGAGTCTCTCTCTCCTTGAGAAGAACCAGTTCCTCTGGACTTAAAGGGCTTTCCTATAGACTTCGGGTCAGTTGGTGTTGATTGGACACCTGCCTTTTTCACTGCTCCTGTAAATCTCTTTGATTCTGACCACTGGATGTCTGTTTTCTCCATTCCCCTCTTCTCCTTTAGTTCTGTTGGATGGTTTGCTGATGCGCTGGCTCTGGCTAGTACCCTCAGGAGATGGTCTTTTGGAATAGTTTTGTGATTTTGAAAGACAGCACATAACCCAGAAATGTAATTGGTTTCAAGGATAGAGGAGGGTCATTTTTTGTTGTTGTCTTTTTTTGAGACGGAGTTTCGCTCTTTTTGCCCAGGCTGGAGTGCAATGGCACAATCTTGGCTCACTGCAACCTCCGCCTCCTGGGTTCAAGCGATTCTCCTGCCTCAGCCTCCCAAGTAACGGGGATGATAGGCACTCACCACCAGGCCCAGCTTATTTATTTATTTATTTATTTTTAGTAGAGACGGGGTTTTGCCATGTTGGCCAGGCTGGTCTTGAACTCCCGACCTCAGGTGATCTGCCAGCCTCGGCCTCCCAAAGTGCTAGGATTACAGGCGTGAGCCACCGTGCCCAGTGTGGGTCATTTTTTTTTTGAGGGCCATGTGAACTGTTGGTTATAATGCTGACCACACAGATAATTACAGGGCTTATCCAGCACTCAAGGAGGCTATCACAAGACACTCCTTTTGGAGAATGGCCAAGTCTGTTACTGGAATATCTGTTTATGACTTAGATAAGCATTGGCCCAGTTTCCCCTCATCTGCTTTTAACTACCCTTAACTCATTGGTAAAGGCTTTCCCTGAAACTGTCCGTGATGGCTGCATTGTCACCTTAGCTATCAAACTGGATTTGAGAGATAAGATATTGGGAACATGATCTAAGGGTCTGAAACATACAGCAACAAAATTGGTGCATACCTGGATCCCAGGGCAGGGATTGCCAGCCCTCTGCCCTAGTTTAAAGTGCTTTCCTAGTATAGCCGTTAGGACTATCAGTTGTGGGCTGCATCCCCAGATGGGGGCCAGGTTTTGCAGGCACTGATAGCCAAGGCAGGGAGAGCATGGATGTTAGGTAATCACTCAAAACTTGGTTCAGGTTGGCAAACACTTAATGAAGCCTGAGCTATACCAGGCCTGAGCAGGTCCTGAAGACACAGATGAATCAGACCTTGTCTTTCTCCTGATGTGAAGAGTTGAGATTTTAGTGGGGGAGATAAATATTAAAGGGAGGTAATTGTAGTTGAATGTGATAAGGGCACCGATAAGATGACTCAGGTGCTATGGGAGCACCAAGAGGGAGCACCCAACCAAGCATGGGGCCACAATAGAGAAGTCTTTCAGACAAAGCAGCATTTGTTTTTCATGTTTTTGAGACAGAGTCTTGCTCTGTTGCCCGGGCTAGAGTGCAGTGGTGCAACCTTGGCTCACTGCACCCTCCACCTCCTGGGTTCAAGCAATTCTCCTGCCTCAGCCTTTCAAGTAGCTGGGATTACAGGTGCCCACTACCACACCCGGCTAATTTTTGTTTTTTTTGTTTTTTTTGGTAG
>chr1:153603588-153604509
CTGAATGAACAAAAGCATGGGAGAAAGAAGAATGTGGAGCATGTCAGTAGTTTCATTCAACTGGCATGAAGGATGTAGAAAACTAGGCTGCTAAGGCTAACCAGGGCCCCAAAGTAAAGAAGCCTCAGGATACGGTAGAAAGTGCACCGGTTTTGAGGTTTTATTGGACTGGGTTCAAATTCCAGCCCTGCTACTTTCTAGCTGTGGGACTTTGGGCATATTTATTTAACCACCGATTACCAATTTGTTTCTTCTCTAGAATGGAAATAATGAAACATACCTCACATTATTTGTATTAGAATTAAATTAGACAGTGCTTGTAAAGCTCCTAGTGTAGTGCTTGCTTTATTATAGGCACTGAACATACGGTAACTGGTGTCATTATTATTCATTCTGCCAGGTGAAAGAGCTTATGCTGTAGGCAACAGAAGCCCTCAAGACCTTTGAGGAGGCCTAGAAGAGTCCCATAATTCAATGCAGTTCCTCTCGCTCATCTTTGCCTCCTGCTCCTCAACCACCCCCTTGCCTCTGACTCAGTGCTGTACCCTTCCCTATACACCTCCCTCTTCCTCTCCTCCCACCACAGGCCCAGAAGGATGTGGATGCTGTGGACAAGGTGATGAAGGAGCTAGACGAGAATGGAGACGGGGAGGTGGACTTCCAGGAGTATGTGGTGCTTGTGGCTGCTCTCACAGTGGCCTGTAACAATTTCTTCTGGGAGAACAGTTGAGCAGACAGCCACATTGGGCAGCGCCCTTCCTCTCCACCCTCCCAGACCTGCCTCTTCCCCCTGCTTCCACCTCACCCCACTTATCCCTCTCCATAACCCCACCCTTGCCCACCCCACCCCCACCCCCACCAAGGGCGCAAGAGTAGCGGTCCAAGCCTGCAACTCATCTTTCATTAAAGGCTTCTCTCTCAC
>chr1:38166194-38167425
GTAAGTGGACACTGAGGTTGGAGGCTGGGGGTTATCACAAGACAAGCAAATACAAATGCATAAAATACACTGTTGTACACCAGAGCTCACAGTTTAGTAGAGATGATCCTACTTTACATGGCAATGACAAAAGAGATAATTGACTGGTGTGCAAAGTGTGTAGAACACAGAAGAGGAAGCTAGCAAGCCTCCCCTACCCCACTAGGACATAGACAGGCCTTTCAAGGTGAGTGAGAGTCTACTAGCTAGAGAAGAAAGGGAAGAGGAACAGTATTCAGGATAGATGGGGGAGGCGGAAAACTGAAGATGAGGAAACCAGTTAGAAGCCTATGAATTCTTGAACTGACATGCTGACACTATAGAATGAAGGGATGGTTTTTAGATAGGTTTTAGAAAGAAGTGGCATGGTTGGTGGAGAAAAGGAGGAATCAGAGACAACTGCATGGTTTCTAGCTAAGGCAAATGCTATCCAGTGAGGTGTGGGACCCAGGTGGCAGAGGGTGCTGGACTGGGGGTGGTGGAGCACATGCCAATTGCATGACATGTTGACCATTCATTACTTTTCCAGGACATCTAGGCTAGACAGAAATTTCAGCCTGGAGCTCAGGGAGAAGAGGGAGATTTGAGAGATACACTGCCTCCCAGATCTTTAGGTGTTTGTCCCAGTGTGGGAGTGGAGATGGTTGTGCAGGGAGAGCAGGAAGGGAGGGAGTCAGGGTGGACACTAGAGGGAGGGTAGAGAGAGGAGCTCCTAAGGCAGACTGCCAAGGAAGTTAGAGAAGCAGGAAAGACAGATCCATCACGGAGGACAAGGGAGGAGAGAATATATGGAAGATAAGGAGTGATTCTTGCCAGGGACCTCCCAGGCCCCAGGAAGGGAGCACTGCAGGGTTTGCCTTTGATTGGGCACCTGGGACGCCCTGAGTGACCTTCACAGCAGCCCTTTATGTAAAGTAGTGGGGTACAAGCCAGGCTGCAATCAGTTCAAAAGAGATTTACCTTTAATCTTGGTCAAAATCCTCTATTTTGAAAACCCAGCCAGTGGAAGTCCTTATTCAATTAGTTATGTTCTGAACTTTTTAAGTCATTGAATTCTGAGATACACCTTATTATATTTAGTATTAACTATCCAAAGCTTGTTCTTAATGCTTTAAAAAAAAAAAAAAAACCCTCCTTACCTTCCTCTTTGCTGGGTTTTTTGTAACCTTAGTCTCATTTGATTGTGACTGCAG
$ wc -l simulator_config_intronseq_refseq
416376 simulator_config_intronseq_refseq
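
The intron file uses simpler headers of the form `>CHROM:START-END`. At 1.3G it is better streamed than loaded whole. The Python sketch below assumes each record is one header line followed by one sequence line, as in the `head -6` output; `read_two_line_fasta` and `parse_intron_header` are hypothetical helpers. The demo sequences are truncated to the first 10 bases of the records shown above.

```python
import io


def read_two_line_fasta(handle):
    """Yield (header, sequence) pairs from a two-line-per-record FASTA stream."""
    header = None
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
        elif header is not None:
            yield header, line
            header = None


def parse_intron_header(header):
    """Split 'chrom:start-end' into (chrom, start, end)."""
    chrom, span = header.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


# Headers from the transcript; sequences truncated to 10 bases for brevity.
demo = io.StringIO(
    ">chr1:43870222-43871953\nGTATTGCATC\n"
    ">chr1:153603588-153604509\nCTGAATGAAC\n"
)
records = [(parse_intron_header(h), seq) for h, seq in read_two_line_fasta(demo)]
for (chrom, start, end), seq in records:
    print(chrom, start, end, seq)
```

Under the same two-line assumption, the 416376 lines reported by `wc -l` would correspond to 208188 intron records.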