@ctb
Last active August 29, 2015 14:06
*.bam
*.sam
*.ht
*.bai
*.ebwt
*.fai
*.corr
*.keepalign
*.keep
*.keepvar
mix.fa
reads.fa
reads2.fa
*~

Note: genome.fa and genome-var.fa differ at positions 100, 600 and 1100.
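A quick way to check the note above is to compare the two sequences directly. The sketch below is illustrative only; the helper names `read_fasta_sequence` and `differing_positions` are made up here. The comparison ignores case, since genome-var.fa marks its changed bases in lowercase. The first lowercase base in genome-var.fa is the 101st character of the sequence, which suggests the positions in the note are counted from 0, so the sketch reports 0-based positions unless asked otherwise.

```python
def read_fasta_sequence(lines):
    """Join the sequence lines of a single-record FASTA into one string."""
    return "".join(line.strip() for line in lines if not line.startswith(">"))


def differing_positions(seq_a, seq_b, one_based=False):
    """Return the positions where two equal-length sequences differ.

    Comparison is case-insensitive. Positions are 0-based by default;
    pass one_based=True for samtools-style coordinates.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    offset = 1 if one_based else 0
    pairs = zip(seq_a.upper(), seq_b.upper())
    return [i + offset for i, (a, b) in enumerate(pairs) if a != b]


# Toy demonstration: one substitution at 0-based index 1.
toy_ref = read_fasta_sequence([">toy\n", "ACGT\n", "ACGT\n"])
toy_var = read_fasta_sequence([">toy\n", "AgGT\n", "ACGT\n"])
print(differing_positions(toy_ref, toy_var))
```

To run it on the real files, pass `open("genome.fa")` and `open("genome-var.fa")` to `read_fasta_sequence` and compare the results.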

You'll need khmer branch feature/collect-variants.

To run:

bash make.sh  # produces reads.fa, reads2.fa, and mix.fa
bash do.sh    # produces {reads,reads2,mix}.sorted.bam

bash readscorr.sh  # correct reads.fa against self
bash varcorr.sh    # correct mix.fa against self, retaining variants (!!)

bash vardown.sh    # downsample mix.fa against self, adaptively oversampling variant-containing regions (!!)
# do.sh: index genome.fa, then align, sort and index reads.fa, reads2.fa and mix.fa
bowtie-build genome.fa genome > /dev/null
samtools faidx genome.fa
bowtie genome -f reads.fa -S reads.sam
samtools import genome.fa.fai reads.sam reads.bam
samtools sort reads.bam reads.sorted
samtools index reads.sorted.bam
###
bowtie genome -f reads2.fa -S reads2.sam
samtools import genome.fa.fai reads2.sam reads2.bam
samtools sort reads2.bam reads2.sorted
samtools index reads2.sorted.bam
###
bowtie genome -f mix.fa -S mix.sam
samtools import genome.fa.fai mix.sam mix.bam
samtools sort mix.bam mix.sorted
samtools index mix.sorted.bam
###
echo samtools tview reads.sorted.bam genome.fa
echo samtools tview reads2.sorted.bam genome.fa
echo samtools tview mix.sorted.bam genome.fa
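The three stanzas above are identical apart from the read set name. As a sketch of that pattern, the hypothetical helper below (`alignment_commands` is not part of the gist) generates the same four commands for each sample. It only prints the command lines and runs nothing. The invocations are copied from the script and use the older samtools command style (`samtools import`, and `samtools sort` with an output prefix); newer samtools releases may need different syntax.

```python
def alignment_commands(sample, genome="genome"):
    """Return the shell commands used for one read set, in order."""
    return [
        f"bowtie {genome} -f {sample}.fa -S {sample}.sam",
        f"samtools import {genome}.fa.fai {sample}.sam {sample}.bam",
        f"samtools sort {sample}.bam {sample}.sorted",
        f"samtools index {sample}.sorted.bam",
    ]


# Print the commands for the three read sets the script processes.
for sample in ("reads", "reads2", "mix"):
    for command in alignment_commands(sample):
        print(command)
```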
>genome
TCCGTGCGCGGGTGGCTCAGCCCCAGTTGGTAAGCGGTACAGCCGAGACGTCCTCACAACATATCCACGCTTAGCATAAGCAGCCGGTTCCAACATATTTgACAGTAGGAAAGTGCCGTGCCATGCATCCGACATACTATTGGTCCGAGTACAGGGTGTTACACAATGTACTCGATCAGGAACGGACTGGACTCTTGGTTTGGTTGAGTGGATTATGTGTGTTATTGAAATTTCGCAACGATTCTTTTGAGCACATCAATAGCTGGCCTTGAGCCCGTCGATCAAGTGTCAATGCTGCACAATTATACGGTTCACACACACCATTTGTCGTTGTGACTATCCTGAACGGCGTGAAAAGCTTTGTGACAAGGATCAATCGCCTCTGAGCGAGCCAAAATCCAGCGCGTGAAGACACATTTAAGTACCGAGCGATGTGTATCTGGGCGCGAATCGCACAGATGCCCTCTTCCGGACATCTGGTGAAACGCGCAACCTTCTCGTGCCGCCCAGCACCGGGTGACTAGGTTGAGCCATGATTAACCTGCAATGAAGGTCATTCACACGCAGCGTCATTTAATGGATTGGTGCACACTTAACTGGGaGCCGCGCTGGTGCTGATCCATGAAGTTCATCTGGACTTGTACGTGCGACAGCTCCTTCCATTTCCGCCTTGCCATACAGACCACCTAAGACCGCAGACCCTCCTCCTTACCACATGCGATGCGTGGGAACCGGTGTCAAAGACGGGTGCCGCTACACAGGAAGGCACCCAGGGAAAGTCGTTTGCCGGAAGAGAGTGGAGCTCCTACGTAAACGGGGAAACCACTTGTTTGGATTCCCCCTTGCCGATTCGGCCCTATCAGGATGTATTTAACTTAGGAGAAACCGAACAACTGCCACCGCTTATTGCCCCGGCAGGCGGTAGTTTCCACGATCTAACAATCGAAGCAATTCGGACAGGCTTAAGCTACAAAGCTCGGATTTTGTAAGTGCTCTATCCTTTGTAGGAAGTGAAAGATGACGTTGCGGCCGTCGCTGTTGGAGGAACCGCAGCACCATGGCGCCTGTGCGAGCTGGAGATCCTCTCATAGCGTCAGAGCACtGGATGCTGTATATTAAGCACACAATAGCCCGGGGACCGGCCCCAACGTGAAATGCCTGGCCTGCCGTTCTTTATAGTGCTCGTGATAGTGTTATAAAGGAACTAACATCAAGTTATGTAAGGACTTTTACAATAGCGTGGTCCGTCAAGTCGTCCACGTGTGTAAATTCATTGGTACCTTTTGCCGAAAAATTTGAAAGCTAAGCACATTCTGCTTACTCACAGGGTAAGTTCCTGAAGTATTAATGTAATGTGGAAAGACAGGCATATGAACACTATTGGGCTTTGTAGACATTCCTCATCCATGCTGTATCAGTAATGTACAATTCGCCCCTTTCGTAAAGGAGAGCCGTGCTAACGTTATATTCGGTCTTACCACGGGCTCGATAGTTTGCCCCTCCGTGCGCGGGTGGCTCAGCCCCAGTTGGTAAGCGGTACAGCCGAGACGTCCTCACAACATATCCACGCTTAGCATAAGCAGCCGGTTCCAACATATTTTACAGTAGGAAAGTGCCGTGCCATGCATCCGACATACTATTGGTCCGAGTACAGGGTGTTACACAATGTACTCGATCAGGAACGGACTGGACTCTTGGTTTGGTTGAGTGGATTATGTGTGTTATTGAAATTTCGCAACGATTCTTTTGAGCACATCAATAGCTGGCCTTGAGCCCGTCGATCAAGTGTCAATGCTGCACAATTATACGGTTCACACACACCATTTGTCGTTGTGACTATCCTGAACGGCGTGAAAAGCTTTGTGACAAGGATCAATCGCCTCTGAGCGAGCCAAAATCCAGCGCGTGAAGACACATTTAAGTACCGAGCGATGTGTATCTGGGCGCGAATCGCACAGATGCCCTCTTCCGGACATCTGGTGAAACGCGCAACCTTCTCG
TTTATAGGAACTCCCCGACAAACACACCCTGTTTGCGCAGTGGGATTACGTAAATTGGAGACGACGGCCGCTACCATTGTCTTGTTCGTTGGAGCATAGCATTACGCCATAGCAGTGAGCTTAATTATCGGGCACTAAGGCTGTCGAAACAGAGACGGCGTACGGACGCGGTCTTACCGATGCAAGAGCGCTCCTCATCATGAGCGGTACTAACATCTAAGGTTGGGCGACCAGCTAAAATCGCCTCAATCCTTAGGAGCCAAACGATCAACCTTTAGAGGTCCGGTTAGCAATTTAGGCGGACACCGGATCGTCAACAGCTAGGAGATTTTGCAATACACACCATCCGCGAGACACGACAAACCTAGTGGTTCTGCAGCATCTCTAAGTCGCCTCCGTCGCCAGGCTAGAGTCGACGTTACGTACGTCAACTGTAGCAAAAAGTGCTTGGTTCCCAAATTCATTATCTTTGATCACGGGATACCAGAGGATACGAGGGAAAACTCAGTTCCGGTAAAAAACTTTTCGATGTTGCCCCACATCGTGTGTCTCACGCAGCCGGAGTGCCAAGGAAATCAGGGTAATATTCGGAGGACTGACAGTGCGGGGGATTATTTGGTTCCACACTCCCGGTGGGCCGATATGAAGCGTGCCGTTCCCTTGCGTCTCGTTTCGTCTCCCGGTCCCGTTCTTGTGCCTGTTCTGAAACTAATCGAAGTCCGTGTCTTAACCAGTGTAGATCTTGTGGTCTACGAGTGCTCTGTTCAAGGTGATAAACATCTCGCTCTAAAACAATAATACACATCTCGCCAATCGAGATCTGCCGTGGAGTGTAGCCGAAAGAAGATATTTCGAGTGGTGCCAGATCACCCAGACTTAGTTGTCGGTTTTCCCCCGCGAGGACGGGATCATGTATGGGGATTCTTTTACATCATCAAAACCCGCCTCGGATGTGTCCGCTCTGTTGGATTGGAGCTCCTTACCTAAGATTGGGAAAACATCCGTGCGCGGGTGGCTCAGCCCCAGTTGGTAAGCGGTACAGCCGAGACGTCCTCACAACATATCCACGCTTAGCATAAGCAGCCGGTTCCAACATATTTTACAGTAGGAAAGTGCCGTGCCATGCATCCGACATACTATTGGTCCGAGTACAGGGTGTTACACAATGTACTCGATCAGGAACGGACTGGACTCTTGGTTTGGTTGAGTGGATTATGTGTGTTATTGAAATTTCGCAACGATTCTTTTGAGCACATCAATAGCTGGCCTTGAGCCCGTCGATCAAGTGTCAATGCTGCACAATTATACGGTTCACACACACCATTTGTCGTTGTGACTATCCTGAACGGCGTGAAAAGCTTTGTGACAAGGATCAATCGCCTCTGAGCGAGCCAAAATCCAGCGCGTGAAGACACATTTAAGTACCGAGCGATGTGTATCTGGGCGCGAATCGCACAGATGCCCTCTTCCGGACATCTGGTGAAACGCGCAACCTTCTCGCGGTAAGGCAATCCTGATCATGGTGGGTTAACGGAATGGTGAGAGCCTACATCGTGCACCTTGAGAATAATGGTCGGCTTCCCATTAGGTGCATGAGCCTAGGCCGCGGGATGTGGCGGTGTCGAGCACGACCATAAATTACCAGATTGATGCTTCGGCGCGCAATGGAGAAATGGCACACTCAGGAGTCCTGCTTTGACGCGTGACTGGTTATGGTCAAGCCACCATGGCTTTTATGTGGGCTTGCTTGACGGAGACTATCGCATAGAGCCGCTGAAAGTCCATCATAAGACGCATCAACACTGTGGGGCATCTAAACCCCATGCCATTTGCTTTAATTGACGAGTTAGATCTTAGAATTATCACTTTCGCGCTATCCTGTGAGGCAGCCTACACTGAGGACGCACCTAGCTAGGCACCGCTATGGGCCATGATTTTGGTGCCTGCGCTGCTTCCATGGCTGGCTCAGAAGATTCCCCTTGCATTGCCGCTAACAGACC
ACGGAAGGGACCGTGGGGCTTCCCTAATCTGGAGTCCTCGAGCAGAACCTACACTTAATAGGTAGGGATCAACGTTGGTCCGCACTAAACTAATGGACCGTGCCAACAGGAGTTCTTATTCGCTCATGAAGGTCTAGACTAGGCCGAGAAATGTGCTAAAACCAGGACACCCGTACTTTGAAATTCATTATGATCCATCATTGAGTCTGCCTGAAGTATGAAGAGAACCACATTTCAACTTACAGCCCAAGATCTCGACGGATTTGGATGCATGATTCTATCTAAGATACATCGTTTCGCCCTCGCTATTCACAATTGTACGTCATGTAGTATATCGACTCACCCGCGCGCTTCCGGATAACACCTGCACTGATGTGAGAAACGGTCTGCATTGGCTCATATGTTATCGTTAAACTTTGCACGTAAAAAGGCTGTCGACCTAGACTCTGTTAGCAAAACCACGTCGCCAACCTTCCGGCGTGGAACCCGGAGTGGTCGTGTCCGTGCGCGGGTGGCTCAGCCCCAGTTGGTAAGCGGTACAGCCGAGACGTCCTCACAACATATCCACGCTTAGCATAAGCAGCCGGTTCCAACATATTTTACAGTAGGAAAGTGCCGTGCCATGCATCCGACATACTATTGGTCCGAGTACAGGGTGTTACACAATGTACTCGATCAGGAACGGACTGGACTCTTGGTTTGGTTGAGTGGATTATGTGTGTTATTGAAATTTCGCAACGATTCTTTTGAGCACATCAATAGCTGGCCTTGAGCCCGTCGATCAAGTGTCAATGCTGCACAATTATACGGTTCACACACACCATTTGTCGTTGTGACTATCCTGAACGGCGTGAAAAGCTTTGTGACAAGGATCAATCGCCTCTGAGCGAGCCAAAATCCAGCGCGTGAAGACACATTTAAGTACCGAGCGATGTGTATCTGGGCGCGAATCGCACAGATGCCCTCTTCCGGACATCTGGTGAAACGCGCAACCTTCTCGTCAGTCAGTGTAGCCCTGGCGCGTACACGATGCCGCTGTTGAGCTGCGGGCATGAGTACTCGCGCTGAAACTCCCAATCATCCTTACGCCTACCAGCCTACAATAAGAGAGTTCTATAGGAACGGCCGCTCTGGCTAATGTCTATTTCTAAGTTACCTTATCAGGTGAGCCATGCCACCCGTAAACCCTTGTCGTGTGCGTTTTATTCTTAATAAACTCACCGTGACCGTGAAGTTCTATTCGGAGTCCTCCCGAGGGCATACGCTGCGAGCGGTGTGTCACTCCACACGGCGCACGTGAGCGCTCTATATTTCCCAGGACCGTACGCTAAGTGGACGTTCAGATAGGATGAGTGAGACGCAGGTGGGATCACTTAGAATTAGGTGCCACTGTTAAGGCCACGTGGCGTTGGTGTGAATGAGTCGCATACTATACATTCTTGTAGGATTGTAATTGATCCTGCCCCAAAATGGTTGCATGGAAGCGAATTAACGGAAAACTAGTCTAGACCTCTTGCGGGTCTCTCTTGACGTGGCAATGGAACCAGACAACGCATCAGAGAAATCTATACATCCATCCCACGTTTCTTGGTATAGTCTGATGTTATTTGCTGGACCACCTTCCCAATTAGGGAGAGTATAGCTACGTGTATAATAAAAGCTTTTCCTTGCGAACGGATTGGGACGTACACATGAGTTAAACATGGCGAAGCTCCTTCCACCCGCCCAGGGAGGGACTGCCCTTAGTCGTTCAGGCGATTAGGGTCCTAATACGTTTAACAACGCTCAGCGCTGGTAAATCGAGGGAAGCAGTGAATCCCCCCGGACGCCCCCGGCGGCTAGCATGAGGCGCGATTCCGCTGTGTCCAACTTTCCGGAAAATGAAAAGAAGTCGGCACTGCCGAAGACAACTCTTTCCGAAGGTATGATAAGGGCCTAGCACTGATACTCTAGTTAAGAACATAGGAACTATGCACCTGTGCATGTACGTTTCATCCTTA
TCCGTGCGCGGGTGGCTCAGCCCCAGTTGGTAAGCGGTACAGCCGAGACGTCCTCACAACATATCCACGCTTAGCATAAGCAGCCGGTTCCAACATATTTTACAGTAGGAAAGTGCCGTGCCATGCATCCGACATACTATTGGTCCGAGTACAGGGTGTTACACAATGTACTCGATCAGGAACGGACTGGACTCTTGGTTTGGTTGAGTGGATTATGTGTGTTATTGAAATTTCGCAACGATTCTTTTGAGCACATCAATAGCTGGCCTTGAGCCCGTCGATCAAGTGTCAATGCTGCACAATTATACGGTTCACACACACCATTTGTCGTTGTGACTATCCTGAACGGCGTGAAAAGCTTTGTGACAAGGATCAATCGCCTCTGAGCGAGCCAAAATCCAGCGCGTGAAGACACATTTAAGTACCGAGCGATGTGTATCTGGGCGCGAATCGCACAGATGCCCTCTTCCGGACATCTGGTGAAACGCGCAACCTTCTCGGCCCCAACGAGCCTCGTGCGTTGATAATACCAATATGATCAAGGCCAGTCTAGGTTGATTACTTCGGAACACAGCGGACCCAGTGACAGAAAGACCCATGTCGAATGCGGGTATTACGATAGAAGTATGCAGTGACTGGTGCCTAATGGAGGAACCCGAAGTGTTCATGAATCCGATCAATCTGACCCGATTCTTGAAGGTACCCATTATCCTACTCGGATTGTACAAGTGGGCTTGCCCAGAATTAAATTAATGTATTTCTTGATCATTAAGAGTGGCTACGGAGAGCCCTTGACTGGAATCTTAGCCCAGCCATGCCTTTGACCCGTAGTCGATACGTAAATCCGTTTAACCTCCTCTTGGAAGTAACCTTTGCAGGCAGGCGACCGCACTTCTTAAACAGGAACCATGCCCTTGCCCGAAGGTCGCGCTAGTTCTGAATCCGTGGCCGCGTTATACGTTTCGAAGATTGTTGAACCCTTAACACGATCTCATCGGCGACTCTCCACGCAATGAGAGTAAGTAGTGGCCCTAGTGGTCTCCCCGGCCGGAGTGCCAAGTGCGTCGAAGTTTGCCTCTTGTCAACATGGGACGGGCAGACCCTAACGCCGAACTCTTCGTGACCTCGGTATTGACGGCCTAAAAATGCTCGTAGCGGGCTGGGCCAATATTGGAGCATGGATTACCTATGCCAGGTGAGGGAAAAACTTATCATCCAAACTGTTACCGAACACCGGAAAGGTTGTCGTAAACGGTGGCTGAACTTGATTGAGCTCGGTGAATCTATCGATTGCTTCTATGGACTGGTAGTGTCCAGCGACTTATCGTCATGGTGTCCCGATGATTTCCACAAGAGTCGATCGGCCTCAATGTGGTGCCTCCTGAATACGAGATTAAATAAGATCCAGAAGCCCTTTGCGGTTAACGCCCGATAACCAGAGCGGCCTTCGATCTCTCCCGAACGCAGAGAGTTGTCTTACCCGAGATTTTTTCAGATGGC
>genome
TCCGTGCGCGGGTGGCTCAGCCCCAGTTGGTAAGCGGTACAGCCGAGACGTCCTCACAACATATCCACGCTTAGCATAAGCAGCCGGTTCCAACATATTTTACAGTAGGAAAGTGCCGTGCCATGCATCCGACATACTATTGGTCCGAGTACAGGGTGTTACACAATGTACTCGATCAGGAACGGACTGGACTCTTGGTTTGGTTGAGTGGATTATGTGTGTTATTGAAATTTCGCAACGATTCTTTTGAGCACATCAATAGCTGGCCTTGAGCCCGTCGATCAAGTGTCAATGCTGCACAATTATACGGTTCACACACACCATTTGTCGTTGTGACTATCCTGAACGGCGTGAAAAGCTTTGTGACAAGGATCAATCGCCTCTGAGCGAGCCAAAATCCAGCGCGTGAAGACACATTTAAGTACCGAGCGATGTGTATCTGGGCGCGAATCGCACAGATGCCCTCTTCCGGACATCTGGTGAAACGCGCAACCTTCTCGTGCCGCCCAGCACCGGGTGACTAGGTTGAGCCATGATTAACCTGCAATGAAGGTCATTCACACGCAGCGTCATTTAATGGATTGGTGCACACTTAACTGGGTGCCGCGCTGGTGCTGATCCATGAAGTTCATCTGGACTTGTACGTGCGACAGCTCCTTCCATTTCCGCCTTGCCATACAGACCACCTAAGACCGCAGACCCTCCTCCTTACCACATGCGATGCGTGGGAACCGGTGTCAAAGACGGGTGCCGCTACACAGGAAGGCACCCAGGGAAAGTCGTTTGCCGGAAGAGAGTGGAGCTCCTACGTAAACGGGGAAACCACTTGTTTGGATTCCCCCTTGCCGATTCGGCCCTATCAGGATGTATTTAACTTAGGAGAAACCGAACAACTGCCACCGCTTATTGCCCCGGCAGGCGGTAGTTTCCACGATCTAACAATCGAAGCAATTCGGACAGGCTTAAGCTACAAAGCTCGGATTTTGTAAGTGCTCTATCCTTTGTAGGAAGTGAAAGATGACGTTGCGGCCGTCGCTGTTGGAGGAACCGCAGCACCATGGCGCCTGTGCGAGCTGGAGATCCTCTCATAGCGTCAGAGCACGGGATGCTGTATATTAAGCACACAATAGCCCGGGGACCGGCCCCAACGTGAAATGCCTGGCCTGCCGTTCTTTATAGTGCTCGTGATAGTGTTATAAAGGAACTAACATCAAGTTATGTAAGGACTTTTACAATAGCGTGGTCCGTCAAGTCGTCCACGTGTGTAAATTCATTGGTACCTTTTGCCGAAAAATTTGAAAGCTAAGCACATTCTGCTTACTCACAGGGTAAGTTCCTGAAGTATTAATGTAATGTGGAAAGACAGGCATATGAACACTATTGGGCTTTGTAGACATTCCTCATCCATGCTGTATCAGTAATGTACAATTCGCCCCTTTCGTAAAGGAGAGCCGTGCTAACGTTATATTCGGTCTTACCACGGGCTCGATAGTTTGCCCCTCCGTGCGCGGGTGGCTCAGCCCCAGTTGGTAAGCGGTACAGCCGAGACGTCCTCACAACATATCCACGCTTAGCATAAGCAGCCGGTTCCAACATATTTTACAGTAGGAAAGTGCCGTGCCATGCATCCGACATACTATTGGTCCGAGTACAGGGTGTTACACAATGTACTCGATCAGGAACGGACTGGACTCTTGGTTTGGTTGAGTGGATTATGTGTGTTATTGAAATTTCGCAACGATTCTTTTGAGCACATCAATAGCTGGCCTTGAGCCCGTCGATCAAGTGTCAATGCTGCACAATTATACGGTTCACACACACCATTTGTCGTTGTGACTATCCTGAACGGCGTGAAAAGCTTTGTGACAAGGATCAATCGCCTCTGAGCGAGCCAAAATCCAGCGCGTGAAGACACATTTAAGTACCGAGCGATGTGTATCTGGGCGCGAATCGCACAGATGCCCTCTTCCGGACATCTGGTGAAACGCGCAACCTTCTCG
TTTATAGGAACTCCCCGACAAACACACCCTGTTTGCGCAGTGGGATTACGTAAATTGGAGACGACGGCCGCTACCATTGTCTTGTTCGTTGGAGCATAGCATTACGCCATAGCAGTGAGCTTAATTATCGGGCACTAAGGCTGTCGAAACAGAGACGGCGTACGGACGCGGTCTTACCGATGCAAGAGCGCTCCTCATCATGAGCGGTACTAACATCTAAGGTTGGGCGACCAGCTAAAATCGCCTCAATCCTTAGGAGCCAAACGATCAACCTTTAGAGGTCCGGTTAGCAATTTAGGCGGACACCGGATCGTCAACAGCTAGGAGATTTTGCAATACACACCATCCGCGAGACACGACAAACCTAGTGGTTCTGCAGCATCTCTAAGTCGCCTCCGTCGCCAGGCTAGAGTCGACGTTACGTACGTCAACTGTAGCAAAAAGTGCTTGGTTCCCAAATTCATTATCTTTGATCACGGGATACCAGAGGATACGAGGGAAAACTCAGTTCCGGTAAAAAACTTTTCGATGTTGCCCCACATCGTGTGTCTCACGCAGCCGGAGTGCCAAGGAAATCAGGGTAATATTCGGAGGACTGACAGTGCGGGGGATTATTTGGTTCCACACTCCCGGTGGGCCGATATGAAGCGTGCCGTTCCCTTGCGTCTCGTTTCGTCTCCCGGTCCCGTTCTTGTGCCTGTTCTGAAACTAATCGAAGTCCGTGTCTTAACCAGTGTAGATCTTGTGGTCTACGAGTGCTCTGTTCAAGGTGATAAACATCTCGCTCTAAAACAATAATACACATCTCGCCAATCGAGATCTGCCGTGGAGTGTAGCCGAAAGAAGATATTTCGAGTGGTGCCAGATCACCCAGACTTAGTTGTCGGTTTTCCCCCGCGAGGACGGGATCATGTATGGGGATTCTTTTACATCATCAAAACCCGCCTCGGATGTGTCCGCTCTGTTGGATTGGAGCTCCTTACCTAAGATTGGGAAAACATCCGTGCGCGGGTGGCTCAGCCCCAGTTGGTAAGCGGTACAGCCGAGACGTCCTCACAACATATCCACGCTTAGCATAAGCAGCCGGTTCCAACATATTTTACAGTAGGAAAGTGCCGTGCCATGCATCCGACATACTATTGGTCCGAGTACAGGGTGTTACACAATGTACTCGATCAGGAACGGACTGGACTCTTGGTTTGGTTGAGTGGATTATGTGTGTTATTGAAATTTCGCAACGATTCTTTTGAGCACATCAATAGCTGGCCTTGAGCCCGTCGATCAAGTGTCAATGCTGCACAATTATACGGTTCACACACACCATTTGTCGTTGTGACTATCCTGAACGGCGTGAAAAGCTTTGTGACAAGGATCAATCGCCTCTGAGCGAGCCAAAATCCAGCGCGTGAAGACACATTTAAGTACCGAGCGATGTGTATCTGGGCGCGAATCGCACAGATGCCCTCTTCCGGACATCTGGTGAAACGCGCAACCTTCTCGCGGTAAGGCAATCCTGATCATGGTGGGTTAACGGAATGGTGAGAGCCTACATCGTGCACCTTGAGAATAATGGTCGGCTTCCCATTAGGTGCATGAGCCTAGGCCGCGGGATGTGGCGGTGTCGAGCACGACCATAAATTACCAGATTGATGCTTCGGCGCGCAATGGAGAAATGGCACACTCAGGAGTCCTGCTTTGACGCGTGACTGGTTATGGTCAAGCCACCATGGCTTTTATGTGGGCTTGCTTGACGGAGACTATCGCATAGAGCCGCTGAAAGTCCATCATAAGACGCATCAACACTGTGGGGCATCTAAACCCCATGCCATTTGCTTTAATTGACGAGTTAGATCTTAGAATTATCACTTTCGCGCTATCCTGTGAGGCAGCCTACACTGAGGACGCACCTAGCTAGGCACCGCTATGGGCCATGATTTTGGTGCCTGCGCTGCTTCCATGGCTGGCTCAGAAGATTCCCCTTGCATTGCCGCTAACAGACC
ACGGAAGGGACCGTGGGGCTTCCCTAATCTGGAGTCCTCGAGCAGAACCTACACTTAATAGGTAGGGATCAACGTTGGTCCGCACTAAACTAATGGACCGTGCCAACAGGAGTTCTTATTCGCTCATGAAGGTCTAGACTAGGCCGAGAAATGTGCTAAAACCAGGACACCCGTACTTTGAAATTCATTATGATCCATCATTGAGTCTGCCTGAAGTATGAAGAGAACCACATTTCAACTTACAGCCCAAGATCTCGACGGATTTGGATGCATGATTCTATCTAAGATACATCGTTTCGCCCTCGCTATTCACAATTGTACGTCATGTAGTATATCGACTCACCCGCGCGCTTCCGGATAACACCTGCACTGATGTGAGAAACGGTCTGCATTGGCTCATATGTTATCGTTAAACTTTGCACGTAAAAAGGCTGTCGACCTAGACTCTGTTAGCAAAACCACGTCGCCAACCTTCCGGCGTGGAACCCGGAGTGGTCGTGTCCGTGCGCGGGTGGCTCAGCCCCAGTTGGTAAGCGGTACAGCCGAGACGTCCTCACAACATATCCACGCTTAGCATAAGCAGCCGGTTCCAACATATTTTACAGTAGGAAAGTGCCGTGCCATGCATCCGACATACTATTGGTCCGAGTACAGGGTGTTACACAATGTACTCGATCAGGAACGGACTGGACTCTTGGTTTGGTTGAGTGGATTATGTGTGTTATTGAAATTTCGCAACGATTCTTTTGAGCACATCAATAGCTGGCCTTGAGCCCGTCGATCAAGTGTCAATGCTGCACAATTATACGGTTCACACACACCATTTGTCGTTGTGACTATCCTGAACGGCGTGAAAAGCTTTGTGACAAGGATCAATCGCCTCTGAGCGAGCCAAAATCCAGCGCGTGAAGACACATTTAAGTACCGAGCGATGTGTATCTGGGCGCGAATCGCACAGATGCCCTCTTCCGGACATCTGGTGAAACGCGCAACCTTCTCGTCAGTCAGTGTAGCCCTGGCGCGTACACGATGCCGCTGTTGAGCTGCGGGCATGAGTACTCGCGCTGAAACTCCCAATCATCCTTACGCCTACCAGCCTACAATAAGAGAGTTCTATAGGAACGGCCGCTCTGGCTAATGTCTATTTCTAAGTTACCTTATCAGGTGAGCCATGCCACCCGTAAACCCTTGTCGTGTGCGTTTTATTCTTAATAAACTCACCGTGACCGTGAAGTTCTATTCGGAGTCCTCCCGAGGGCATACGCTGCGAGCGGTGTGTCACTCCACACGGCGCACGTGAGCGCTCTATATTTCCCAGGACCGTACGCTAAGTGGACGTTCAGATAGGATGAGTGAGACGCAGGTGGGATCACTTAGAATTAGGTGCCACTGTTAAGGCCACGTGGCGTTGGTGTGAATGAGTCGCATACTATACATTCTTGTAGGATTGTAATTGATCCTGCCCCAAAATGGTTGCATGGAAGCGAATTAACGGAAAACTAGTCTAGACCTCTTGCGGGTCTCTCTTGACGTGGCAATGGAACCAGACAACGCATCAGAGAAATCTATACATCCATCCCACGTTTCTTGGTATAGTCTGATGTTATTTGCTGGACCACCTTCCCAATTAGGGAGAGTATAGCTACGTGTATAATAAAAGCTTTTCCTTGCGAACGGATTGGGACGTACACATGAGTTAAACATGGCGAAGCTCCTTCCACCCGCCCAGGGAGGGACTGCCCTTAGTCGTTCAGGCGATTAGGGTCCTAATACGTTTAACAACGCTCAGCGCTGGTAAATCGAGGGAAGCAGTGAATCCCCCCGGACGCCCCCGGCGGCTAGCATGAGGCGCGATTCCGCTGTGTCCAACTTTCCGGAAAATGAAAAGAAGTCGGCACTGCCGAAGACAACTCTTTCCGAAGGTATGATAAGGGCCTAGCACTGATACTCTAGTTAAGAACATAGGAACTATGCACCTGTGCATGTACGTTTCATCCTTA
TCCGTGCGCGGGTGGCTCAGCCCCAGTTGGTAAGCGGTACAGCCGAGACGTCCTCACAACATATCCACGCTTAGCATAAGCAGCCGGTTCCAACATATTTTACAGTAGGAAAGTGCCGTGCCATGCATCCGACATACTATTGGTCCGAGTACAGGGTGTTACACAATGTACTCGATCAGGAACGGACTGGACTCTTGGTTTGGTTGAGTGGATTATGTGTGTTATTGAAATTTCGCAACGATTCTTTTGAGCACATCAATAGCTGGCCTTGAGCCCGTCGATCAAGTGTCAATGCTGCACAATTATACGGTTCACACACACCATTTGTCGTTGTGACTATCCTGAACGGCGTGAAAAGCTTTGTGACAAGGATCAATCGCCTCTGAGCGAGCCAAAATCCAGCGCGTGAAGACACATTTAAGTACCGAGCGATGTGTATCTGGGCGCGAATCGCACAGATGCCCTCTTCCGGACATCTGGTGAAACGCGCAACCTTCTCGGCCCCAACGAGCCTCGTGCGTTGATAATACCAATATGATCAAGGCCAGTCTAGGTTGATTACTTCGGAACACAGCGGACCCAGTGACAGAAAGACCCATGTCGAATGCGGGTATTACGATAGAAGTATGCAGTGACTGGTGCCTAATGGAGGAACCCGAAGTGTTCATGAATCCGATCAATCTGACCCGATTCTTGAAGGTACCCATTATCCTACTCGGATTGTACAAGTGGGCTTGCCCAGAATTAAATTAATGTATTTCTTGATCATTAAGAGTGGCTACGGAGAGCCCTTGACTGGAATCTTAGCCCAGCCATGCCTTTGACCCGTAGTCGATACGTAAATCCGTTTAACCTCCTCTTGGAAGTAACCTTTGCAGGCAGGCGACCGCACTTCTTAAACAGGAACCATGCCCTTGCCCGAAGGTCGCGCTAGTTCTGAATCCGTGGCCGCGTTATACGTTTCGAAGATTGTTGAACCCTTAACACGATCTCATCGGCGACTCTCCACGCAATGAGAGTAAGTAGTGGCCCTAGTGGTCTCCCCGGCCGGAGTGCCAAGTGCGTCGAAGTTTGCCTCTTGTCAACATGGGACGGGCAGACCCTAACGCCGAACTCTTCGTGACCTCGGTATTGACGGCCTAAAAATGCTCGTAGCGGGCTGGGCCAATATTGGAGCATGGATTACCTATGCCAGGTGAGGGAAAAACTTATCATCCAAACTGTTACCGAACACCGGAAAGGTTGTCGTAAACGGTGGCTGAACTTGATTGAGCTCGGTGAATCTATCGATTGCTTCTATGGACTGGTAGTGTCCAGCGACTTATCGTCATGGTGTCCCGATGATTTCCACAAGAGTCGATCGGCCTCAATGTGGTGCCTCCTGAATACGAGATTAAATAAGATCCAGAAGCCCTTTGCGGTTAACGCCCGATAACCAGAGCGGCCTTCGATCTCTCCCGAACGCAGAGAGTTGTCTTACCCGAGATTTTTTCAGATGGC
# make.sh: simulate reads from both genomes, then mix them by random sampling
python ~/dev/nullgraph/make-reads.py -C 150 genome.fa > reads.fa
python ~/dev/nullgraph/make-reads.py -C 150 genome-var.fa > reads2.fa
~/dev/khmer/scripts/sample-reads-randomly.py -N 11250 -R 1 -o mix.fa reads.fa reads2.fa
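The last step above draws a fixed number of reads at random from the two simulated read sets. The sketch below shows one standard way to do such a draw, reservoir sampling, in plain Python. It is not khmer's implementation, and it assumes that `-N` is the number of reads to keep and `-R` a random seed; the function name `sample_reads` is made up for illustration.

```python
import random


def sample_reads(reads, n, seed=1):
    """Reservoir-sample up to n items uniformly at random from an iterable.

    Every item has the same chance of ending up in the result, and the
    input is read only once, so it never has to fit in memory.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, read in enumerate(reads):
        if i < n:
            # Fill the reservoir with the first n items.
            reservoir.append(read)
        else:
            # Item i replaces a stored item with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = read
    return reservoir


# Toy usage: mix two small "read sets" and keep five reads.
set_a = [f"a{i}" for i in range(20)]
set_b = [f"b{i}" for i in range(20)]
print(sample_reads(set_a + set_b, 5, seed=1))
```

Because reads from both files go into the same pool, the sample contains reference and variant reads in roughly the proportion in which they were supplied.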
#### readscorr.sh: reads again, corrected against readsvar.ht, which is built from the raw reads. This looks like the best error corrector, because it avoids perpetuating graph alignment errors by using raw reads.
python ../sandbox/collect-variants.py -Z 20 reads.fa -k 20 -s readsvar.ht
python ../sandbox/read_aligner.py readsvar.ht reads.fa > reads.fa.corr 2> /dev/null
bowtie genome -f reads.fa.corr -S readscorr.sam
samtools import genome.fa.fai readscorr.sam readscorr.bam
samtools sort readscorr.bam readscorr.sorted
samtools index readscorr.sorted.bam
echo samtools tview readscorr.sorted.bam genome.fa
#### varcorr.sh: the mixed reads, corrected against mixvar.ht, which is built from the raw mixed reads. As in readscorr.sh, using raw reads avoids perpetuating graph alignment errors.
python ../sandbox/collect-variants.py -Z 20 mix.fa -k 20 -s mixvar.ht
python ../sandbox/read_aligner.py mixvar.ht mix.fa > mix.fa.corr 2> /dev/null
bowtie genome -f mix.fa.corr -S mixcorr.sam
samtools import genome.fa.fai mixcorr.sam mixcorr.bam
samtools sort mixcorr.bam mixcorr.sorted
samtools index mixcorr.sorted.bam
echo samtools tview mixcorr.sorted.bam genome.fa
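Both correction scripts build a k-mer table with `-k 20` before aligning reads against it. As a rough illustration of why k-mer abundance can separate real variants from sequencing errors, the toy sketch below counts k-mers in a handful of short reads. It is not what collect-variants.py or read_aligner.py do internally; the sequences and the function name `count_kmers` are invented for the example, and k is 4 rather than 20 to keep it readable.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-length substring across a collection of reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


reference_read = "AACCGGTTAC"
variant_read = "AACCGATTAC"   # G -> A substitution, present in half the reads
error_read = "AACCGGTTTC"     # one-off sequencing error near the end

reads = [reference_read] * 5 + [variant_read] * 5 + [error_read]
counts = count_kmers(reads, k=4)

# k-mers spanning the variant are about as common as the reference ones,
# while k-mers created by the single error are seen only once.
for kmer in ("CGGT", "CGAT", "GTTT"):
    print(kmer, counts[kmer])
```

In this toy set the variant k-mer is supported by every variant read, so an abundance-aware corrector has a reason to keep it, whereas the error k-mer stands alone.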
# vardown.sh: downsampling reads while retaining variant signal
python ../sandbox/collect-variants.py -Z 20 mix.fa -k 20 -s mixvar.ht
bowtie genome -f mix.fa.keepvar -S keepvar.sam
samtools import genome.fa.fai keepvar.sam keepvar.bam
samtools sort keepvar.bam keepvar.sorted
samtools index keepvar.sorted.bam
X=$(samtools view keepvar.sorted.bam genome:800-800 | wc -l)
Y=$(samtools view keepvar.sorted.bam genome:600-600 | wc -l)
Z=$(samtools view keepvar.sorted.bam genome:1100-1100 | wc -l)
echo '---'
echo homozygous position 800 has $X reads
echo variant position 600 has $Y reads
echo variant position 1100 has $Z reads
echo "(average coverage is ~30)"
echo '---'
echo samtools tview keepvar.sorted.bam genome.fa
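The three `samtools view ... | wc -l` lines above count the reads that overlap a single reference position. The sketch below does the same count in Python on SAM text, as a way to see what is being measured. The helper names are hypothetical, and it reads plain SAM lines (for example keepvar.sam) rather than the BAM file. It treats positions as 1-based, as SAM does.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) an alignment covers."""
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + length - 1


def count_reads_at(sam_lines, ref_name, position):
    """Count mapped reads on ref_name whose alignment overlaps position."""
    count = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 4 or rname != ref_name or cigar == "*":
            continue  # unmapped, or mapped elsewhere
        start, end = reference_span(pos, cigar)
        if start <= position <= end:
            count += 1
    return count


# Toy SAM records: two overlap position 600, one starts after it, one is unmapped.
toy_sam = [
    "@SQ\tSN:genome\tLN:5000",
    "r1\t0\tgenome\t590\t255\t20M\t*\t0\t0\t*\t*",
    "r2\t0\tgenome\t601\t255\t20M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r4\t16\tgenome\t581\t255\t20M\t*\t0\t0\t*\t*",
]
print(count_reads_at(toy_sam, "genome", 600))
```

Comparing the count at a variant position with the count at an unchanged position such as 800 shows whether the downsampling kept extra reads around the variants.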