@dacarlin
Last active August 11, 2016 05:04

How to design mutagenic oligos for use in Kunkel mutagenesis

First, prepare your input FASTA file. The codons in the input file must be in frame and have a one-to-one correspondence with the amino acid sequence of your Rosetta model. Otherwise, the numbering will be off. Do not rely on the script to catch input errors: it does not attempt any checking of your input sequence, so please make sure it's correct ahead of time.

A good way to do this is to translate your input file into an amino acid sequence, then align it with the sequence of the native PDB structure you used in Rosetta. You can save a FASTA file directly from PyMOL: running save my_FASTA.fasta on the PyMOL command line writes a FASTA file called my_FASTA.fasta to the current working directory. Make sure the two sequences are 100% identical before proceeding.
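The check described above can be sketched in plain Python. This is only a minimal illustration and not part of the original script: the names `translate` and `find_mismatches` are hypothetical, the standard genetic code is assumed, and the protein string is assumed to have been copied from the FASTA file exported by PyMOL.

```python
from itertools import product, zip_longest

# Standard genetic code (NCBI table 1), with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA string; trailing bases that do not fill a codon are ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def find_mismatches(dna, protein):
    """Return (residue_number, translated_aa, model_aa) for every position that differs.

    A '-' marks a position that is missing from the shorter sequence.
    """
    pairs = zip_longest(translate(dna), protein, fillvalue="-")
    return [(n, t, p) for n, (t, p) in enumerate(pairs, start=1) if t != p]


# First four codons of the example BglB sequence encode N, T, F, I.
print(find_mismatches("AACACCTTTATC", "NTFI"))  # prints []
```

An empty list means the coding sequence and the model agree at every residue. Any entry in the list gives the 1-based residue number where the numbering would go wrong.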

Here is an example input FASTA:

>BglB
AACACCTTTATCTTTCCGGCAACCTTTATGTGGGGCACCAGCACCAGCAGCTATCAGATTGAAGGTGGCACCGATGAAGG
TGGTCGTACCCCGAGCATTTGGGATACCTTTTGTCAGATTCCGGGTAAAGTTATTGGTGGTGATTGTGGTGATGTTGCCT
GTGATCATTTTCACCACTTTAAAGAAGATGTGCAGCTGATGAAACAGCTGGGTTTTCTGCATTATCGTTTTAGCGTTGCA
TGGCCTCGTATTATGCCTGCAGCAGGTATTATCAATGAAGAGGGTCTGCTGTTTTATGAGCATCTGCTGGATGAAATTGA
ACTGGCAGGTCTGATTCCGATGCTGACCCTGTATCATTGGGATCTGCCGCAGTGGATTGAAGATGAAGGCGGTTGGACCC
AGCGTGAAACCATTCAGCATTTCAAAACCTATGCCAGCGTTATCATGGATCGTTTTGGTGAACGTATTAATTGGTGGAAC
ACCATCAATGAACCGTATTGTGCAAGCATTCTGGGTTATGGCACCGGTGAACATGCACCGGGTCATGAAAATTGGCGTGA
AGCATTTACCGCAGCACATCATATTCTGATGTGTCATGGTATTGCAAGCAACCTGCATAAAGAAAAAGGTCTGACCGGTA
AAATTGGCATTACCCTGAATATGGAACATGTTGATGCAGCAAGCGAACGTCCGGAAGATGTTGCCGCAGCAATTCGTCGT
GATGGTTTTATCAATCGTTGGTTTGCAGAACCGCTGTTCAATGGTAAATATCCTGAAGATATGGTGGAATGGTATGGCAC
CTATCTGAATGGTCTGGATTTTGTTCAGCCTGGTGATATGGAACTGATTCAGCAGCCAGGTGATTTTCTGGGCATTAACT
ATTATACCCGTAGCATTATTCGCAGCACCAATGATGCAAGCCTGCTGCAAGTTGAACAGGTTCACATGGAAGAACCGGTT
ACCGATATGGGTTGGGAAATTCATCCGGAAAGCTTCTATAAACTGCTGACCCGCATTGAAAAAGATTTTAGCAAAGGTCT
GCCGATCCTGATTACCGAAAATGGTGCAGCAATGCGTGATGAACTGGTTAATGGTCAGATCGAAGATACCGGTCGTCATG
GTTATATTGAAGAACATCTGAAAGCCTGCCACCGCTTTATCGAAGAAGGTGGCCAGCTGAAAGGTTATTTTGTTTGGAGC
TTTCTGGATAACTTTGAATGGGCATGGGGTTATAGCAAACGTTTTGGTATTGTCCACATCAACTATGAAACCCAAGAACG
CACCCCGAAACAGAGCGCACTGTGGTTTAAACAAATGATGGCCAAAAATGGTTTCGGCAGCCTCGAGC

To design all possible single-mutant oligos for this sequence, you can use the following code in a Jupyter notebook. It requires scikit-bio (imported as skbio). The code will run as written here if you save the above sequence as a file called bglb_model_coding.fa in the current directory.

from skbio import DNA

ecoli_favorite = { 
    'G':'GGC', 'A':'GCG', 'V':'GTG', 'F':'TTT', 'E':'GAA', 'D':'GAT', 'N':'AAC', 'C':'TGC', 'K':'AAA', 'L':'CTG',
    'H':'CAT', 'P':'CCG', 'Q':'CAG', 'W':'TGG', 'Y':'TAT', 'I':'ATT', 'M':'ATG', 'R':'CGT', 'T':'ACC', 'S':'AGC', 
}

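dna = DNA.read( 'bglb_model_coding.fa' )
# 33-nt windows, stepping one codon at a time; the codon to mutate sits at window positions 15-17
kmers = [ dna[i:i+33] for i in range( 0, len( dna ), 3 ) ]

my_oligos = []
for i, k in enumerate( kmers ):
    if len( k ) != 33:
        continue  # skip truncated windows at the 3' end of the gene
    native = str( k[15:18].translate() )
    for aa, codon in ecoli_favorite.items():  # .items() works on Python 3; .iteritems() was Python 2 only
        my_str = str( k[:15] ) + codon + str( k[18:] )
        my_dna = DNA( my_str )
        my_oligo = my_dna.reverse_complement()
        # window i starts at codon i (0-based), so the mutated codon is residue i + 6 (1-based)
        my_name = native + str( i + 6 ) + aa
        my_oligos.append( '>{}\n{}\n'.format( my_name, my_oligo ) )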

with open( 'my_oligos.fa', 'w' ) as fn:
    fn.write( ''.join( my_oligos ) ) 
    

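This will write out a file called my_oligos.fa with all possible single-mutant oligos for your gene. Each oligo is 33 nt long, is the reverse complement of the coding strand, and is named by native residue, position, and new residue (for example, P6A). Because each oligo needs 15 nt of flanking sequence on both sides of the mutated codon, the first five and last five residues do not get oligos.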
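To see how a single entry in that file is built, here is a small stdlib-only sketch that designs one oligo for a chosen residue. It mirrors the windowing in the script above (15 nt of flank on each side of the mutated codon) but is not the original code: `reverse_complement`, `design_oligo`, and the `flank` parameter are names introduced here for illustration.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_oligo(coding_seq, resnum, new_codon, flank=15):
    """Return the antisense oligo that swaps residue `resnum` (1-based) for `new_codon`."""
    start = (resnum - 1) * 3  # 0-based index of the first base of the codon
    if start - flank < 0 or start + 3 + flank > len(coding_seq):
        raise ValueError("residue is too close to an end of the sequence for a full-length oligo")
    sense = (
        coding_seq[start - flank:start]
        + new_codon
        + coding_seq[start + 3:start + 3 + flank]
    )
    return reverse_complement(sense)


# First 33 nt of the example BglB sequence; residue 6 is the first one with a full 5' flank.
first_window = "AACACCTTTATCTTTCCGGCAACCTTTATGTGG"
oligo = design_oligo(first_window, 6, "GCG")  # P6A, using the GCG codon for alanine
print(len(oligo))  # prints 33
```

Taking the reverse complement of the returned oligo gives back the coding strand with only the target codon changed, which is a quick way to sanity-check any entry in my_oligos.fa.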
