Skip to content

Instantly share code, notes, and snippets.

@philippbayer
Last active February 22, 2020 13:07
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save philippbayer/163d6d61b8e86c3f21c15ecee1e48328 to your computer and use it in GitHub Desktop.
Save philippbayer/163d6d61b8e86c3f21c15ecee1e48328 to your computer and use it in GitHub Desktop.
Adding new restriction enzymes to Stacks

I had a ddRAD project with two restriction enzymes not supported by Stacks - Hinfl and HpyCH4IV. This is how I added them to Stacks:

  1. Download the code, open src/renz.cc
  2. Check NEB for the restriction enzyme sequences: https://www.nebiolabs.com.au/products/r0155-hinfi#Product%20Information and https://www.nebiolabs.com.au/products/r0619-hpych4iv#Product%20Information
  3. Understand how Stacks encode restriction enzymes - Hinfl is G CUT ANTC, which in the comments of renz.cc is encoded as G/ANTC
  4. Choose the longer piece (in this case, ANTC) and translate that into all four possibilities AGTC AATC ATTC ACTC, and the reverse complement GACT GATT GAAT GAGT. For others like A/CGT it's easier, take the longer piece and its reverse-complement, so it's just CGT and ACG
  5. Add that to the code, and don't forget your semicolons (I think the first letter of the RE name has to be lower case??)
123 const char *xhoI[]    = {"TCGAG",             // C/TCGAG, XhoI
124                          "CTCGA"};
125 const char *hpyCH4IV[] = {"CGT",               // A/CGT, HpyCH4IV
126                         "ACG"};
127 const char *hinfI[]   = {"AATC", "ATTC",      // G/ANTC, hinfI
128                          "ACTC", "AGTC",
129                         "GAGT", "GACT",
130                         "GATT", "GAAT"};
131

Don't forget the count of restriction enzyme sites and the length of each site in the two other data structures. If you have only a sequence and its reverse complement then the count is 1. AGT has a length of 3.

132 void
133 initialize_renz(map &renz, map &renz_cnt, map &renz_len) {
134
135     renz["hpyCH4IV" ] = hpyCH4IV; //  A/CGT, hpyCH4IV
136     renz["hinfI"] = hinfI; // // G/ANTC, hinfI
137     renz["sbfI"]    = sbfI;    // CCTGCA/GG, SbfI
138     renz["pstI"]    = pstI;    // CTGCA/G, PstI
139     renz["notI"]    = notI;    // GC/GGCCGC, NotI
....
190     renz_cnt["hpyCH4IV" ] = 1;
191     renz_cnt["hinfI"] = 4;
192     renz_cnt["sbfI"]    = 1;
...
245     renz_len["hpyCH4IV" ] = 3;
246     renz_len["hinfI"] = 4;
247     renz_len["sbfI"]    = 6;

Now go into the Stacks base directory, run ./compile and ./make, fix any errors you introduced, if you now run process_radtags your new restriction enzymes should be listed:

...
    Currently supported enzymes include:
      'aciI', 'ageI', 'aluI', 'apaLI', 'apeKI', 'apoI', 'aseI', 'bamHI',
      'bbvCI', 'bfaI', 'bfuCI', 'bgIII', 'bsaHI', 'bspDI', 'bstYI', 'cac8I',
      'claI', 'csp6I', 'ddeI', 'dpnII', 'eaeI', 'ecoRI', 'ecoRV', 'ecoT22I',
      'haeIII', 'hinP1I', 'hindIII', 'hinfI', 'hpaII', 'hpyCH4IV', 'kpnI', 'mluCI',
...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment