I had a ddRAD project with two restriction enzymes not supported by Stacks: HinfI and HpyCH4IV. This is how I added them to Stacks:
- Download the Stacks source code and open src/renz.cc
- Check NEB for the restriction enzyme sequences: https://www.nebiolabs.com.au/products/r0155-hinfi#Product%20Information and https://www.nebiolabs.com.au/products/r0619-hpych4iv#Product%20Information
- Understand how Stacks encodes restriction enzymes: HinfI recognises GANTC (N is any base) and cuts between the G and the A, which the comments in renz.cc write as G/ANTC
- Choose the longer piece (in this case, ANTC), expand the N into all four possibilities (AGTC, AATC, ATTC, ACTC), and add their reverse complements (GACT, GATT, GAAT, GAGT). A site without a degenerate base, like HpyCH4IV's A/CGT, is easier: take the longer piece and its reverse complement, so just CGT and ACG
- Add these sequences to the code as a new array, and don't forget your semicolons (I think the first letter of the enzyme name has to be lower case, like the existing entries such as sbfI and pstI):
```cpp
const char *xhoI[]     = {"TCGAG",          // C/TCGAG, XhoI
                          "CTCGA"};
const char *hpyCH4IV[] = {"CGT",            // A/CGT, HpyCH4IV
                          "ACG"};
const char *hinfI[]    = {"AATC", "ATTC",   // G/ANTC, HinfI
                          "ACTC", "AGTC",
                          "GAGT", "GACT",
                          "GATT", "GAAT"};
```
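Don't forget to add the count of restriction enzyme sites and the length of each site to the two other data structures, renz_cnt and renz_len. The count is the number of forward sequences, not including the reverse complements: if you have only one sequence and its reverse complement, the count is 1, and HinfI's four N variants give a count of 4. The length is the length of one sequence, so CGT has a length of 3.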
```cpp
void
initialize_renz(map<string, const char **> &renz, map<string, int> &renz_cnt, map<string, int> &renz_len) {

    renz["hpyCH4IV"] = hpyCH4IV; // A/CGT, HpyCH4IV
    renz["hinfI"]    = hinfI;    // G/ANTC, HinfI
    renz["sbfI"]     = sbfI;     // CCTGCA/GG, SbfI
    renz["pstI"]     = pstI;     // CTGCA/G, PstI
    renz["notI"]     = notI;     // GC/GGCCGC, NotI
    // ...
    renz_cnt["hpyCH4IV"] = 1;
    renz_cnt["hinfI"]    = 4;
    renz_cnt["sbfI"]     = 1;
    // ...
    renz_len["hpyCH4IV"] = 3;
    renz_len["hinfI"]    = 4;
    renz_len["sbfI"]     = 6;
    // ...
}
```
Now go into the Stacks base directory, run ./configure and make (plus make install if you use an installed copy), and fix any errors you introduced. If you now run process_radtags, your new restriction enzymes should be listed:
```
...
Currently supported enzymes include: 'aciI', 'ageI', 'aluI', 'apaLI', 'apeKI', 'apoI', 'aseI', 'bamHI', 'bbvCI', 'bfaI', 'bfuCI', 'bglII', 'bsaHI', 'bspDI', 'bstYI', 'cac8I', 'claI', 'csp6I', 'ddeI', 'dpnII', 'eaeI', 'ecoRI', 'ecoRV', 'ecoT22I', 'haeIII', 'hinP1I', 'hindIII', 'hinfI', 'hpaII', 'hpyCH4IV', 'kpnI', 'mluCI', ...
```