Last active
September 3, 2015 17:19
-
-
Save nlhepler/fc106c8546525032297f to your computer and use it in GitHub Desktop.
mono/multi-molecular integrator usage
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
// monomolecular | |
string mdl = "P6C4.NoCov"; // maybe we should just have a default model for each chemistry? e.g. just "P6C4"? | |
IntegratorConfig icfg; // ParameterTable::Default, minZScore=-5 | |
string tpl = ...; // do POA | |
vector<string> reads = ...; | |
vector<tuple<size_t, size_t>> extents = ...; // get extents from POA | |
SNR snr = { ..., ..., ..., ... }; | |
MonoMolecularIntegrator mmi(tpl, icfg, snr, mdl); | |
for (size_t i = 0; i < reads.size(); ++i) | |
{ | |
size_t ts, te; | |
tie(ts, te) = extents[i]; | |
auto error = mmi.AddRead(MappedRead(Read(std::to_string(i), reads[i], mdl), ts, te)); | |
// do something with error maybe? | |
} | |
PolishConfig pcfg; | |
Polish(mmi, pcfg); | |
return ConsensusAndQVs(mmi); | |
// multimolecular | |
IntegratorConfig icfg; | |
vector<tuple<string, SNR, string>> reads = ...; // seq, SNR, model | |
string tpl = ...; // Poa | |
vector<tuple<size_t, size_t>> extents = ...; // also from Poa? or by lifting coordinates like Quiver | |
MultiMolecularIntegrator mmi(tpl, icfg); | |
for (size_t i = 0; i < reads.size(); ++i) | |
{ | |
size_t ts, te; | |
tie(ts, te) = extents[i]; | |
auto status = mmi.AddRead(MappedRead(Read(std::to_string(i), get<0>(reads[i]), get<2>(reads[i])), ts, te), get<1>(reads[i])); | |
} | |
PolishConfig pcfg; | |
Polish(mmi, pcfg); | |
return ConsensusAndQVs(mmi); |
One more thought: presumably, in the future, we could save memory by making the "multi" be an aggregate over "mono" --- combine subreads by ZMW into mono's.
We could use a bucket of templates keyed on SNR to do things transparently at the Integrator level, and be done with this distinction, true.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Thanks for writing this up. Much easier to understand now.
I like your comment, I think it would be good to have a common API---just grab the SNR from the read.
I confess that I am still a little puzzled as to what the essential difference between the mono/multi cases really is. My current understanding is that it's because the read covariates (SNR) are becoming considered part of the "template", so that in the CCS case you can have generare consensus from a single "template", whereas for multi we need to maintain separate "templates" for all subreads.
So is the "mono" case then just a degenerate form of "multi"? If so inclined, could we use "multi" in CCS, while perhaps performing duplicate work/allocation. Am I correct in assuming that "mono" is then mostly an optimization, avoiding duplicated data structures?