-
-
Save nlhepler/fc106c8546525032297f to your computer and use it in GitHub Desktop.
// monomolecular | |
string mdl = "P6C4.NoCov"; // maybe we should just have a default model for each chemistry? e.g. just "P6C4"? | |
IntegratorConfig icfg; // ParameterTable::Default, minZScore=-5 | |
string tpl = ...; // do POA | |
vector<string> reads = ...; | |
vector<tuple<size_t, size_t>> extents = ...; // get extents from POA | |
SNR snr = { ..., ..., ..., ... }; | |
MonoMolecularIntegrator mmi(tpl, icfg, snr, mdl); | |
for (size_t i = 0; i < reads.size(); ++i) | |
{ | |
size_t ts, te; | |
tie(ts, te) = extents[i]; | |
auto error = mmi.AddRead(MappedRead(Read(std::to_string(i), reads[i], mdl), ts, te)); | |
// do something with error maybe? | |
} | |
PolishConfig pcfg; | |
Polish(mmi, pcfg); | |
return ConsensusAndQVs(mmi); | |
// multimolecular | |
IntegratorConfig icfg; | |
vector<tuple<string, SNR, string>> reads = ...; // seq, SNR, model | |
string tpl = ...; // Poa | |
vector<tuple<size_t, size_t>> extents = ...; // also from Poa? or by lifting coordinates like Quiver | |
MultiMolecularIntegrator mmi(tpl, icfg); | |
for (size_t i = 0; i < reads.size(); ++i) | |
{ | |
size_t ts, te; | |
tie(ts, te) = extents[i]; | |
auto status = mmi.AddRead(MappedRead(Read(std::to_string(i), get<0>(reads[i]), get<2>(reads[i])), ts, te), get<1>(reads[i])); | |
} | |
PolishConfig pcfg; | |
Polish(mmi, pcfg); | |
return ConsensusAndQVs(mmi); |
Thanks for writing this up. Much easier to understand now.
I like your comment, I think it would be good to have a common API---just grab the SNR from the read.
I confess that I am still a little puzzled as to what the essential difference between the mono/multi cases really is. My current understanding is that it's because the read covariates (SNR) are becoming considered part of the "template", so that in the CCS case you can have generare consensus from a single "template", whereas for multi we need to maintain separate "templates" for all subreads.
So is the "mono" case then just a degenerate form of "multi"? If so inclined, could we use "multi" in CCS, while perhaps performing duplicate work/allocation. Am I correct in assuming that "mono" is then mostly an optimization, avoiding duplicated data structures?
One more thought: presumably, in the future, we could save memory by making the "multi" be an aggregate over "mono" --- combine subreads by ZMW into mono's.
We could use a bucket of templates keyed on SNR to do things transparently at the Integrator level, and be done with this distinction, true.
Maybe we want SNR to be part of the Read no matter what? Then we can lift AddRead into the common API, and throw an exception when you've added a read with an invalid SNR to a MonoMolecularIntegrator. Thoughts?