Skip to content

Instantly share code, notes, and snippets.

@nlhepler
Last active September 3, 2015 17:19
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save nlhepler/fc106c8546525032297f to your computer and use it in GitHub Desktop.
Save nlhepler/fc106c8546525032297f to your computer and use it in GitHub Desktop.
mono/multi-molecular integrator usage
// monomolecular
string mdl = "P6C4.NoCov"; // maybe we should just have a default model for each chemistry? e.g. just "P6C4"?
IntegratorConfig icfg; // ParameterTable::Default, minZScore=-5
string tpl = ...; // do POA
vector<string> reads = ...;
vector<tuple<size_t, size_t>> extents = ...; // get extents from POA
SNR snr = { ..., ..., ..., ... };
MonoMolecularIntegrator mmi(tpl, icfg, snr, mdl);
for (size_t i = 0; i < reads.size(); ++i)
{
size_t ts, te;
tie(ts, te) = extents[i];
auto error = mmi.AddRead(MappedRead(Read(std::to_string(i), reads[i], mdl), ts, te));
// do something with error maybe?
}
PolishConfig pcfg;
Polish(mmi, pcfg);
return ConsensusAndQVs(mmi);
// multimolecular
IntegratorConfig icfg;
vector<tuple<string, SNR, string>> reads = ...; // seq, SNR, model
string tpl = ...; // Poa
vector<tuple<size_t, size_t>> extents = ...; // also from Poa? or by lifting coordinates like Quiver
MultiMolecularIntegrator mmi(tpl, icfg);
for (size_t i = 0; i < reads.size(); ++i)
{
size_t ts, te;
tie(ts, te) = extents[i];
auto status = mmi.AddRead(MappedRead(Read(std::to_string(i), get<0>(reads[i]), get<2>(reads[i])), ts, te), get<1>(reads[i]));
}
PolishConfig pcfg;
Polish(mmi, pcfg);
return ConsensusAndQVs(mmi);
@nlhepler
Copy link
Author

Maybe we want SNR to be part of the Read no matter what? Then we can lift AddRead into the common API, and throw an exception when you've added a read with an invalid SNR to a MonoMolecularIntegrator. Thoughts?

@dalexander
Copy link

Thanks for writing this up. Much easier to understand now.

I like your comment, I think it would be good to have a common API---just grab the SNR from the read.

I confess that I am still a little puzzled as to what the essential difference between the mono/multi cases really is. My current understanding is that it's because the read covariates (SNR) are becoming considered part of the "template", so that in the CCS case you can have generare consensus from a single "template", whereas for multi we need to maintain separate "templates" for all subreads.

So is the "mono" case then just a degenerate form of "multi"? If so inclined, could we use "multi" in CCS, while perhaps performing duplicate work/allocation. Am I correct in assuming that "mono" is then mostly an optimization, avoiding duplicated data structures?

@dalexander
Copy link

One more thought: presumably, in the future, we could save memory by making the "multi" be an aggregate over "mono" --- combine subreads by ZMW into mono's.

@nlhepler
Copy link
Author

nlhepler commented Sep 3, 2015

We could use a bucket of templates keyed on SNR to do things transparently at the Integrator level, and be done with this distinction, true.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment