@garrettr
Created September 4, 2012 23:53
Statmap Gapped Reads Timeline
# Plan for finishing Gapped Reads Mapping
Here is the remaining work that needs to be done to map gapped reads (RNASeq) with Statmap.
1. [x] Change the binary search in find_matching_mapped_locations to return as matches all locations within a range (REFERENCE_INSERT_LENGTH_MAX in config.h) including and beyond the base location's start.
Done on: 8/31/12
2. [x] Handle multiple matches in the candidate mapping building. Storing them is easy (we used a mapped_locations from the beginning with this in mind).
Done on: 9/3/12
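As an illustration of step 1, here is a minimal sketch of a range-returning binary search. It is not the actual find_matching_mapped_locations code: the flat array of start positions, the function names, the inclusive upper bound, and the value given to REFERENCE_INSERT_LENGTH_MAX are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value; the real constant lives in Statmap's config.h. */
#define REFERENCE_INSERT_LENGTH_MAX 1000

/* Index of the first element >= key in a sorted array (lower bound). */
static size_t lower_bound(const long *starts, size_t n, long key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (starts[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/*
 * Find every start position in the inclusive range
 * [base_start, base_start + REFERENCE_INSERT_LENGTH_MAX].
 * Stores the index of the first match in *first and returns the match count.
 */
size_t find_locations_in_range(const long *starts, size_t n,
                               long base_start, size_t *first)
{
    assert(first != NULL);
    size_t begin = lower_bound(starts, n, base_start);
    /* One past the last match: first element beyond the end of the range. */
    size_t end = lower_bound(starts, n,
                             base_start + REFERENCE_INSERT_LENGTH_MAX + 1);
    *first = begin;
    return end - begin;
}
```

Two lower-bound searches keep the lookup logarithmic in the number of locations, and the caller then walks the contiguous run of matches.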
Once that is done, we will have a set of candidate mappings, as we had before. The next step is to rework the recheck. As preparation, the first step is:
3. **UPDATE**: The original goal was to rewrite recheck_locations for candidate mappings at this stage. Actually, it is not correct to recheck locations at this stage - we should check across the whole mapping once the candidate mappings have been joined.
a) Remove the RECHECK flag (finally) from candidate mappings and the recheck
code from find_candidate_mappings.
Done by: 9/4/12
b) Write (naive, for now) recheck code using likelihood ratio test for the
sets of joined candidate mappings.
Done by: 9/4/12
Question: can we update the error data if we haven't done the recheck? Should
we do an initial recheck on any unique mappers we see?
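For step 3b, a naive likelihood ratio recheck over joined candidate mappings could look roughly like the sketch below. The struct, the function name, and the max_log10_ratio threshold are hypothetical, and it assumes each joined mapping already carries a single summed log10 probability.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one set of joined candidate mappings. */
struct joined_mapping {
    double log10_prob; /* assumed: summed log10 probability of all pieces */
    int keep;          /* set to 1 if the mapping survives the recheck */
};

/*
 * Naive likelihood ratio filter: keep a joined mapping only if its log10
 * likelihood is within max_log10_ratio of the best one.
 * Returns the number of mappings kept.
 */
size_t recheck_joined_mappings(struct joined_mapping *m, size_t n,
                               double max_log10_ratio)
{
    assert(m != NULL || n == 0);
    assert(max_log10_ratio >= 0.0);
    if (n == 0)
        return 0;

    /* First pass: find the most likely joined mapping. */
    double best = m[0].log10_prob;
    for (size_t i = 1; i < n; i++) {
        if (m[i].log10_prob > best)
            best = m[i].log10_prob;
    }

    /* Second pass: flag everything close enough to the best. */
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        m[i].keep = (best - m[i].log10_prob) <= max_log10_ratio;
        kept += (size_t)m[i].keep;
    }
    return kept;
}
```

The point of the sketch is that the ratio is taken across whole joined mappings, not per candidate mapping, which matches the update in step 3.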
5. can_be_used_to_update_error_data needs to be rewritten. This might be tricky: in any case with multiple indexable subtemplates, we are going to have multiple candidate mappings here, and the existing test (mappings->length is 1 or 2) is not going to work. Since we know the number of indexable subtemplates, the best thing to do might be to pass it into the update_error_data function and make sure that mappings->length == num_indexable_subtemplates || mappings->length == num_indexable_subtemplates*2. This does make the comparison of sequences for the diploid possibility a bit tricky, though. I will need to revisit it, as a few minutes of thinking didn't turn up an obvious solution.
Writing a function like candidate_mappings_are_a_unique_mapper is the basic goal here.
6. update_error_data_record_from_candidate_mappings, likewise. This shouldn't be so hard - just do the update process for each candidate mapping in a loop.
At this point it's time to stop, breathe, and get the RNASeq unit test working, since all of the marginal mapping for gapped reads will be complete. The test is the usual straightforward one: map the reads and compare the mappings back to the genome, this time using the CIGAR strings for the comparison to see how it handles the introns.
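One way the CIGAR-based comparison could check intron handling is to total the reference span and the intron length of each mapping. The helper below is a hypothetical sketch, not part of Statmap; it assumes SAM-style CIGAR strings, where M, D, N, = and X consume reference bases and N marks a skipped region such as an intron.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Walk a SAM-style CIGAR string and total the reference bases it covers
 * and the bases skipped by N (intron) operations.
 * Returns 0 on success, -1 if the string is empty or malformed.
 */
int cigar_reference_span(const char *cigar, long *ref_span, long *intron_len)
{
    assert(cigar != NULL && ref_span != NULL && intron_len != NULL);
    *ref_span = 0;
    *intron_len = 0;

    const char *p = cigar;
    if (*p == '\0')
        return -1;

    while (*p != '\0') {
        /* Every operation must start with a decimal length. */
        if (!isdigit((unsigned char)*p))
            return -1;
        long len = 0;
        while (isdigit((unsigned char)*p)) {
            len = len * 10 + (*p - '0');
            p++;
        }
        switch (*p) {
        case 'N': /* skipped region: counts as intron and as reference */
            *intron_len += len;
            *ref_span += len;
            break;
        case 'M': case 'D': case '=': case 'X': /* consume reference */
            *ref_span += len;
            break;
        case 'I': case 'S': case 'H': case 'P': /* no reference consumed */
            break;
        default: /* unknown operation or missing operation character */
            return -1;
        }
        p++;
    }
    return 0;
}
```

For example, a read mapped as 50M1000N50M covers 1100 reference bases, 1000 of which are intron, so the unit test could compare these totals against the known gene model.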
And then there are two more steps, which I will require your assistance with.
7. Fragment length distribution estimation. Or do we need it earlier? Is this a chicken-and-egg problem? Also, this code has no comments.
8. Iterative mapping. I think this one is for you, since I don't really understand everything that is going on in there.