

@jacob-ogre
Created June 17, 2014 01:43
NUCmer parallelization and --prefix
tl;dr: If you parallelize NUCmer by dividing a query into pieces, use the --prefix
flag to ensure that separate .mgaps files are created rather than a single out.mgaps.
### Fuller explanation ###
While trying to parallelize NUCmer alignment of assembly contigs to PacBio reads,
I repeatedly ran into this error:
ERROR: Could not parse input from 'Query File'.
Please check the filename and format, or file a bug report
The usual answers from Google point to malformed FASTA files, particularly ones
with Windows line endings (\r\n). My contig FASTA files were fine, however, and
after digging into the source code I found the following comment in postnuc.cc:
//-- If a B sequence not seen yet, read it in
//-- IMPORTANT: The B sequences in the synteny object are assumed to be
// ordered as output by mgaps, if they are not in order the program
// will fail. (All like tags must be adjacent and in the same order
// as the query file)
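To make that requirement concrete, here is a minimal Python sketch that checks a sequence of query tags against it. The tag list is a hypothetical, simplified stand-in for the query headers as they appear in an .mgaps file; the function does not parse real MUMmer output, and `tags_follow_query_order` is an illustrative name, not part of MUMmer.

```python
def tags_follow_query_order(tags, query_order):
    """Check the postnuc.cc invariant on a list of query tags (hypothetical
    stand-in for .mgaps headers): all like tags are adjacent, and the tag
    blocks appear in the same order as in the query file."""
    rank = {name: i for i, name in enumerate(query_order)}
    seen = set()
    previous = None
    last_rank = -1
    for tag in tags:
        if tag == previous:
            continue  # still inside the current block of like tags
        if tag in seen:
            return False  # tag reappears after a different tag: not adjacent
        r = rank.get(tag)
        if r is None or r < last_rank:
            return False  # tag not in this query file, or block out of order
        seen.add(tag)
        last_rank = r
        previous = tag
    return True
```

Query sequences with no hits may be skipped, so `["c1", "c3"]` passes for query order c1, c2, c3. Interleaved output from two runs, such as `["c1", "c4", "c2"]` against c1..c4, fails because c2 comes after c4.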
My parallelization approach simply splits the query into 16 parts and runs each
search in a separate process. By default, however, NUCmer names its output files
with the prefix 'out' (out.mgaps, out.ntref, and so on), so when the processes run
in the same directory every piece of the query writes to the same out.* files. I
suspect the hits end up interleaved, which is exactly what the comment above warns
against.
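A minimal sketch of the splitting step, assuming the goal is contiguous chunks of roughly equal record count (how the 16 parts were actually made isn't stated). Keeping records contiguous means each piece is itself a valid query file whose records keep their original relative order. `read_fasta`, `split_records`, `write_parts` and the `query_part_NN.fa` naming are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence_lines) tuples from an iterable of text lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\r\n")  # also drops a stray Windows \r
        if line.startswith(">"):
            if header is not None:
                yield header, seq
            header, seq = line, []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, seq


def split_records(records, n_parts):
    """Split records into n_parts contiguous chunks of near-equal size,
    preserving order within and across chunks."""
    records = list(records)
    size, extra = divmod(len(records), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(records[start:end])
        start = end
    return parts


def write_parts(query_path, n_parts=16, stem="query_part"):
    """Write each non-empty chunk to its own FASTA file; return the paths."""
    with open(query_path) as handle:
        parts = split_records(read_fasta(handle), n_parts)
    paths = []
    for i, part in enumerate(parts):
        if not part:
            continue  # fewer records than parts: skip empty pieces
        path = f"{stem}_{i:02d}.fa"
        with open(path, "w") as out:
            for header, seq in part:
                out.write(header + "\n")
                for seq_line in seq:
                    out.write(seq_line + "\n")
        paths.append(path)
    return paths
```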
The solution is to use the --prefix flag, with a unique prefix for each query
file.
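A sketch of the fix, assuming the query pieces already exist on disk and that the nucmer on PATH accepts `--prefix=PREFIX` with the reference before the query. `reference.fa`, the `part_NN` prefixes and the helper names are placeholders. The prefix is derived from the piece's index, so it is unique even if two piece files share a name.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def nucmer_commands(reference, query_parts, prefix_stem="part"):
    """Build one nucmer command per query piece, each with its own prefix,
    so the output files of different runs never share a name."""
    commands = []
    for i, query in enumerate(query_parts):
        prefix = f"{prefix_stem}_{i:02d}"  # hypothetical naming scheme
        commands.append(["nucmer", f"--prefix={prefix}", reference, str(query)])
    return commands


def run_all(commands, workers=16):
    """Run the commands concurrently; return exit codes in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
```

Threads are enough here because each nucmer run is its own OS process; the pool only waits on them. Each run then leaves its own part_NN.* files instead of appending to the shared out.* set.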
@jacob-ogre (author) commented:

I may have posted a little too quickly: I'm not certain the current run will finish correctly, but it looks OK so far.
