@cschin
Created July 19, 2014 19:33
Some thoughts on Heng Li's proposal for an assembly graph format: http://lh3.github.io/2014/07/19/a-proposal-of-the-grapical-fragment-assembly-format/
Some quick comments.
Is this format trying to represent the raw overlaps, the final assembly graph, or both?
It seems to me that it is more suitable for the first. In my work on representing diploid genome assemblies, I had to do multiple levels of graph reduction, starting from the initial string/overlap graph, to simplify the problem. If we are looking at a more reduced assembly, we might have to deal with edges corresponding to unitigs that share the same in and out nodes. In this format, such bubble paths (where the difference between them is bigger than a small indel) will be on different rows, so the behavior of edges with the same in and out node should be defined. What I did for the diploid work was to assign a uid to each edge.
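The uid idea can be illustrated with a minimal sketch, assuming a graph where unitigs sit on edges and nodes are junctions. All names here (`UnitigGraph`, `Edge`, `add_edge`, `bubbles`) and the sequences are hypothetical, chosen for illustration only; they come neither from the proposed format nor from any particular assembler.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Edge:
    uid: int   # unique edge id, so parallel edges stay distinguishable
    src: str   # in node
    dst: str   # out node
    seq: str   # unitig sequence carried by this edge (made-up data below)


class UnitigGraph:
    """Hypothetical multigraph: several edges may share the same (src, dst)."""

    def __init__(self):
        self._next_uid = count()
        self.edges = {}                    # uid -> Edge
        self.by_ends = defaultdict(list)   # (src, dst) -> list of uids

    def add_edge(self, src, dst, seq):
        uid = next(self._next_uid)
        self.edges[uid] = Edge(uid, src, dst, seq)
        self.by_ends[(src, dst)].append(uid)
        return uid

    def bubbles(self):
        # Node pairs connected by more than one edge, i.e. simple bubbles.
        return {ends: uids for ends, uids in self.by_ends.items() if len(uids) > 1}


g = UnitigGraph()
a = g.add_edge("n1", "n2", "ACGTTTGCA")    # one path through the bubble
b = g.add_edge("n1", "n2", "ACGAAAAAGCA")  # alternative path, same end nodes
c = g.add_edge("n2", "n3", "GGCC")         # ordinary edge, no parallel partner
```

Keying edges by uid rather than by their (in node, out node) pair means the two bubble paths never overwrite each other, and any later annotation can refer to one specific path unambiguously.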
Also, I do think the final assembly should avoid bidirectional edges. They should be resolved by the assembler. From a pragmatic point of view, they will confuse a lot of biologists.