@averagehat
Created April 23, 2015 13:09
AMOS 3-code message types
v1.3.0
NOTES:
- See message_grammar.rtf for the message file format definition.
- All fields are optional, but some programs might not like missing fields (e.g. a seq field without a qlt field). Empty fields are not allowed. If there is no data for a given field, omit it from the message.
- Acceptable field data is represented by Perl regular expressions. All regular expressions will be contained in parens () or brackets []. If uncontained, interpret characters as literal.
- Field or message references are contained in <>.
- Strict field ordering is not required. The ordering of fields in this definition is arbitrary.
- Message inheritance is noted in C++ style. Fields inherited from a parent message will be listed but not described.
- Ranges are specified as a pair of positions [x,y) where x is inclusive and y is exclusive. Thus, the range 4,6 would represent the 2 symbols at positions 4 and 5. Sequence positions are also indexed by this gap coordinate system, which essentially translates to a 0-based indexing scheme, e.g. the range [2,5) for the list 0,1,2,3,4,5,6 would define the sublist 2,3,4. Reversed ranges are also allowed, for example (5,2] would define the sublist 4,3,2.
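The range convention in the last note can be sketched in Python as follows. This is only an illustration of the examples given above; the function name range_items is hypothetical and not part of AMOS.

```python
def range_items(items, begin, end):
    """Return the items selected by a gap-coordinate range.

    A forward range (begin <= end) selects items[begin:end].
    A reversed range (begin > end) selects the same items as the
    forward range (end, begin), but in reverse order.
    """
    if begin <= end:
        return list(items[begin:end])
    return list(items[end:begin])[::-1]


symbols = [0, 1, 2, 3, 4, 5, 6]
print(range_items(symbols, 4, 6))  # [4, 5]
print(range_items(symbols, 2, 5))  # [2, 3, 4]
print(range_items(symbols, 5, 2))  # [4, 3, 2]
```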
TYPES:
Universal_t : IBankable_t, IMessagable_t
{UNV
act:[ADR]
iid:(\d+)
eid:(.+)
com:(\n(.*\n)*).
flg:([01]{2})
sts:[.]
}
act - Action. [A]dd, [D]elete, [R]eplace message. If absent, default action will be addition.
iid - Internal (AMOS) ID. This integer ID must be unique among all objects of the same type. This is the ID used for all object links and thus is mandatory if other objects are to link to this one.
eid - External ID. This string ID must be unique among all objects of the same type. The ID may not contain any newlines, but may be any length.
com - Free-form comment field.
flg - Two generic boolean flags (A/B), default to zero if unspecified.
sts - Object status character.
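As a rough illustration of how the single-line UNV field patterns might be checked, the Python sketch below copies the act, iid, eid and flg expressions from the definition above. The names UNIVERSAL_FIELDS and check_field are hypothetical, and the multi-line com field and the sts field are deliberately left out.

```python
import re

# Patterns copied from the UNV definition; only single-line fields are covered.
UNIVERSAL_FIELDS = {
    "act": re.compile(r"[ADR]"),
    "iid": re.compile(r"\d+"),
    "eid": re.compile(r".+"),
    "flg": re.compile(r"[01]{2}"),
}


def check_field(line):
    """Return True if a 'tag:value' line matches its UNV pattern."""
    tag, sep, value = line.partition(":")
    pattern = UNIVERSAL_FIELDS.get(tag)
    if not sep or pattern is None:
        return False
    # fullmatch requires the whole value to match, so "eid:" (an empty
    # field, which the notes disallow) is rejected by the .+ pattern.
    return pattern.fullmatch(value) is not None


for line in ["iid:42", "iid:-1", "flg:01", "flg:1", "eid:"]:
    print(line, check_field(line))
```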
Contig_t : Sequence_t
{CTG
act:[ADR]
iid:(\d+)
eid:(.+)
com:(\n(.*\n)*).
flg:([01]{2})
sts:[.]
seq:(\n(.*\n)*).
qlt:(\n(.*\n)*).
<TLE message>*
}
<TLE message> - Tiling of underlying reads.
ContigEdge_t : ContigLink_t, Edge_t
{CTE
act:[ADR]
iid:(\d+)
eid:(.+)
com:(\n(.*\n)*).
flg:([01]{2})
sts:[.]
nds:<iid>,<iid>
obj:<message type>
adj:[NAOI]
std:(\d+)
sze:(-?\d+)
typ:[XMOPAS]
src:<iid>,<message type>
lnk:(\n(<iid>\n)*).
}
obj - Removed. All nodes are Contig_t.
ContigLink_t : Link_t
{CTL
act:[ADR]
iid:(\d+)
eid:(.+)
com:(\n(.*\n)*).
flg:([01]{2})
sts:[.]
nds:<iid>,<iid>
obj:<message type>
adj:[NAOI]
std:(\d+)
sze:(-?\d+)
typ:[XMOPAS]
src:<iid>,<message type>
}
obj - Removed. All nodes are Contig_t.
Distribution_t : IMessagable_t
{DST
mea:(\d+)
std:(\d+)
}
mea - Mean.
std - Standard deviation.
Edge_t : Link_t
{EDG
act:[ADR]
iid:(\d+)
eid:(.+)
com:(\n(.*\n)*).
flg:([01]{2})
sts:[.]
nds:<iid>,<iid>
obj:<message type>
adj:[NAOI]
std:(\d+)
sze:(-?\d+)
typ:[XMOPAS]
src:<iid>,<message type>
lnk:(\n(<iid>\n)*).
}
lnk - List of bundled links, referenced by their IIDs.
Feature_t : Universal_t
{FEA
act:[ADR]
iid:(\d+)
eid:(.+)
com:(\n(.*\n)*).
flg:([01]{2})
sts:[.]
clr:(\d+,\d+)
typ:[RUJCOP.]
src:<iid>,<message type>
}
clr - Range/position of the feature.
typ - Feature type. [R]epeat, [U]nitig, [J]oin, [C]overage, [O]RF, [P]olymorphism.
src - Source of the feature, e.g. a contig, referenced by its IID and type.
Fragment_t : Universal_t
{FRG
act:[ADR]
iid:(\d+)
eid:(.+)
com:(\n(.*\n)*).
flg:([01]{2})
sts:[.]
lib:<iid>
rds:<iid>,<iid>
sze:(\d+)
typ:[XBITW]
src:<iid>,<message type>
}
lib - Parent library, referenced by its IID.
rds - The paired sequencing reads, referenced by their IIDs.
sze - Size of the fragment, if known.
typ - Type of fragment. [X]Other, [B]AC, [I]nsert, [T]ransposon, [W]alk.
src - Source of this piece of DNA, e.g. a BAC fragment, referenced by its IID and type.
Group_t : Universal_t
{GRP
act:[ADR]
iid:(\d+)
eid:(.+)
com:(\n(.*\n)*).
flg:([01]{2})
sts:[.]
mbr:(\n(<iid>\n)*).
obj:<message type>
}
mbr - List of group members, referenced by IID.
obj - The object type of the members.
IDMap_t : IMessagable_t
{MAP
sze:(\d+)
map:(\n(<bid>\t<iid>\t<eid>\n)*).
obj:<message type>
}
sze - Number of ID triples in the map.
map - List of ID triples, BID <-> IID <-> EID.
obj - The object type of the ID triples.
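A small Python sketch of reading the map body and checking it against sze. It assumes the body lines have already been split out of the message, and that BIDs are integers like IIDs, which this document does not state. The name parse_id_map is hypothetical.

```python
def parse_id_map(sze, map_lines):
    """Parse the body lines of a MAP 'map' field into (bid, iid, eid) triples."""
    triples = []
    for line in map_lines:
        # Split on the first two tabs only, so the EID keeps any later tabs.
        bid, iid, eid = line.split("\t", 2)
        # Treating the BID as an integer is an assumption of this sketch.
        triples.append((int(bid), int(iid), eid))
    if len(triples) != sze:
        raise ValueError(f"sze is {sze} but map holds {len(triples)} triples")
    return triples


body = ["1\t10\tread_a", "2\t11\tread_b"]
print(parse_id_map(2, body))  # [(1, 10, 'read_a'), (2, 11, 'read_b')]
```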
Index_t : Universal_t
{IDX
act:[ADR]
iid:(\d+)
eid:(.+)
com:(\n(.*\n)*).
flg:([01]{2})
sts:[.]
sze:(\d+)
map:(\n(<iid>\t<iid>\n)*).
obj:<message type>,<message type>
}
sze - Number of ID pairs in the index.
map - List of ID pairs, IID -> IID.
obj - The object type of the ID pairs.
Kmer_t : Universal_t
{KMR
act:[ADR]
iid:(\d+)
eid:(.+)
com:(\n(.*\n)*).
flg:([01]{2})
sts:[.]
cnt:(\d+)
seq:([ACGT]+)
rds:(\n(<iid>\n)*).
}
cnt - Number of occurrences of this Kmer.
seq - Sequence of this Kmer.
rds - List of reads that contain this Kmer, referenced by their IIDs.
Layout_t : Universal_t
{LAY
act:[ADR]
iid:(\d+)
eid:(.+)
com:(\n(.*\n)*).
flg:([01]{2})
sts:[.]
<TLE message>*
}
<TLE message> - Tiling of underlying reads.
Library_t : Universal_t
{LIB
act:[ADR]
iid:(\d+)
eid:(.+)
com:(\n(.*\n)*).
flg:([01]{2})
sts:[.]
<DST message>
}
<DST message> - Library size distribution stats.
Link_t : Universal_t
{LNK
act:[ADR]
iid:(\d+)
eid:(.+)
com:(\n(.*\n)*).
flg:([01]{2})
sts:[.]
nds:<iid>,<iid>
obj:<message type>
adj:[NAOI]
std:(\d+)
sze:(-?\d+)
typ:[XMOPAS]
src:<iid>,<message type>
}
nds - The linked nodes, referenced by their IIDs.
obj - The object type of the nodes.
adj - Node adjacency. [N]ormal, [A]nti-normal, [I]nnie, [O]utie, which are EB, BE, EE, BB adjacencies respectively.
std - Standard deviation of the link size.
sze - Size of link.
typ - Type of link. [X]Other, [M]atepair, [O]verlap, [P]hysical, [A]lignment, [S]ynteny.
src - Source of the link, e.g. fragment information, referenced by its IID and type.
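The adj codes and their adjacencies can be written out as a lookup table. The Python sketch below follows the order given in the adj description (N, A, I, O map to EB, BE, EE, BB). Reading B and E as the beginning and end of a node is an assumption, and the names ADJACENCY_ENDS and describe_adjacency are hypothetical.

```python
# Adjacency codes and the node ends they join, as listed for the adj field.
# "B" and "E" are read here as the beginning and end of a node (an assumption).
ADJACENCY_ENDS = {
    "N": ("E", "B"),  # Normal
    "A": ("B", "E"),  # Anti-normal
    "I": ("E", "E"),  # Innie
    "O": ("B", "B"),  # Outie
}


def describe_adjacency(code):
    """Return a short description of which node ends an adj code joins."""
    first_end, second_end = ADJACENCY_ENDS[code]
    return f"{first_end} of first node to {second_end} of second node"


print(describe_adjacency("I"))  # E of first node to E of second node
```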
Overlap_t : Universal_t
{OVL
act:[ADR]
iid:(\d+)
eid:(.+)
com:(\n(.*\n)*).
flg:([01]{2})
sts:[.]
rds:<iid>,<iid>
adj:[NAIO]
ahg:(-?\d+)
bhg:(-?\d+)
scr:(\d+)
flg:([01]{3})
}
rds - The overlapping reads, referenced by their IIDs.
adj - Read adjacency. [N]ormal, [A]nti-normal, [I]nnie, [O]utie, which are EB, BE, EE, BB overlaps respectively.
ahg - Ahang. Length of the non-overlapping portion of the first read.
bhg - Bhang. Length of the non-overlapping portion of the second read.
scr - An unsigned integer overlap score.
flg - Universal_t flags plus one additional flag (A/B/C), default to zero if unspecified.
Read_t : Sequence_t
{RED
act:[ADR]
iid:(\d+)
eid:(.+)
com:(\n(.*\n)*).
flg:([01]{2})
sts:[.]
frg:<iid>
typ:[XECBW]
clr:(\d+,\d+)
vcr:(\d+,\d+)
qcr:(\d+,\d+)
pos:(-?\d+)
bcp:(\n(\d+\n)*).
}
frg - The parent fragment, referenced by its IID.
typ - Type of read. [X]Other, [E]nd, [C]ontig, [B]AC, [W]alk.
clr - The acting clear range.
vcr - Vector clear range.
qcr - Quality clear range.
pos - Approximate position on the parent fragment. Positive if counting from left and oriented forward, negative if counting from right and reverse oriented. 0 if unknown.
bcp - Absolute base call positions.
Scaffold_t : Universal_t
{SCF
act:[ADR]
iid:(\d+)
eid:(.+)
com:(\n(.*\n)*).
flg:([01]{2})
sts:[.]
edg:(\n(<iid>\n)*).
<TLE message>*
}
edg - List of contig edges, referenced by their IIDs.
<TLE message> - Tiling of the underlying contigs.
ScaffoldEdge_t : ScaffoldLink_t, Edge_t
{SCE
act:[ADR]
iid:(\d+)
eid:(.+)
com:(\n(.*\n)*).
flg:([01]{2})
sts:[.]
nds:<iid>,<iid>
obj:<message type>
adj:[NAOI]
std:(\d+)
sze:(-?\d+)
typ:[XMOPAS]
src:<iid>,<message type>
lnk:(\n(<iid>\n)*).
}
obj - Removed. All nodes are Scaffold_t.
ScaffoldLink_t : Link_t
{SCL
act:[ADR]
iid:(\d+)
eid:(.+)
com:(\n(.*\n)*).
flg:([01]{2})
sts:[.]
nds:<iid>,<iid>
obj:<message type>
adj:[NAOI]
std:(\d+)
sze:(-?\d+)
typ:[XMOPAS]
src:<iid>,<message type>
}
obj - Removed. All nodes are Scaffold_t.
Sequence_t : Universal_t
{SEQ
act:[ADR]
iid:(\d+)
eid:(.+)
com:(\n(.*\n)*).
flg:([01]{2})
sts:[.]
seq:(\n(.*\n)*).
qlt:(\n(.*\n)*).
}
seq - Sequence base call information.
qlt - Sequence quality information.
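A minimal Python sketch of writing a flat message such as SEQ. It assumes that a multi-line field starts on the line after the tag and is closed by a line holding a single period, as the patterns above suggest. Fields with no data are omitted, per the notes. The name format_message and its multiline parameter are hypothetical, and the quality string below is placeholder data.

```python
def format_message(msg_type, fields, multiline=("com", "seq", "qlt")):
    """Render one flat message; fields with no data are skipped."""
    lines = ["{" + msg_type]
    for tag, value in fields.items():
        if value is None or value == "":
            continue  # empty fields are not allowed, so omit them
        if tag in multiline:
            lines.append(f"{tag}:")
            lines.extend(str(value).splitlines())
            lines.append(".")  # a lone period closes a multi-line field
        else:
            lines.append(f"{tag}:{value}")
    lines.append("}")
    return "\n".join(lines) + "\n"


print(format_message("SEQ", {
    "iid": 1,
    "eid": "s1",
    "com": "",            # no data, so the field is left out
    "seq": "ACGT\nACGT",
    "qlt": "DDDD\nDDDD",  # placeholder quality characters
}), end="")
```

Field order follows the order of the dict passed in, which is acceptable because strict field ordering is not required.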
Tile_t : IMessagable_t
{TLE
src:<iid>
off:(-?\d+)
clr:(\d+,\d+)
gap:(\n(-?\d+\n)*).
}
src - Tiled sequence, referenced by its IID. Type of sequence is implied by how this record is nested, e.g. a TLE in a CTG represents a RED, while a TLE in a SCF represents a CTG.
off - Offset of the tile from the beginning of the reference.
clr - Usable range of the tile, relative to the tile’s coordinates.
gap - List of delta encoded gap positions.
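The nesting of TLE messages inside CTG, LAY and SCF messages can be handled with a small recursive reader. The Python sketch below is not the AMOS parser; it assumes one tag per line, a lone "}" line closing each message, and multi-line fields (an empty value after the colon) closed by a lone "." line. See message_grammar.rtf for the actual grammar. The name parse_message is hypothetical.

```python
def parse_message(lines, start=0):
    """Parse one message beginning at lines[start].

    Returns (message, next_index), where message is a dict with the keys
    'type', 'fields' and 'children' (the nested messages).
    """
    header = lines[start]
    if not header.startswith("{"):
        raise ValueError(f"expected '{{', got {header!r}")
    msg = {"type": header[1:], "fields": {}, "children": []}
    i = start + 1
    while i < len(lines):
        line = lines[i]
        if line == "}":
            return msg, i + 1
        if line.startswith("{"):
            # Nested message, e.g. a TLE inside a CTG.
            child, i = parse_message(lines, i)
            msg["children"].append(child)
            continue
        tag, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed field line: {line!r}")
        if value == "":
            # Multi-line field: collect lines up to the lone "." terminator.
            body = []
            i += 1
            while i < len(lines) and lines[i] != ".":
                body.append(lines[i])
                i += 1
            if i == len(lines):
                raise ValueError(f"unterminated field {tag!r}")
            value = body
        msg["fields"][tag] = value
        i += 1
    raise ValueError(f"unterminated message {msg['type']!r}")


text = """{CTG
iid:1
eid:contig1
seq:
ACGTAC
GT
.
{TLE
src:7
off:0
clr:0,8
}
}"""
contig, _ = parse_message(text.splitlines())
print(contig["type"])                          # CTG
print(contig["fields"]["seq"])                 # ['ACGTAC', 'GT']
print(contig["children"][0]["fields"]["src"])  # 7
```

All values are kept as strings (or lists of strings for multi-line fields); converting IIDs and ranges to integers is left to the caller.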
::END OF DOCUMENT::