TDF is a binary format developed by the IGV team at the Broad Institute. The following terms are used throughout:
- Master index: contains information about the available datasets and groups.
- Dataset: a high-level named data container.
- Tile: a low-level data container. Each tile holds interval data for a specific genome region.
- Group: a key-value metadata container.
- Window function: see the IGV sources for possible values.
- Track name: a human-readable track name.
- Track type: see the IGV sources for possible values.
- Track line: a UCSC browser track line; see the UCSC documentation.
- Zoom level: TODO
A TDF header consists of a fixed-size 24-byte component followed by variable-size metadata. The fixed-size component is laid out as follows:
Field | Type |
---|---|
magic | int32 |
version | int32 |
master index offset | int64 |
master index size | int32 |
header size | int32 |
The first three bytes of the file (aka "magic" bytes) can be either "TDF" or "IBD", followed by a single-digit format version. Together these four bytes form the `magic` field, e.g. "TDF4". The latest format version is 4. Unfortunately, changes between versions were not documented.

`header size` refers to the number of bytes in the following variable-size component.
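To make the fixed-size layout concrete, here is a minimal Python sketch of parsing it. It assumes little-endian byte order, which matches how IGV reads TDF files but is not stated in the table above. `FixedHeader`, `FIXED_HEADER` and `parse_fixed_header` are hypothetical names, not part of any existing library.

```python
import struct
from typing import NamedTuple

# Assumption: all fields are little-endian.
# 4-byte magic + int32 version + int64 offset + int32 size + int32 header size.
FIXED_HEADER = struct.Struct("<4siqii")  # 24 bytes, no padding with "<"


class FixedHeader(NamedTuple):
    magic: bytes              # e.g. b"TDF4": "TDF" or "IBD" plus a version digit
    version: int
    master_index_offset: int
    master_index_size: int    # size of the master index in bytes
    header_size: int          # size of the variable-size component that follows


def parse_fixed_header(buf: bytes) -> FixedHeader:
    """Parse the fixed-size 24-byte component at the start of a TDF file."""
    if len(buf) < FIXED_HEADER.size:
        raise ValueError("buffer is too short for a TDF header")
    magic, version, mi_offset, mi_size, header_size = FIXED_HEADER.unpack_from(buf, 0)
    # The first three bytes identify the format; the fourth is a version digit.
    if magic[:3] not in (b"TDF", b"IBD") or not magic[3:4].isdigit():
        raise ValueError(f"not a TDF file (magic bytes {magic!r})")
    return FixedHeader(magic, version, mi_offset, mi_size, header_size)


# Synthetic example (values are made up for illustration).
header = parse_fixed_header(struct.pack("<4siqii", b"TDF4", 4, 1024, 64, 80))
print(header.version, header.header_size)  # 4 80
```

When reading a real file, `buf` would simply be `f.read(FIXED_HEADER.size)` on a file opened in binary mode.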
Field | Type |
---|---|
# of window functions | int32 |
[window function name] | null-terminated string (enum) |
track type | null-terminated string (enum) |
track line | null-terminated string |
# of track names | int32 |
[track name] | null-terminated string |
build | null-terminated string |
flags | int32 |
Hereinafter `[]` means that the field can be repeated multiple times. The exact number of occurrences is given in the preceding `#` field.

As of version 4, `flags` can only be `0` (uncompressed) or `0x1` (gzip-compressed).
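The variable-size component can be read sequentially with two small helpers: one for an int32 and one for a null-terminated string. This is a sketch that assumes little-endian integers and UTF-8 strings, neither of which is specified above. All names are hypothetical. It expects exactly the `header size` bytes that follow the fixed-size component.

```python
import struct
from typing import List, NamedTuple, Tuple

GZIP_COMPRESSED = 0x1  # the only flag value defined as of version 4


class VariableHeader(NamedTuple):
    window_functions: List[str]
    track_type: str
    track_line: str
    track_names: List[str]
    build: str
    flags: int

    @property
    def compressed(self) -> bool:
        return bool(self.flags & GZIP_COMPRESSED)


def read_int32(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read a little-endian int32 (assumed byte order); return (value, new pos)."""
    return struct.unpack_from("<i", buf, pos)[0], pos + 4


def read_cstring(buf: bytes, pos: int) -> Tuple[str, int]:
    """Read a null-terminated string; return (text, position after the NUL)."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1


def read_cstrings(buf: bytes, pos: int) -> Tuple[List[str], int]:
    """Read an int32 count followed by that many null-terminated strings."""
    count, pos = read_int32(buf, pos)
    items = []
    for _ in range(count):
        item, pos = read_cstring(buf, pos)
        items.append(item)
    return items, pos


def parse_variable_header(buf: bytes) -> VariableHeader:
    """Parse the variable-size header component field by field."""
    window_functions, pos = read_cstrings(buf, 0)
    track_type, pos = read_cstring(buf, pos)
    track_line, pos = read_cstring(buf, pos)
    track_names, pos = read_cstrings(buf, pos)
    build, pos = read_cstring(buf, pos)
    flags, pos = read_int32(buf, pos)
    return VariableHeader(window_functions, track_type, track_line,
                          track_names, build, flags)
```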
Field | Type |
---|---|
# of datasets | int32 |
[dataset name | null-terminated string |
offset | int64 |
size in bytes] | int32 |
# of groups | int32 |
[group name | null-terminated string |
offset | int64 |
size in bytes] | int32 |
It's perfectly valid for the master index to have zero datasets and groups, so the repeated fields (`[]` notation) can be empty.
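Because datasets and groups share the same entry layout (name, offset, size), a single helper can read both lists. Below is a minimal sketch under the same assumptions as before: little-endian byte order, UTF-8 names, and hypothetical function names. `buf` is assumed to hold the `master index size` bytes found at `master index offset`. Empty lists simply produce empty mappings.

```python
import struct
from typing import Dict, NamedTuple, Tuple


class IndexEntry(NamedTuple):
    offset: int  # int64 offset of the dataset or group
    size: int    # int32 size in bytes


def read_cstring(buf: bytes, pos: int) -> Tuple[str, int]:
    """Read a null-terminated string; return (text, position after the NUL)."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1


def read_entries(buf: bytes, pos: int) -> Tuple[Dict[str, IndexEntry], int]:
    """Read an int32 count followed by (name, int64 offset, int32 size) triples."""
    (count,) = struct.unpack_from("<i", buf, pos)
    pos += 4
    entries = {}
    for _ in range(count):
        name, pos = read_cstring(buf, pos)
        offset, size = struct.unpack_from("<qi", buf, pos)
        pos += 12
        entries[name] = IndexEntry(offset, size)
    return entries, pos


def parse_master_index(buf: bytes) -> Tuple[Dict[str, IndexEntry], Dict[str, IndexEntry]]:
    """Return (datasets, groups); either mapping may legitimately be empty."""
    datasets, pos = read_entries(buf, 0)
    groups, _ = read_entries(buf, pos)
    return datasets, groups
```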
Field | Type |
---|---|
# of attributes | int32 |
[key | null-terminated string |
value] | null-terminated string |
data type | null-terminated string |
tile width | float32 (!) |
# of tiles | int32 |
[tile offset | int64 |
size in bytes] | int32 |
In theory, a dataset is agnostic to the data type stored in its tiles, but the IGV implementation seems to always use floats.
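A dataset record can be parsed with the same primitives. The sketch below again assumes little-endian byte order and UTF-8 strings. Its names are hypothetical. It returns the tile table as `(offset, size)` pairs so that individual tiles can be fetched on demand. Note that `tile width` is read as a float32, as the table specifies.

```python
import struct
from typing import Dict, List, NamedTuple, Tuple


def read_cstring(buf: bytes, pos: int) -> Tuple[str, int]:
    """Read a null-terminated string; return (text, position after the NUL)."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1


def read_attributes(buf: bytes, pos: int) -> Tuple[Dict[str, str], int]:
    """Read an int32 count followed by that many (key, value) string pairs."""
    (count,) = struct.unpack_from("<i", buf, pos)
    pos += 4
    attributes = {}
    for _ in range(count):
        key, pos = read_cstring(buf, pos)
        value, pos = read_cstring(buf, pos)
        attributes[key] = value
    return attributes, pos


class Dataset(NamedTuple):
    attributes: Dict[str, str]
    data_type: str
    tile_width: float             # stored as float32
    tiles: List[Tuple[int, int]]  # (tile offset, size in bytes) per tile


def parse_dataset(buf: bytes) -> Dataset:
    """Parse a dataset record located via the master index."""
    attributes, pos = read_attributes(buf, 0)
    data_type, pos = read_cstring(buf, pos)
    tile_width, n_tiles = struct.unpack_from("<fi", buf, pos)
    pos += 8
    tiles = []
    for _ in range(n_tiles):
        offset, size = struct.unpack_from("<qi", buf, pos)
        pos += 12
        tiles.append((offset, size))
    return Dataset(attributes, data_type, tile_width, tiles)
```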
Field | Type |
---|---|
# of attributes | int32 |
[key | null-terminated string |
value] | null-terminated string |
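A group is just the attribute list, so parsing one is short. This is a sketch that assumes little-endian byte order and UTF-8 strings. `parse_group` is a hypothetical name.

```python
import struct
from typing import Dict, Tuple


def read_cstring(buf: bytes, pos: int) -> Tuple[str, int]:
    """Read a null-terminated string; return (text, position after the NUL)."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1


def parse_group(buf: bytes) -> Dict[str, str]:
    """Parse a group: an int32 count followed by (key, value) string pairs."""
    (count,) = struct.unpack_from("<i", buf, 0)
    pos = 4
    attributes = {}
    for _ in range(count):
        key, pos = read_cstring(buf, pos)
        value, pos = read_cstring(buf, pos)
        attributes[key] = value
    return attributes
```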
A tile starts with a null-terminated string, the tile type. IGV implements four types of tiles: "fixedStep", "variableStep", "bed" and "bedWithName".
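Before dispatching on the tile type, a reader has to undo compression when the `flags` field says so. The sketch below assumes that the compression flag applies to each tile's bytes as a whole, which the description above does not spell out. It calls `zlib.decompress` with `wbits=32 + MAX_WBITS`, which accepts either a gzip or a zlib header, so it does not depend on which of the two the writer used. The tile type names follow the spelling in the bed tile table ("bedWithName"). The function names are hypothetical.

```python
import zlib
from typing import Tuple

TILE_TYPES = ("fixedStep", "variableStep", "bed", "bedWithName")


def tile_payload(raw: bytes, flags: int) -> bytes:
    """Return tile bytes, decompressing them when flag 0x1 is set.

    Assumption: the whole tile (type string included) is compressed.
    wbits = 32 + MAX_WBITS lets zlib auto-detect a gzip or zlib header.
    """
    if flags & 0x1:
        return zlib.decompress(raw, 32 + zlib.MAX_WBITS)
    return raw


def read_tile_type(payload: bytes) -> Tuple[str, int]:
    """Read the leading null-terminated tile type; return (type, body offset)."""
    end = payload.index(b"\x00")
    tile_type = payload[:end].decode("utf-8")
    if tile_type not in TILE_TYPES:
        raise ValueError(f"unknown tile type {tile_type!r}")
    return tile_type, end + 1
```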
Field | Type |
---|---|
# of intervals | int32 |
track start | int32 |
span | int32 |
# of tracks | int32 (missing in IGV) |
[track] | float32 array |
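A fixed step tile in TDF is conceptually similar to fixedStep data in the WIG format. It describes non-overlapping, contiguous fixed-width intervals: the i-th interval (counting from zero) starts at `track start + i * span`. For example, a fixed step tile with 3 intervals and `span` equal to 5 might look like this: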

      -2.   4.8     0
    |-----|-----|-----|  track 1
      1.3    3    -1
    |-----|-----|-----|  track 2
    ^
    start

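The following sketch parses a fixed step tile body. `pos` is the offset just past the tile type string. The table marks `# of tracks` as missing in IGV. This sketch therefore assumes IGV-written tiles omit that field and lets the caller supply `n_tracks` (for example, the number of track names in the header). A `has_track_count` switch covers files that do include it. Little-endian byte order is assumed, and all names are hypothetical.

```python
import struct
from typing import List, NamedTuple, Tuple


class FixedStepTile(NamedTuple):
    start: int                 # `track start`: where the first interval begins
    span: int                  # width shared by every interval
    n_intervals: int
    tracks: List[List[float]]  # tracks[t][i] is the value of interval i in track t

    def intervals(self) -> List[Tuple[int, int]]:
        """Return contiguous (start, end) pairs, each `span` wide."""
        return [(self.start + i * self.span, self.start + (i + 1) * self.span)
                for i in range(self.n_intervals)]


def parse_fixed_step(body: bytes, pos: int, n_tracks: int,
                     has_track_count: bool = False) -> FixedStepTile:
    """Parse a fixedStep tile; `pos` points just past the tile type string.

    Assumption: IGV-written tiles lack the '# of tracks' field, so by default
    the caller supplies n_tracks. Pass has_track_count=True to read the
    count from the tile instead.
    """
    n_intervals, start, span = struct.unpack_from("<iii", body, pos)
    pos += 12
    if has_track_count:
        (n_tracks,) = struct.unpack_from("<i", body, pos)
        pos += 4
    tracks = []
    for _ in range(n_tracks):
        # Each track is an array of n_intervals float32 values.
        tracks.append(list(struct.unpack_from(f"<{n_intervals}f", body, pos)))
        pos += 4 * n_intervals
    return FixedStepTile(start, span, n_intervals, tracks)
```

Applied to the example above with a hypothetical `track start` of 100, `intervals()` yields `[(100, 105), (105, 110), (110, 115)]`.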
Field | Type |
---|---|
track start | int32 (unused in IGV) |
span | float32 (!) |
# of intervals | int32 |
[start] | int32 |
# of tracks | int32 |
[track] | float32 array |
A variable step tile also resembles the similarly named concept from the WIG format. As the name suggests, it allows the intervals to have arbitrary start offsets. All intervals still share the same width, so each interval ends at its start plus `span`.
Here's an example:

    0123456789012
      -2.
    |-----|        track 1
      1.3
    |-----|        track 2
         4.8
       |-----|     track 1
          3
       |-----|     track 2
            0
         |-----|   track 1
           -1
         |-----|   track 2

The above example has `span` equal to 5 and start offsets equal to `[0, 3, 5]`.
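The following sketch parses a variable step tile body, starting just past the tile type string. It reads and discards `track start`, which the table marks as unused in IGV. It keeps `span` as a float because the field is stored as float32. As before, little-endian byte order is an assumption and the names are hypothetical.

```python
import struct
from typing import List, NamedTuple, Tuple


class VariableStepTile(NamedTuple):
    span: float                # stored as float32
    starts: List[int]
    tracks: List[List[float]]  # tracks[t][i] is the value of interval i in track t

    def intervals(self) -> List[Tuple[int, float]]:
        """Return (start, end) pairs; every end is start + span."""
        return [(s, s + self.span) for s in self.starts]


def parse_variable_step(body: bytes, pos: int) -> VariableStepTile:
    """Parse a variableStep tile; `pos` points just past the tile type string."""
    # `track start` (int32) is read but ignored, since IGV does not use it.
    _track_start, span, n_intervals = struct.unpack_from("<ifi", body, pos)
    pos += 12
    starts = list(struct.unpack_from(f"<{n_intervals}i", body, pos))
    pos += 4 * n_intervals
    (n_tracks,) = struct.unpack_from("<i", body, pos)
    pos += 4
    tracks = []
    for _ in range(n_tracks):
        tracks.append(list(struct.unpack_from(f"<{n_intervals}f", body, pos)))
        pos += 4 * n_intervals
    return VariableStepTile(span, starts, tracks)
```

For the example above, `starts` is `[0, 3, 5]` and `intervals()` returns `[(0, 5.0), (3, 8.0), (5, 10.0)]`.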
Field | Type |
---|---|
# of intervals | int32 |
[start] | int32 array |
[end] | int32 array |
# of tracks | int32 |
[track] | float32 array |
[name] | null-terminated string (only for "bedWithName") |
A bed tile allows intervals with arbitrary start and end offsets. Tiles of type "bedWithName" can also label each interval with a string.
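The following sketch parses a bed tile body, starting just past the tile type string. Names are read only for "bedWithName" tiles. Since each interval gets one label, the sketch reads `# of intervals` names. Little-endian byte order and UTF-8 names are assumptions, and the function and type names are hypothetical.

```python
import struct
from typing import List, NamedTuple, Optional


class BedTile(NamedTuple):
    starts: List[int]
    ends: List[int]
    tracks: List[List[float]]   # tracks[t][i] is the value of interval i in track t
    names: Optional[List[str]]  # one label per interval, "bedWithName" only


def parse_bed(body: bytes, pos: int, tile_type: str) -> BedTile:
    """Parse a bed/bedWithName tile; `pos` points just past the type string."""
    (n,) = struct.unpack_from("<i", body, pos)
    pos += 4
    starts = list(struct.unpack_from(f"<{n}i", body, pos))
    pos += 4 * n
    ends = list(struct.unpack_from(f"<{n}i", body, pos))
    pos += 4 * n
    (n_tracks,) = struct.unpack_from("<i", body, pos)
    pos += 4
    tracks = []
    for _ in range(n_tracks):
        tracks.append(list(struct.unpack_from(f"<{n}f", body, pos)))
        pos += 4 * n
    names = None
    if tile_type == "bedWithName":
        names = []
        for _ in range(n):
            end = body.index(b"\x00", pos)  # each name is null-terminated
            names.append(body[pos:end].decode("utf-8"))
            pos = end + 1
    return BedTile(starts, ends, tracks, names)
```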
TODO