TDF is a binary format developed by the IGV team at the Broad Institute.
- Master index
Master index contains information about available datasets and groups.
- Dataset
High-level named data container.
- Tile
Low-level data container. Each tile holds interval data for a specific genome region.
- Group
Key-value metadata container.
- Window function
See IGV sources for possible values.
- Track name
A human-readable track name.
- Track type
See IGV sources for possible values.
- Track line
UCSC browser track line. See UCSC documentation.
- Zoom level
TODO
The TDF header consists of a fixed-size 24-byte component followed by variable-size metadata.
Field | Type |
---|---|
magic | int32 |
version | int32 |
master index offset | int64 |
master index size | int32 |
header size | int32 |
The first three bytes of the file (the "magic" bytes) are either `"TDF"` or `"IBD"`, followed by a single-digit format version. The latest format version is 4. Unfortunately, the changes between versions were not documented.
`header size` is the number of bytes in the variable-size component that follows.
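As a minimal sketch, the fixed-size component can be decoded with Python's `struct` module. Little-endian byte order is an assumption here (this document does not state the byte order), and `FixedHeader` and `parse_fixed_header` are illustrative names rather than anything defined by IGV.

```python
import struct
from typing import NamedTuple

# Assumption: little-endian byte order. "<" also disables alignment padding,
# so the layout is exactly 4 + 4 + 8 + 4 + 4 = 24 bytes.
FIXED_HEADER = struct.Struct("<4siqii")


class FixedHeader(NamedTuple):
    magic: bytes
    version: int
    master_index_offset: int
    master_index_size: int
    header_size: int


def parse_fixed_header(buf: bytes) -> FixedHeader:
    """Decode the 24-byte fixed-size header component."""
    if len(buf) < FIXED_HEADER.size:
        raise ValueError("need at least 24 bytes")
    header = FixedHeader(*FIXED_HEADER.unpack_from(buf))
    # The text above lists "TDF" and "IBD" as the valid three-byte prefixes.
    if header.magic[:3] not in (b"TDF", b"IBD"):
        raise ValueError(f"unexpected magic bytes: {header.magic!r}")
    return header
```

In practice the input would be the first 24 bytes of a file opened in binary mode, for example `parse_fixed_header(f.read(24))`.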
Field | Type |
---|---|
# of window functions | int32 |
[window function name] | null-terminated string (enum) |
track type | null-terminated string (enum) |
track line | null-terminated string |
# of track names | int32 |
[track name] | null-terminated string |
build | null-terminated string |
flags | int32 |
Hereinafter, `[]` means that the field can be repeated multiple times. The exact number of occurrences is given by the preceding `#` field.
As of version 4, `flags` can only be `0` (uncompressed) or `0x1` (gzip-compressed).
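The variable-size component mixes counted repetitions with null-terminated strings, so a small sequential reader is convenient. The sketch below follows the field order in the table above; the `Cursor` helper, the little-endian byte order and the ASCII string encoding are assumptions made for illustration.

```python
import struct


class Cursor:
    """Sequential reader over an in-memory buffer (illustrative helper)."""

    def __init__(self, buf: bytes, pos: int = 0):
        self.buf = buf
        self.pos = pos

    def read_int32(self) -> int:
        (value,) = struct.unpack_from("<i", self.buf, self.pos)  # assumed little-endian
        self.pos += 4
        return value

    def read_cstring(self) -> str:
        end = self.buf.index(b"\x00", self.pos)  # locate the terminating NUL
        value = self.buf[self.pos:end].decode("ascii")  # assumed ASCII
        self.pos = end + 1  # skip the terminator
        return value


def parse_variable_header(buf: bytes) -> dict:
    """Decode the variable-size header component in table order."""
    cur = Cursor(buf)
    n_functions = cur.read_int32()
    window_functions = [cur.read_cstring() for _ in range(n_functions)]
    track_type = cur.read_cstring()
    track_line = cur.read_cstring()
    n_names = cur.read_int32()
    track_names = [cur.read_cstring() for _ in range(n_names)]
    build = cur.read_cstring()
    flags = cur.read_int32()
    return {
        "window_functions": window_functions,
        "track_type": track_type,
        "track_line": track_line,
        "track_names": track_names,
        "build": build,
        "compressed": bool(flags & 0x1),  # 0x1 marks gzip compression
    }
```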
Field | Type |
---|---|
# of datasets | int32 |
[dataset name | null-terminated string |
offset | int64 |
size] | int32 |
# of groups | int32 |
[group name | null-terminated string |
offset | int64 |
size] | int32 |
It's perfectly valid for the master index to have zero datasets and groups, so the repeated fields (`[]` notation) can be empty.
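A possible way to read the master index is sketched below. It assumes that the int64 and int32 following each name are the byte offset and byte size of the named dataset or group, mirroring the `master index offset` and `master index size` fields of the header; that reading, the little-endian byte order and the function names are assumptions.

```python
import struct
from typing import Dict, Tuple

Entry = Tuple[int, int]  # (offset, size in bytes), an assumed interpretation


def _read_entries(buf: bytes, pos: int) -> Tuple[Dict[str, Entry], int]:
    """Read a counted list of (name, int64, int32) records starting at pos."""
    (count,) = struct.unpack_from("<i", buf, pos)
    pos += 4
    entries: Dict[str, Entry] = {}
    for _ in range(count):
        end = buf.index(b"\x00", pos)  # end of the null-terminated name
        name = buf[pos:end].decode("ascii")
        offset, size = struct.unpack_from("<qi", buf, end + 1)
        entries[name] = (offset, size)
        pos = end + 1 + 12  # terminator, then 8 + 4 bytes
    return entries, pos


def parse_master_index(buf: bytes) -> Tuple[Dict[str, Entry], Dict[str, Entry]]:
    """Return (datasets, groups); both may be empty."""
    datasets, pos = _read_entries(buf, 0)
    groups, _ = _read_entries(buf, pos)
    return datasets, groups
```

An index with zero datasets and zero groups is just two int32 zeros and yields two empty dictionaries.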
Field | Type |
---|---|
# of attributes | int32 |
[key | null-terminated string |
value] | null-terminated string |
data type | null-terminated string |
tile width | float32 (!) |
# of tiles | int32 |
[tile offset | int64 |
tile size] | int32 |
In theory, a dataset is agnostic to the data type stored in its tiles, but the IGV implementation seems to always use floats.
Field | Type |
---|---|
# of attributes | int32 |
[key | null-terminated string |
value] | null-terminated string |
A tile starts with a null-terminated string: the tile type. IGV implements four types of tiles: `"fixedStep"`, `"variableStep"`, `"bed"` and `"bedWithName"`.
Field | Type |
---|---|
# of intervals | int32 |
track start | int32 |
span | int32 |
# of tracks | int32 (missing in IGV) |
[track] | float32 array |
A fixed step tile in TDF is conceptually similar to that of the WIG format. It describes non-overlapping fixed-width intervals. For example, a fixed step tile of size 3 with `span` equal to 5 might look like:

      -2.   4.8    0
    |-----|-----|-----| track 1
      1.3    3    -1
    |-----|-----|-----| track 2
    ^
    start
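The interval coordinates of a fixed step tile are implicit and follow from `track start`, `span` and the interval index. The sketch below spells this out; half-open `(start, end)` pairs and the start offset of 100 are assumptions chosen for illustration.

```python
from typing import List, Tuple


def fixed_step_intervals(start: int, span: int, n_intervals: int) -> List[Tuple[int, int]]:
    """Interval i covers [start + i * span, start + (i + 1) * span)."""
    return [(start + i * span, start + (i + 1) * span) for i in range(n_intervals)]


# The example tile above: three intervals of span 5 (start offset is hypothetical).
intervals = fixed_step_intervals(start=100, span=5, n_intervals=3)
track_1 = [-2.0, 4.8, 0.0]
for (lo, hi), value in zip(intervals, track_1):
    print(lo, hi, value)
```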
Field | Type |
---|---|
track start | int32 (unused in IGV) |
span | float32 (!) |
# of intervals | int32 |
[start] | int32 |
# of tracks | int32 |
[track] | float32 array |
A variable step tile also resembles a similarly named concept from the WIG format. As the name suggests, it allows the intervals to have arbitrary start offsets. The end offsets remain fixed by the `span` value.
Here's an example:

    0123456789012
      -2.
    |-----|       track 1
      1.3
    |-----|       track 2
         4.8
       |-----|    track 1
          3
       |-----|    track 2
            0
         |-----|  track 1
           -1
         |-----|  track 2

The above example has `span` equal to 5 and `starts` equal to `[0, 3, 5]`.
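For a variable step tile, each interval is derived from its own start plus the shared `span`. A minimal sketch, assuming half-open intervals and truncating the float32 `span` to an integer width:

```python
from typing import List, Sequence, Tuple


def variable_step_intervals(starts: Sequence[int], span: float) -> List[Tuple[int, int]]:
    """Each interval begins at its own start and ends span positions later."""
    width = int(span)  # span is stored as float32 in the tile
    return [(s, s + width) for s in starts]


print(variable_step_intervals([0, 3, 5], 5.0))  # [(0, 5), (3, 8), (5, 10)]
```

Unlike the fixed step case, the resulting intervals may overlap, as they do in the example above.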
Field | Type |
---|---|
# of intervals | int32 |
[start] | int32 array |
[end] | int32 array |
# of tracks | int32 |
[track] | float32 array |
[name] | null-terminated string (only for `"bedWithName"`) |
A bed tile allows for intervals with arbitrary start and end offsets. Tiles with type `"bedWithName"` can also label each interval with a string.
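A bed tile body could be decoded along the following lines. This is a sketch under several assumptions: the buffer starts right after the tile type string and is already decompressed, values are little-endian, each track holds one float32 per interval, and a `"bedWithName"` tile stores one name per interval. `parse_bed_tile` is an illustrative name.

```python
import struct


def parse_bed_tile(buf: bytes, with_names: bool = False, pos: int = 0) -> dict:
    """Decode a bed tile body in the field order of the table above."""
    (n,) = struct.unpack_from("<i", buf, pos)
    pos += 4
    starts = list(struct.unpack_from(f"<{n}i", buf, pos))
    pos += 4 * n
    ends = list(struct.unpack_from(f"<{n}i", buf, pos))
    pos += 4 * n
    (n_tracks,) = struct.unpack_from("<i", buf, pos)
    pos += 4
    tracks = []
    for _ in range(n_tracks):
        # Assumption: one float32 value per interval in every track.
        tracks.append(list(struct.unpack_from(f"<{n}f", buf, pos)))
        pos += 4 * n
    names = []
    if with_names:
        for _ in range(n):
            end = buf.index(b"\x00", pos)  # names are null-terminated
            names.append(buf[pos:end].decode("ascii"))
            pos = end + 1
    return {"starts": starts, "ends": ends, "tracks": tracks, "names": names}
```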
TODO