BEDPE coordinates refer to a genomic position, but it is unclear to me which position (relative to an SV) they are intended to convey. This is illustrated below for cases where we know precisely where the breakpoints are.
- Affected Bases (AFF)
- Left of the breakpoint (LOB)
- Right of the breakpoint (ROB)
- Exact breakpoint (BPT)
- Last-aligned Base (LAB)
- Simple Deletions
- Simple Insertions
- Range Math on coordinates
- Balanced Translocations/Inversion
- Telomeric Deletions
- Unbalanced Translocations/Inversions
- Telomeric Insertions
We will call the chromosome below 'chr'.
Plain alignment:
REF ACGTGCC
ALT A-----C
With 0-based coordinates (BED):
0123456
REF ACGTGCC
ALT A-----C
With 1-based coordinates (VCF):
1234567
REF ACGTGCC
ALT A-----C
Assume the chromosome name is chr.
chr 1 . ACGTGC A . PASS SVTYPE=DEL;END=6
chr 1 . A A[chr:7[ . PASS SVTYPE=BND
chr 7 . C ]chr:1]C . PASS SVTYPE=BND
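A minimal Python sketch of the 1-based to 0-based bookkeeping for these records. It assumes, as in the DEL record above, that REF carries one padding base at POS and that END is the last deleted base; the helper names are hypothetical.

```python
def vcf_del_to_bed(pos: int, end: int) -> tuple[int, int]:
    """0-based, half-open BED interval of the bases removed by a VCF deletion.

    Assumes REF starts with one padding base at 1-based POS and that INFO END
    is the 1-based, inclusive position of the last deleted base.
    """
    # First deleted base is POS + 1 (1-based), which is POS in 0-based terms.
    start0 = pos
    # A 1-based inclusive end is numerically the same as a 0-based exclusive end.
    end0 = end
    return start0, end0


def vcf_pos_to_bed(pos: int) -> tuple[int, int]:
    """1-bp BED interval covering a single 1-based VCF position."""
    return pos - 1, pos


print(vcf_del_to_bed(1, 6))                  # (1, 6): the deleted CGTGC
print(vcf_pos_to_bed(1), vcf_pos_to_bed(7))  # (0, 1) (6, 7): the BND positions
```

The BND positions convert to the same intervals as the last-aligned-base convention below, which is why that convention maps most directly from VCF.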
The coordinates label the first and last deleted bases.
chr 1 2 chr 5 6
- Note that for range arithmetic, the length would be end2 - start1
The coordinates label the base to the left of the breakpoint(s).
chr 0 1 chr 5 6
- Note that for range arithmetic, the length would be start2 - start1 or end2 - end1 but that start2 - end1 and end2 - start1 would not give the length.
The coordinates label the base to the right of the breakpoint(s).
chr 1 2 chr 6 7
- Note that for range arithmetic, the length would be start2 - start1 or end2 - end1 but that start2 - end1 and end2 - start1 would not give the length.
Coordinates are 0-length ranges specifying the position of the breakpoint.
chr 1 1 chr 6 6
- Note that for range arithmetic, the length is the same no matter which coordinates you use between the two coordinate sets.
Coordinates specify the "last-aligned base" as in VCF
chr 0 1 chr 6 7
- Note that for range arithmetic, the length would be start2 - end1, end2 - end1 - 1, or start2 - start1 - 1.
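The five conventions can be sketched as functions of two breakpoint positions. This assumes b1 and b2 are 0-based gap indices (the gap before the first deleted base and the gap after the last one); `bedpe_conventions` and `LENGTH_FORMULAS` are hypothetical names, and the length formulas are the ones quoted in the notes above.

```python
Coords = tuple[int, int, int, int]  # (start1, end1, start2, end2), BED-style


def bedpe_conventions(b1: int, b2: int) -> dict[str, Coords]:
    """BEDPE coordinates for two breakpoints under each convention.

    b1 and b2 are 0-based gap indices: gap i lies between base i-1 and base i.
    For a deletion, b1 precedes the first deleted base and b2 follows the last.
    """
    return {
        "AFF": (b1, b1 + 1, b2 - 1, b2),  # first and last affected bases
        "LOB": (b1 - 1, b1, b2 - 1, b2),  # base left of each breakpoint
        "ROB": (b1, b1 + 1, b2, b2 + 1),  # base right of each breakpoint
        "BPT": (b1, b1, b2, b2),          # zero-length range at each breakpoint
        "LAB": (b1 - 1, b1, b2, b2 + 1),  # VCF-style last-aligned bases
    }


# Length formulas quoted in the notes, one per convention.
LENGTH_FORMULAS = {
    "AFF": lambda s1, e1, s2, e2: e2 - s1,
    "LOB": lambda s1, e1, s2, e2: s2 - s1,
    "ROB": lambda s1, e1, s2, e2: s2 - s1,
    "BPT": lambda s1, e1, s2, e2: s2 - s1,
    "LAB": lambda s1, e1, s2, e2: s2 - e1,
}

for name, coords in bedpe_conventions(1, 6).items():
    s1, e1, s2, e2 = coords
    print(name, "chr", s1, e1, "chr", s2, e2, "length", LENGTH_FORMULAS[name](*coords))
```

With b1=1 and b2=6 this reproduces the deletion rows above, each with length 5; the inversion below spans the same bases and gets the same rows. Under the same assumptions, b1=b2=1 gives the simple-insertion rows (AFF collapses to an inverted, meaningless pair), and b1=1, b2=7 gives the telomeric-deletion rows, including the end coordinate of 8 past the 7-base reference.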
We will call the chromosome below 'chr'.
Plain alignment:
REF A-----C
ALT ACGTGCC
With 0-based coordinates (BED):
0123456
REF A-----C
ALT ACGTGCC
With 1-based coordinates (VCF):
1234567
REF A-----C
ALT ACGTGCC
Assume the chromosome name is chr.
chr 1 . A ACGTGC . PASS SVTYPE=INS;END=1
chr 1 . A ACGTGC[chr:2[ . PASS SVTYPE=BND
chr 2 . C ]chr:1]CGTGCC . PASS SVTYPE=BND
Note that range arithmetic would not apply to these cases as insertion size has no effect on the coordinates of the reference.
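To make this concrete: in a sketch that assumes a simple insertion whose REF is the single shared padding base (as in the INS record above), the reference-side breakpoint depends only on POS, while the inserted length comes solely from REF and ALT. `insertion_summary` is a hypothetical helper.

```python
def insertion_summary(pos: int, ref: str, alt: str) -> tuple[int, int, int]:
    """Zero-length BED breakpoint and inserted length for a simple VCF insertion.

    Assumes REF is the single padding base and ALT = REF + inserted sequence.
    """
    if not alt.startswith(ref):
        raise ValueError("expected ALT to start with REF (simple insertion)")
    # The breakpoint is the gap right after 1-based POS, i.e. 0-based gap POS.
    bpt = pos
    # The insertion size lives only in the sequence, not in the coordinates.
    inserted = len(alt) - len(ref)
    return bpt, bpt, inserted


print(insertion_summary(1, "A", "ACGTGC"))  # (1, 1, 5)
```

Changing the inserted sequence changes only the third value; the breakpoint coordinates stay put.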
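This seems to make no sense for insertions: which base is the affected base? You would have to fall back to either the base to the left or the base to the right of the breakpoint in this case. See those below.
The coordinates label the base to the left of the breakpoint(s).
chr 0 1 chr 0 1
The coordinates label the base to the right of the breakpoint(s).
chr 1 2 chr 1 2
Coordinates are 0-length ranges specifying the position of the breakpoint.
chr 1 1 chr 1 1
Coordinates specify the "last-aligned base" as in VCF.
chr 0 1 chr 1 2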
We will only consider inversions, as those actually have some range-arithmetic applications that may prove illustrative.
We will call the chromosome below 'chr'.
Plain alignment:
REF ATGTGCC
ALT AGCACAC
With 0-based coordinates (BED):
0123456
REF ATGTGCC
ALT AGCACAC
With 1-based coordinates (VCF):
1234567
REF ATGTGCC
ALT AGCACAC
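The ALT row of the inversion alignment can be checked by reverse-complementing the inverted bases. A small sketch with hypothetical helper names, treating [start, end) as a 0-based, half-open interval:

```python
# Map each base to its complement; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def invert(ref: str, start: int, end: int) -> str:
    """Return ref with the 0-based, half-open interval [start, end) inverted."""
    return ref[:start] + reverse_complement(ref[start:end]) + ref[end:]


print(invert("ATGTGCC", 1, 6))  # AGCACAC
```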
chr 1 . ATGTGC AGCACA . PASS SVTYPE=INV;END=6
chr 1 . A A]chr:6] . PASS SVTYPE=BND
chr 2 . T [chr:7[T . PASS SVTYPE=BND
chr 6 . C C]chr:1] . PASS SVTYPE=BND
chr 7 . C [chr:2[C . PASS SVTYPE=BND
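The coordinates label the first and last affected (inverted) bases.
chr 1 2 chr 5 6
- Note that for range arithmetic, the length would be end2 - start1.
The coordinates label the base to the left of the breakpoint(s).
chr 0 1 chr 5 6
- Note that for range arithmetic, the length is end2 - end1.
The coordinates label the base to the right of the breakpoint(s).
chr 1 2 chr 6 7
- Note that for range arithmetic, the length is end2 - end1.
Coordinates are 0-length ranges specifying the position of the breakpoint.
chr 1 1 chr 6 6
- Note that for range arithmetic, the length is the same no matter which coordinates you use.
Coordinates specify the "last-aligned base" as in VCF.
chr 0 1 chr 6 7
- Note that for range arithmetic, the length is start2 - end1 or end2 - end1 - 1.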
We will call the chromosome below 'chr'.
Plain alignment:
REF ACGTGCC
ALT A------
With 0-based coordinates (BED):
0123456
REF ACGTGCC
ALT A------
With 1-based coordinates (VCF):
1234567
REF ACGTGCC
ALT A------
Assume the chromosome name is chr.
chr 1 . A <DEL> . PASS SVTYPE=DEL;END=7
chr 1 . A A[chr:8[ . PASS SVTYPE=BND
chr 8 . N ]chr:1]. . PASS SVTYPE=BND
The coordinates label the first and last deleted bases.
chr 1 2 chr 6 7
- Note that for range arithmetic, the length would be end2 - start1
The coordinates label the base to the left of the breakpoint(s).
chr 0 1 chr 6 7
- Note that for range arithmetic, the length would be end2 - end1
The coordinates label the base to the right of the breakpoint(s). The second interval lies past the end of the reference, so BED would have to allow (or be hacked to accept) coordinates greater than the reference length.
chr 1 2 chr 7 8
- Note that for range arithmetic, the length would be start2 - start1 or end2 - end1 but that start2 - end1 and end2 - start1 would not give the length.
Coordinates are 0-length ranges specifying the position of the breakpoint.
chr 1 1 chr 7 7
- Note that for range arithmetic, the length is the same no matter which coordinates you use between the two coordinate sets.
Coordinates specify the "last-aligned base" as in VCF. This would also have to allow for virtual bases off the end of the reference.
chr 0 1 chr 7 8
- Note that for range arithmetic, the length would be start2 - start1 - 1 or end2 - end1 - 1.
We will call the chromosome below 'chr'.
Plain alignment:
REF ACGTGCC
ALT ------C
With 0-based coordinates (BED):
0123456
REF ACGTGCC
ALT ------C
With 1-based coordinates (VCF):
1234567
REF ACGTGCC
ALT ------C
Assume the chromosome name is chr.
chr 0 . N <DEL> . PASS SVTYPE=DEL;END=6
chr 0 . N .[chr:7[ . PASS SVTYPE=BND
chr 7 . C ]chr:0]C . PASS SVTYPE=BND
The coordinates label the first and last deleted bases.
chr 0 1 chr 5 6
- Note that for range arithmetic, the length would be end2 - start1
The coordinates label the base to the left of the breakpoint(s). This breaks for this variant type, since there is no base to the left of the first breakpoint.
The coordinates label the base to the right of the breakpoint(s).
chr 0 1 chr 6 7
- Note that for range arithmetic, the length would be start2 - start1 or end2 - end1 but that start2 - end1 and end2 - start1 would not give the length.
Coordinates are 0-length ranges specifying the position of the breakpoint.
chr 0 0 chr 6 6
- Note that for range arithmetic, the length is the same no matter which coordinates you use between the two coordinate sets.
Coordinates specify the "last-aligned base" as in VCF. This breaks for the same reason as the left-of-breakpoint convention: the telomeric breakend sits at VCF position 0, which has no base.
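A sketch of why some conventions break at chromosome ends. It treats a BED interval as valid only when 0 <= start <= end <= chromosome length (zero-length intervals allowed, as the BPT rows assume). The rows are the telomeric-deletion coordinates on the 7-base reference, with the LOB and LAB rows for the deletion at the start of the chromosome written out with the -1 start they would need. All names are hypothetical.

```python
CHROM_LEN = 7  # length of the reference ACGTGCC


def bed_ok(start: int, end: int, chrom_len: int = CHROM_LEN) -> bool:
    """True if [start, end) fits on the chromosome; zero-length is allowed."""
    return 0 <= start <= end <= chrom_len


# (start1, end1, start2, end2) for the two telomeric deletions above.
ROWS = {
    "deletion to chromosome end, ROB": (1, 2, 7, 8),
    "deletion to chromosome end, LAB": (0, 1, 7, 8),
    "deletion to chromosome end, BPT": (1, 1, 7, 7),
    "deletion from chromosome start, LOB": (-1, 0, 5, 6),
    "deletion from chromosome start, LAB": (-1, 0, 6, 7),
    "deletion from chromosome start, BPT": (0, 0, 6, 6),
}

for label, (s1, e1, s2, e2) in ROWS.items():
    ok = bed_ok(s1, e1) and bed_ok(s2, e2)
    print(f"{label}: {'valid BED' if ok else 'outside the reference'}")
```

Only the zero-length BPT rows stay inside the reference at both ends.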
In VCF, these have to be explicitly labeled as a pair.
chr2 321681 bnd_W G G[chr13:123460[ . PASS SVTYPE=BND;PARID=bnd_V;MATEID=bnd_X
chr2 321682 bnd_V T ]chr13:123456]T . PASS SVTYPE=BND;PARID=bnd_W;MATEID=bnd_U
chr13 123456 bnd_U C C[chr2:321682[ . PASS SVTYPE=BND;PARID=bnd_X;MATEID=bnd_V
chr13 123460 bnd_X A ]chr2:321681]A . PASS SVTYPE=BND;PARID=bnd_U;MATEID=bnd_W
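Because the four records only make sense together, a reader has to follow the MATEID and PARID links. A minimal check, assuming whitespace-separated VCF body lines and `KEY=VALUE` INFO fields, that every link is reciprocated (helper names are hypothetical):

```python
RECORDS = """\
chr2 321681 bnd_W G G[chr13:123460[ . PASS SVTYPE=BND;PARID=bnd_V;MATEID=bnd_X
chr2 321682 bnd_V T ]chr13:123456]T . PASS SVTYPE=BND;PARID=bnd_W;MATEID=bnd_U
chr13 123456 bnd_U C C[chr2:321682[ . PASS SVTYPE=BND;PARID=bnd_X;MATEID=bnd_V
chr13 123460 bnd_X A ]chr2:321681]A . PASS SVTYPE=BND;PARID=bnd_U;MATEID=bnd_W
"""


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column into KEY=VALUE pairs (flags are skipped)."""
    return dict(field.split("=", 1) for field in info.split(";") if "=" in field)


def unreciprocated_links(vcf_body: str) -> list[str]:
    """Report MATEID/PARID links whose target does not point back."""
    info_by_id: dict[str, dict[str, str]] = {}
    for line in vcf_body.splitlines():
        # CHROM POS ID REF ALT QUAL FILTER INFO
        fields = line.split()
        info_by_id[fields[2]] = parse_info(fields[7])

    problems = []
    for rec_id, info in info_by_id.items():
        for key in ("MATEID", "PARID"):
            partner = info.get(key)
            if partner is not None and info_by_id.get(partner, {}).get(key) != rec_id:
                problems.append(f"{rec_id}: {key}={partner} is not reciprocated")
    return problems


print(unreciprocated_links(RECORDS))  # []
```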
The coordinates label the first and last bases affected. It is not clear what this means here; I contend it is invalid, and you would have to fall back to one of the other conventions below.
The coordinates label the base to the left of the breakpoint(s).
chr2 321680 321681 chr13 123458 123459
chr2 321680 321681 chr13 123455 123456
chr13 123455 123456 chr2 321680 321681
chr13 123458 123459 chr2 321680 321681
The coordinates label the base to the right of the breakpoint(s).
chr2 321681 321682 chr13 123459 123460
chr2 321681 321682 chr13 123456 123457
chr13 123456 123457 chr2 321681 321682
chr13 123459 123460 chr2 321681 321682
Coordinates are 0-length ranges specifying the position of the breakpoint.
chr2 321681 321681 chr13 123459 123459
chr2 321681 321681 chr13 123456 123456
Coordinates specify the "last-aligned base" as in VCF.
chr2 321680 321681 chr13 123459 123460
chr2 321681 321682 chr13 123455 123456
chr13 123455 123456 chr2 321681 321682
chr13 123459 123460 chr2 321680 321681
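These rows can be derived mechanically from the breakend ALT strings. This sketch assumes the four VCF bracket forms t[p[, t]p], ]p]t and [p[t, ignores single breakends and telomeric "." records, and uses 0-based gap indices for breakpoints; `bedpe_row` is a hypothetical name. It prints one row per breakend record, so BPT gets four rows rather than two.

```python
import re

# Paired-breakend ALT forms: t[p[  t]p]  ]p]t  [p[t
BND_ALT = re.compile(
    r"^(?P<pre>[^\[\]]*)(?P<b>[\[\]])(?P<chrom>[^:\[\]]+):(?P<pos>\d+)(?P=b)(?P<post>[^\[\]]*)$"
)


def bedpe_row(chrom: str, pos: int, alt: str, convention: str) -> tuple:
    """BEDPE row (chrom1, start1, end1, chrom2, start2, end2) for one breakend."""
    m = BND_ALT.match(alt)
    if m is None:
        raise ValueError(f"not a paired breakend ALT: {alt!r}")
    mate_chrom, mate_pos = m["chrom"], int(m["pos"])
    # Own breakpoint as a 0-based gap index: after POS if the join follows
    # the REF base (t[p[ or t]p]), otherwise before POS (]p]t or [p[t).
    bp = pos if m["pre"] else pos - 1
    # "[" means the joined piece extends right of p (gap before p);
    # "]" means it extends left of p (gap after p).
    mate_bp = mate_pos - 1 if m["b"] == "[" else mate_pos
    if convention == "LOB":
        return chrom, bp - 1, bp, mate_chrom, mate_bp - 1, mate_bp
    if convention == "ROB":
        return chrom, bp, bp + 1, mate_chrom, mate_bp, mate_bp + 1
    if convention == "BPT":
        return chrom, bp, bp, mate_chrom, mate_bp, mate_bp
    if convention == "LAB":
        # The bases named by POS and by p, converted to BED.
        return chrom, pos - 1, pos, mate_chrom, mate_pos - 1, mate_pos
    raise ValueError(f"unknown convention: {convention}")


RECORDS = [
    ("chr2", 321681, "G[chr13:123460["),  # bnd_W
    ("chr2", 321682, "]chr13:123456]T"),  # bnd_V
    ("chr13", 123456, "C[chr2:321682["),  # bnd_U
    ("chr13", 123460, "]chr2:321681]A"),  # bnd_X
]

for convention in ("LOB", "ROB", "BPT", "LAB"):
    print(convention)
    for chrom, pos, alt in RECORDS:
        print(*bedpe_row(chrom, pos, alt, convention))
```

Under these rules bnd_W and bnd_V share the same chr2 breakpoint gap (between 1-based 321681 and 321682), so their LOB, ROB and BPT coordinates on chr2 are identical; only LAB differs, because it follows the base named in POS.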
We will call the chromosome below 'chr'.
Plain alignment:
REF --ACGTGCC
ALT GCACGTGCC
With 0-based coordinates (BED):
0123456
REF --ACGTGCC
ALT GCACGTGCC
With 1-based coordinates (VCF):
1234567
REF --ACGTGCC
ALT GCACGTGCC
Assume the chromosome name is chr.
chr 0 . N <INS> . PASS SVTYPE=INS;END=0
chr 0 . N .[ctg1:1[ . PASS SVTYPE=BND
chr 1 . A ]ctg1:1000]A . PASS SVTYPE=BND
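The coordinates label the first and last affected bases. For an insertion this would have to be the base before the insertion, which does not exist at the start of the chromosome and cannot be expressed in BED.
The coordinates label the base to the left of the breakpoint(s). This breaks for this variant type, since there is no base to the left of position 0.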
The coordinates label the base to the right of the breakpoint(s).
chr 0 1 chr 0 1
Coordinates are 0-length ranges specifying the position of the breakpoint.
chr 0 0 chr 0 0
Coordinates specify the "last-aligned base" as in VCF. This breaks for the same reason as the left-of-breakpoint convention: the telomeric breakend sits at VCF position 0, which has no base.
We will call the chromosome below 'chr'.
Plain alignment:
REF ACGTGCC--
ALT ACGTGCCGC
With 0-based coordinates (BED):
0123456
REF ACGTGCC--
ALT ACGTGCCGC
With 1-based coordinates (VCF):
1234567
REF ACGTGCC--
ALT ACGTGCCGC
Assume the chromosome name is chr.
chr 7 . C CGC . PASS SVTYPE=INS;END=7
chr 7 . C C[ctg1:1[ . PASS SVTYPE=BND
chr 8 . N ]ctg1:1000]. . PASS SVTYPE=BND
The coordinates label the first and last affected bases. For insertions, this could (or should) be the base to the left of the event.
chr 6 7 chr 6 7
The coordinates label the base to the left of the breakpoint(s).
chr 6 7 chr 6 7
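The coordinates label the base to the right of the breakpoint(s). As with the deletion at the chromosome end, this points at a virtual base past the end of the reference.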
chr 7 8 chr 7 8
Coordinates are 0-length ranges specifying the position of the breakpoint.
chr 7 7 chr 7 7
Coordinates specify the "last-aligned base" as in VCF.
chr 6 7 chr 6 7