Skip to content

Instantly share code, notes, and snippets.

@slowkow
Last active December 20, 2015 18:59
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save slowkow/6180362 to your computer and use it in GitHub Desktop.
Save slowkow/6180362 to your computer and use it in GitHub Desktop.
Genomic intervals: 0-based, 1-based, overlaps, and distance
"""
genomic_intervals.py
Kamil Slowikowski
February 27, 2014
Genomic intervals: 0-based, 1-based, overlaps, and distance
===========================================================
This document describes genomic intervals and includes source code for testing
overlap and calculating distance between intervals.
You will find files specifying genomic coordinates in two formats:
0-based : 0 1 2 3 4 (UCSC, BED, bedGraph, narrowPeak)
1-based : 1 2 3 4 (NCBI, Ensembl, GFF, GTF, VCF, SAM, BAM, wiggle)
sequence: A T G C
0-based starts with 0 and numbers the *spaces* in between nucleotides.
1-based starts with 1 and numbers the *nucleotides*.
The subsequence "TG" of the full string "ATGC" is:
0-based : [1, 3)
1-based : [2, 3]
The 0-based style does not include the last position: ")"
The 1-based style includes the last position: "]"
This results in different length calculations for subsequence "TG":
0-based : 3 - 1 = 2
1-based : 3 - 2 + 1 = 2
Reference
---------
https://genome.ucsc.edu/FAQ/FAQformat.html
Example
-------
>>> a, b = (1, 3), (3, 7)
>>> print_intervals0(a, b)
01234567890
==
====
>>> print_intervals1(a, b)
1234567890
===
=====
>>> overlap0(a, b)
False
>>> overlap1(a, b)
True
>>> distance0(a, b)
0
>>> distance1(a, b)
-1
"""
# 0-based intervals
def overlap0(a, b):
"""Check if two 0-based intervals overlap."""
# a.start < b.end and a.end > b.start
return a[0] < b[1] and a[1] > b[0]
def distance0(a, b):
"""Get the number of bases between two 1-based intervals, 0 if the
intervals are book-ended against each other, or, if negative, the number
of bases in the overlap.
"""
return max(a[0] - b[1], b[0] - a[1])
def print_intervals0(*intervals):
start = min([i[0] for i in intervals])
stop = max([i[1] for i in intervals])
length = stop - start
print '0' + '1234567890' * ((length + 10) / 10)
for i in intervals:
spaces = ' ' * i[0]
marks = '=' * (i[1] - i[0])
print spaces + marks
# 1-based intervals
def overlap1(a, b):
"""Check if two 1-based intervals overlap."""
# a.start <= b.end and a.end >= b.start
return a[0] <= b[1] and a[1] >= b[0]
def distance1(a, b):
"""Get the number of bases between two 1-based intervals, 0 if the
intervals are book-ended against each other, or, if negative, the number
of bases in the overlap.
"""
return max(a[0] - b[1], b[0] - a[1]) - 1
def print_intervals1(*intervals):
start = min([i[0] for i in intervals])
stop = max([i[1] for i in intervals])
length = stop - start + 1
print '1234567890' * ((length + 10) / 10)
for i in intervals:
spaces = ' ' * (i[0] - 1)
marks = '=' * (i[1] - i[0] + 1)
print spaces + marks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment