Skip to content

Instantly share code, notes, and snippets.

Last active Sep 27, 2016
What would you like to do?
binary heatmap demo


Using bioconda, you can install everything you need for this example with

conda install --channel bioconda pybedtools bedtools htslib matplotlib 


Run to download data from ENCODE.


Input files have the following format (UCSC broadPeak and narrowPeak formats, which ar variants of BED format):

chr1    569797  570055  .       1000    .       38.118451       16.0    -1
chr1    724125  2647713 .       258     .       1.259053        11.2    -1
chr1    752542  752779  .       658     .       10.178273       1.9     -1

Then run to generate the plot, a summary file, and a directory of interval files for each class.




Rows are genomic intervals (as output by bedtools multiinter); columns are input BED files; black indicates that factor was found in that genomic interval.



Summary of how many genomic intervals for each combinatorial class:

             LSD1: 16181
        LSD1,TAL1: 15120
             TAL1: 7989
  GATA1,LSD1,TAL1: 3009
       GATA1,LSD1: 654
            GATA1: 231
       GATA1,TAL1: 214


For each of the above classes, a BED file of the indicated intervals. For example,

track name="LSD1_and_TAL1"
chr1    778211  778487
chr1    854053  854329
chr1    948500  948776
import os
from matplotlib import pyplot as plt
from pybedtools.contrib import plotting
import pybedtools
import numpy as np
# set up the order in which to plot the columns of the binary heatmap
names, bts = zip(*[
('GATA1', 'wgEncodeAwgTfbsSydhK562Gata1UcdUniPk.narrowPeak.gz'),
('LSD1', 'wgEncodeBroadHistoneK562Lsd1Pk.broadPeak.gz'),
('TAL1', 'wgEncodeAwgTfbsSydhK562Tal1sc12984IggmusUniPk.narrowPeak.gz'),
bts = [pybedtools.BedTool(i).sort() for i in bts]
# set up the object by giving it a list of pybedtool.BedTool objects and a list
# of names to use.
b = plotting.BinaryHeatmap(
# plot it
# write out how many genomic location of each class were identified
with open('class_counts.txt', 'w') as fout:
for cls, cnt in sorted(b.class_counts.items(), key=lambda x: x[1], reverse=True):
fout.write('{0:>25}: {1:<15}\n'.format(cls, cnt))
# write out the actual intervals from each class
out_dir = 'intervals'
if not os.path.exists(out_dir):
for k, v in b.classified_intervals.items():
label = k.replace(',', '_and_')
v.cut([0, 1, 2]).saveas(os.path.join(out_dir, label + '.bed'), trackline='track name="%s"' % label)
# save the figure
fig = plt.gcf()
wget --no-clobber
wget --no-clobber
wget --no-clobber
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment