Skip to content

Instantly share code, notes, and snippets.

@taoliu
Last active May 15, 2024 07:51
Show Gist options
  • Save taoliu/5772f4b44767ff07b85ba11c42a57b78 to your computer and use it in GitHub Desktop.
Save taoliu/5772f4b44767ff07b85ba11c42a57b78 to your computer and use it in GitHub Desktop.
Merge multiple peak files (BED format) then call the consensus regions
#!/bin/bash
#
# This script will find the consensus peak regions from peak files (in
# BED format) of multiple samples by:
#
# 1. Converting the peak file of each sample into non-overlapping 3
# cols BED file and concatenating them;
#
# 2. Sorting the concatenated file and Building a genome coverage
# track in BedGraph, of which the value (the 3rd col) indicates the
# number of samples with a peak at a particular location;
#
# 3. Using MACS to call regions being covered by peaks from more than
# a certain number of samples.
# ------------------------------------------------------------------
# Modify the following parameters
#
# define the peaks from multiple samples
SAMPLE_PEAKS=`ls *.narrowPeak`
# define the CUTOFF
CUTOFF=3
# define the minlen and maxgap for peak calling (arbitrary)
MINLEN=200
MAXGAP=30
# define the genome file with the 1st column as chromosome name and
# the 2nd column as chromosome length
GENOME=chrom.len
# define the OUTPUT_PREFIX
OUTPUT_PREFIX="consensus_peaks"
# ------------------------------------------------------------------
# 1
I=$SAMPLE_PEAKS
O=${OUTPUT_PREFIX}.all.bed
rm -f $O
touch $O
for f in $I; do
bedtools sort -i $f | bedtools merge -i - | cut -f 1,2,3 >> $O;
done
# 2
I=${OUTPUT_PREFIX}.all.bed
O=${OUTPUT_PREFIX}.bdg
bedtools sort -i $I | bedtools genomecov -bga -i - -g $GENOME > $O
#3
I=${OUTPUT_PREFIX}.bdg
O=${OUTPUT_PREFIX}.consensus.bed
macs2 bdgpeakcall -i $I -o $O --no-trackline -c $CUTOFF -g $MAXGAP -l $MINLEN
# end
echo "All done. Check ${O}"
@Ajeet1699
Copy link

very helpful to find consensus peak in samples. Thanks

@drakaki
Copy link

drakaki commented May 15, 2024

Hello!

Very helpful script, thank you.
I have some, pearhaps naive, questions about tje consensus peaks. Say, for example, I have 4 replicate samples and I manually find the overlapping peaks between them and keep the peaks with overlap in at least 3 of the 4 samples (using something like bedtools intersect).
Do you think those will be different than the consensus peaks found using your script?

Also, say I have found consensus peaks across replicates for a TF ChIP in control versus stimulation conditions. I now want to get differential peaks between control and stimulation with the MAnorm tool. The tool requires that I provide the peak files as well as the read files. Can I use the consensus peak file of each condition with the read file produced by merging all the initial replicate bam files?

Or does it make more sense to merge the bam files for each condition, call peaks and then get differential peaks between control and stimulation?

Thanks in advance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment