General:
Tools | Description |
---|---|
flank | Create new intervals from the flanks of existing intervals. |
slop | Adjust the size of intervals. |
shift | Adjust the position of intervals. |
subtract | Remove intervals based on overlaps b/w two files. |
complement | Extract intervals not represented by an interval file. |
closest | Find the closest, potentially non-overlapping interval. |
intersect | Find overlapping intervals in various ways. |
window | Find overlapping intervals within a window around an interval. |
cluster | Cluster (but don't merge) overlapping/nearby intervals. |
merge | Combine overlapping/nearby intervals into a single interval. |
map | Apply a function to a column for each overlapping interval. |
groupby | Group by common columns & summarize other columns (~ SQL "GROUP BY"). |
Formatting:
Notes: BED file format, GFF vs BED indexing
Tools | Description |
---|---|
getfasta | Use intervals to extract sequences from a FASTA file. |
maskfasta | Use intervals to mask sequences from a FASTA file. |
sort | Order the intervals in a file. |
bed12tobed6 | Breaks BED12 intervals into discrete BED6 intervals. |
bamtofastq | Convert BAM records to FASTQ records. |
bamtobed | Convert BAM alignments to BED (& other) formats. |
bedpetobam | Convert BEDPE intervals to BAM records. |
bedtobam | Convert intervals to BAM records. |
Statistics:
Tools | Description |
---|---|
jaccard | Calculate the Jaccard statistic b/w two sets of intervals. |
random | Generate random intervals in a genome. |
reldist | Calculate the distribution of relative distances b/w two files. |
shuffle | Randomly redistribute intervals in a genome. |
makewindows | Makes adjacent or sliding windows across a genome or BED file. |
nuc | Profile the nucleotide content of intervals in a FASTA file. |
Coverage:
Tools | Description |
---|---|
annotate | Annotate coverage of features from multiple files. |
coverage | Compute the coverage over defined intervals. |
genomecov | Compute the coverage over an entire genome. |
multicov | Counts coverage from multiple BAMs at specific intervals. |
unionbedg | Combines coverage intervals from multiple BEDGRAPH files. |
- -s, -S : Require same strandedness or opposite strandedness, respectively.
- -f, -F : Minimum overlap required as a fraction of A or a fraction of B respectively.
- -r, -e : Require that the minimum overlap be satisfied for A AND B, or A OR B respectively.
- -split : Treat "split" BAM or BED12 entries as distinct BED intervals.
- -abam : A is a BAM file.
Create new intervals from the flanks of existing intervals. (flank Docs)
Adjust the size of intervals. (slop Docs)
IN ▓▓▓▓▓ ▓▓▓
Flank ██ ██ ██ ██
Slop █████████ ███████
$ bedtools flank [OPTIONS] -i <BED/GFF/VCF> -g <GENOME> [-b or (-l and -r)]
$ bedtools slop [OPTIONS] -i <BED/GFF/VCF> -g <GENOME> [-b or (-l and -r)]
OPTIONS | . |
---|---|
-b, -l, -r | Flank/extend regions by x bp on both sides, on the left, or on the right respectively. |
-s | Define -l and -r based on strand. |
-pct | Define -l and -r as a fraction of the feature's length. |
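
The difference between flank and slop comes down to simple coordinate arithmetic. The following is a minimal awk sketch of that arithmetic for -b only, not a replacement for bedtools. The helper name slop_flank and the example numbers are made up for illustration, and it assumes 0-based, half-open BED coordinates clamped to the chromosome length.

```shell
# Sketch of the coordinate arithmetic only; this is not bedtools itself.
# Hypothetical helper: slop_flank <start> <end> <b> <chrom_len>
slop_flank() {
  awk -v s="$1" -v e="$2" -v b="$3" -v len="$4" 'BEGIN {
    lo = (s - b < 0) ? 0 : s - b         # clamp at the chromosome start
    hi = (e + b > len) ? len + 0 : e + b # clamp at the chromosome end
    printf "slop\t%d\t%d\n", lo, hi      # slop: one enlarged interval
    printf "flank\t%d\t%d\n", lo, s      # flank: new interval on the left
    printf "flank\t%d\t%d\n", e, hi      # flank: new interval on the right
  }'
}

# Interval 100-200 with b = 50 on a 1000 bp chromosome (made-up values).
slop_flank 100 200 50 1000
```

slop returns the original interval grown by b on each side, whereas flank returns only the newly added regions.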
Adjust the position of intervals, while respecting chromosome edges. (Docs).
IN ██ ██ ████
OUT ██ ██ ████
$ bedtools shift [OPTIONS] -i <BED/GFF/VCF> -g <GENOME> [-s or (-m and -p)]
OPTIONS | . |
---|---|
-s | Number of BPs to shift the features. |
-m, -p | Number of BPs to shift the features on the - strand or + strand, respectively. |
-pct | Define -s, -m and -p as a fraction of the feature's length. |
Remove intervals based on overlaps b/w two files. (Docs)
A ▓▓▓▓▓▓▓▓▓▓ ▓▓▓ ▓▓▓▓▓▓
B ▓▓▓▓ ▓▓▓▓▓▓▓
A sub B ██ ████ ███ ███
$ bedtools subtract [OPTIONS] -a <BED/GFF/VCF> -b <BED/GFF/VCF>
OPTIONS | . |
---|---|
-A | Remove entire feature if any overlap. |
common | strandedness: -s, -S; overlap: -f, -F; overlap mode: -r, -e |
Extract intervals not represented by an interval file. (Docs)
IN ▓▓▓▓▓ ▓▓▓ ▓▓▓▓▓▓
▓▓▓▓ ▓▓▓
OUT █████ █████ ██
$ bedtools complement -i <BED/GFF/VCF> -g <GENOME>
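
To make the idea of "intervals not represented" concrete, here is a rough awk sketch that reports the uncovered gaps. It is an illustration under assumptions, not how bedtools implements complement: the helper name complement is hypothetical, the input must be start-sorted and on a single chromosome, and the chromosome length is passed as an argument instead of a genome file.

```shell
# Hypothetical helper: complement <chrom_len> < sorted_single_chrom.bed
complement() {
  awk -v len="$1" 'BEGIN { OFS = "\t"; pos = 0 }
    $2 + 0 > pos { print $1, pos, $2 }   # uncovered gap before this interval
    $3 + 0 > pos { pos = $3 + 0 }        # furthest covered position so far
    { chrom = $1 }
    END { if (NR > 0 && pos < len + 0) print chrom, pos, len }'
}

# Toy input (made-up values) on a 100 bp chromosome.
printf 'chr1\t10\t20\nchr1\t15\t30\nchr1\t50\t60\n' | complement 100
```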
Find the closest, potentially non-overlapping interval. (Docs)
A █████ ✓
B ████ ███
$ bedtools closest [OPTIONS] -a <FILE> -b <FILE1, FILE2, ..., FILEN>
OPTIONS | . |
---|---|
-d | Also report distance from A to the closest feature. |
-k | Report the k closest hits. Default: 1. |
-io | Ignore features in B that overlap A. |
-iu, -id | Ignore features in B that are upstream or downstream, respectively, of features in A. |
common | strandedness: -s, -S |
Find overlapping intervals in various ways. (Docs)
A ██████████
B ▓▓▓▓ ▓▓ ▓▓▓
A int B ▓▓ ▓▓
$ bedtools intersect [OPTIONS] -a <BAM/BED/GFF/VCF> -b <FILE1, FILE2, ..., FILEN>
OPTIONS | . |
---|---|
-wa, -wb | Write the original entry in A/original entry in B, respectively, for each overlap. |
-loj | For each feature in A report each overlap with B. Report a NULL feature for B if no overlap. |
-wao | Report A and B features and no. of bp overlap between them. |
-u | Only report each overlapping A feature once. |
-c | For each entry in A, report count of overlapping B features. |
-v | Only report features in A not overlapping B. |
common | strandedness: -s, -S; overlap: -f, -F; overlap mode: -r, -e; bam/bed12: -abam, -split |
Find overlapping intervals within a window around an interval. (Docs)
A ┌────█████────┐
B ▓▓▓▓ ▓▓▓ ▓▓▓
A win B ▓▓▓▓ ▓▓▓
$ bedtools window [OPTIONS] [-a|-abam] -b <BED/GFF/VCF>
OPTIONS | . |
---|---|
-w, -l, -r | Flank length of overlap window in each direction, upstream or downstream, respectively. |
-sw | Define -l and -r based on strand. |
-u | Only report each overlapping A feature once. |
-c | For each entry in A, report count of overlapping B features. |
-v | Only report features in A not overlapping B. |
common | strandedness: -sm, -Sm; bam: -abam |
Cluster (but don't merge) overlapping/nearby intervals. (Docs)
BED ████ █████ ███
clustID └─#1─┘ └────#2────┘
$ bedtools cluster [OPTIONS] -i <BED/GFF/VCF>
OPTIONS | . |
---|---|
-d | Max distance between features in cluster. |
common | strandedness: -s, -S |
For merge, groupby, and map, the following* aggregation functions (specified by -o) can be applied to the column or columns specified by -c: sum, count, count_distinct, min, max, mean, median, mode, antimode, stdev, sstdev, collapse, distinct, first, last.
*Other functions are available.
Combine overlapping/nearby intervals into a single interval. (Docs)
IN ▓▓▓ ▓ ▓▓··d··▓▓▓
▓▓▓▓ ▓▓
OUT ██████ ███ ██████████
$ bedtools merge [OPTIONS] -i <BED/GFF/VCF/BAM>
OPTIONS | . |
---|---|
-s | Require same strandedness. |
-S | Force merge for one specific strand only. Options: + or -. |
-d | Maximum distance between features to be merged. |
common | aggregation: -o, -c |
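
The merging rule, including the effect of -d, can be sketched in a few lines of awk. This is only an illustration under assumptions: the helper name merge_sorted is hypothetical, the input must already be sorted by chromosome and start, strand is ignored, and the fourth output column is a count of merged features (roughly what a count aggregation would report).

```shell
# Hypothetical helper: merge_sorted <d> < sorted.bed
merge_sorted() {
  awk -v d="$1" 'BEGIN { OFS = "\t" }
    NR > 1 && $1 == chrom && $2 + 0 <= end + d {  # overlaps or lies within d bp
      if ($3 + 0 > end) end = $3 + 0              # extend the current group
      n++
      next
    }
    NR > 1 { print chrom, start, end, n }         # flush the finished group
    { chrom = $1; start = $2 + 0; end = $3 + 0; n = 1 }
    END { if (NR > 0) print chrom, start, end, n }'
}

# Toy input (made-up values): merged first with d = 0, then with d = 5.
printf 'chr1\t10\t20\nchr1\t18\t30\nchr1\t35\t40\nchr2\t5\t9\n' | merge_sorted 0
printf 'chr1\t10\t20\nchr1\t18\t30\nchr1\t35\t40\nchr2\t5\t9\n' | merge_sorted 5
```

With d = 5 the interval at 35-40 joins the first group because it starts within 5 bp of position 30.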
Apply a function to a column for each overlapping interval. (Docs)
score = 3 1 5 4 6
B ▓▓▓ ▓ ▓▓▓▓▓ ▓▓▓▓▓▓ ▓▓▓▓
A ██████████ ███████
B map(mean) A ██████████ mean(3,1,5)=3 ███████ mean(4,6)=5
$ bedtools map [OPTIONS] -a <BED/GFF/VCF> -b <BED/GFF/VCF>
OPTIONS | . |
---|---|
common | aggregation: -o, -c; strandedness: -s, -S; overlap: -f, -F; overlap mode: -r, -e; bed12: -split |
Group by common columns & summarize other columns (~ SQL "GROUP BY"). (Docs)
$ bedtools groupby [OPTIONS] -i <BED> -g <groupby columns> -c <op. column> -o <operation>
OPTIONS | . |
---|---|
common | aggregation: -o, -c |
Column | e.g. | Definition |
---|---|---|
chrom | Sc112.1 | <STR> name of chromosome/scaffold |
start | 2134 | <INT> start position of feature |
end | 2565 | <INT> end position of feature |
name | gene123 | <STR> name of feature |
score | 544 | <NUM> score for the feature e.g. bit score |
strand | + | <+/-/.> strand on which feature is located |
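thickStart | 2235 | <INT> start position at which the feature is drawn thickly (e.g. start of the coding region) |
thickEnd | 2489 | <INT> end position at which the feature is drawn thickly (e.g. end of the coding region) |
itemRgb | 255,0,0 | <INT>,<INT>,<INT> RGB display colour of the feature |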
blockCount | 2 | <INT> number of blocks (exons) in the feature |
blockSizes | 150,80 | <INT>,<INT>,... list of block sizes |
blockStarts | 0,351 | <INT>,<INT>,... list of block start positions relative to start position of feature |
GFF ┌─1 2 3─┐ 4 ...
G---A---T C ...
BED └─0 1 2 └─3 ...
. | gff -> bed | bed -> gff |
---|---|---|
new_start = | gff_start - 1 | bed_start + 1 |
new_end = | gff_end | bed_end |
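
The conversion table above can be applied with a one-line awk filter. This is a sketch under assumptions: the helper names gff_to_bed and bed_to_gff_coords are hypothetical, input is tab-separated, and using the GFF type column as the BED name is an arbitrary choice made for illustration.

```shell
# GFF (1-based, inclusive) to BED6 (0-based, half-open): only the start changes.
gff_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" } !/^#/ { print $1, $4 - 1, $5, $3, $6, $7 }'
}

# BED start/end to GFF start/end (coordinates only).
bed_to_gff_coords() {
  awk 'BEGIN { FS = OFS = "\t" } { print $2 + 1, $3 }'
}

# The GFF feature 1..3 from the diagram becomes BED 0..3, and back again.
printf 'chr1\tsrc\tgene\t1\t3\t.\t+\t.\tID=g1\n' | gff_to_bed
printf 'chr1\t0\t3\n' | bed_to_gff_coords
```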
Use intervals to extract sequences from a FASTA file. (Docs)
FASTA ACTGATCATGATACATGATACCATTAGGATACAATA
BED ████ █████ ████
OUTFA ATCA TGATA GGAT
$ bedtools getfasta [OPTIONS] -fi <input FASTA> -bed <BED/GFF/VCF>
OPTIONS | . |
---|---|
-name | Use “name” column in BED file for FASTA headers in the output. |
-s | Reverse complement features on "-" strand. Default: strand information ignored. |
-split | Given BED12 input, concatenate the sequences from BED blocks (e.g., exons). |
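
How BED coordinates select bases can be checked by hand with awk's substr. The sketch below reuses the sequence from the diagram; the helper name extract and the BED coordinates (4-8, 15-20, 26-30) are assumptions chosen to reproduce the three output sequences, since the diagram does not state them.

```shell
fasta_seq='ACTGATCATGATACATGATACCATTAGGATACAATA'

# Hypothetical helper: extract <sequence> <bed_start> <bed_end>
# BED is 0-based and half-open; awk's substr is 1-based, hence start + 1.
extract() {
  awk -v s="$1" -v a="$2" -v b="$3" 'BEGIN { print substr(s, a + 1, b - a) }'
}

extract "$fasta_seq" 4 8     # ATCA
extract "$fasta_seq" 15 20   # TGATA
extract "$fasta_seq" 26 30   # GGAT
```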
Use intervals to mask sequences from a FASTA file. (Docs)
FASTA ACTGATCATGATACATGATACCATTAGGATACAATA
BED ████ █████ ████
FASTA' ACTGATNNNNATACATGNNNNNATTAGGNNNNAATA
$ bedtools maskfasta [OPTIONS] -fi <input FASTA> -bed <BED/GFF/VCF> -fo <output FASTA>
OPTIONS | . |
---|---|
-soft | Soft-mask (convert to lower-case bases) instead of masking with "N". |
-mc | Specify masking character. |
Order the intervals in a file. (Docs)
$ bedtools sort [OPTIONS] -i <BED/GFF/VCF>
OPTIONS | . |
---|---|
-sizeA | Sort by feature size (asc). |
-sizeD | Sort by feature size (desc). |
-chrThenSizeA | Sort by chromosome (asc), then by feature size (asc). |
-chrThenSizeD | Sort by chromosome (asc), then by feature size (desc). |
-chrThenScoreA | Sort by chromosome (asc), then by score (asc). |
-chrThenScoreD | Sort by chromosome (asc), then by score (desc). |
Calculate the Jaccard statistic b/w two sets of intervals. (Docs)
A ███████████ 15bp
B ▓▓▓▓ 10bp ▓▓ 4bp ▓▓▓ 8bp
A int B ▓▓ 6bp ▓▓ 4bp
Jaccard(A,B) (6+4)/((15+10+4+8)-(6+4)) = 0.37
$ bedtools jaccard [OPTIONS] -a <BED/GFF/VCF> -b <BED/GFF/VCF>
OPTIONS | . |
---|---|
common | strandedness: -s, -S; overlap: -f, -F; overlap mode: -r, -e; bed12: -split |
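
The worked example above is intersection divided by union, where the union is the total length of A and B minus the intersection. A small awk sketch reproduces the arithmetic; the helper name jaccard is hypothetical and it takes pre-computed base-pair totals, not interval files.

```shell
# Hypothetical helper: jaccard <intersection_bp> <total_bp_of_A_plus_B>
jaccard() {
  awk -v i="$1" -v t="$2" 'BEGIN { printf "%.2f\n", i / (t - i) }'
}

# Numbers from the diagram: intersection 6+4, total length 15+10+4+8.
jaccard $((6 + 4)) $((15 + 10 + 4 + 8))   # 0.37
```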
Generate random intervals in a genome. (Docs)
$ bedtools random [OPTIONS] -g <GENOME>
OPTIONS | . |
---|---|
-l | The length of the intervals to generate. Default: 100 |
-n | The number of intervals to generate. Default: 1,000,000 |
-seed | Supply an integer seed for the shuffling. |
Calculate the distribution of relative distances b/w two files. (Docs)
───────r──────
A ▓▓▓▓▓▓ ▓▓▓▓
B ███
───d1─── ──d2──
reldist = min(d1,d2)/r
$ bedtools reldist [OPTIONS] -a <BED/GFF/VCF> -b <BED/GFF/VCF>
OPTIONS | . |
---|---|
-detail | Instead of a summary, report relative distance for each region in A. |
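
The formula reldist = min(d1,d2)/r can be tried on concrete numbers with awk. This is only a sketch of the formula as written above: the helper name reldist and the distances are made up, and r is passed in by the caller instead of being derived from interval files.

```shell
# Hypothetical helper: reldist <d1> <d2> <r>
reldist() {
  awk -v d1="$1" -v d2="$2" -v r="$3" 'BEGIN {
    m = (d1 + 0 < d2 + 0) ? d1 : d2   # distance to the nearer neighbour
    printf "%.2f\n", m / r
  }'
}

# A B feature 30 bp from one A feature and 70 bp from the next, with r = 100.
reldist 30 70 100   # 0.30
```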
Randomly redistribute intervals in a genome. (Docs)
$ bedtools shuffle [OPTIONS] -i <BED/GFF/VCF> -g <GENOME>
OPTIONS | . |
---|---|
-excl | BED file with regions into which features won't be shuffled. |
-incl | BED file with regions into which features will be shuffled. |
-chrom | Keep features on the same chromosome. |
-chromFirst | Distribute features ~uniformly across chroms, not across total sequence. |
-noOverlapping | Don't allow shuffled intervals to overlap. |
Makes adjacent or sliding windows across a genome or BED file.
$ bedtools makewindows [OPTIONS] [-g <GENOME>|-b <BED>] [-w <window size> | -n <n windows>]
OPTIONS | . |
---|---|
-s | Number of bases to step before creating a new window. Default: equal to -w |
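
The relationship between window size and step can be illustrated with a short awk loop. This is a sketch under assumptions, not the bedtools implementation: the helper name make_windows is hypothetical, it handles a single chromosome given as name and length, the step must be positive, and the real tool's treatment of the final partial window may differ.

```shell
# Hypothetical helper: make_windows <chrom> <chrom_len> <w> [step]
# The step defaults to the window size, which gives adjacent windows.
make_windows() {
  awk -v c="$1" -v len="$2" -v w="$3" -v s="${4:-$3}" 'BEGIN {
    OFS = "\t"; len += 0; w += 0; s += 0
    for (start = 0; start < len; start += s) {
      end = start + w
      if (end > len) end = len        # truncate the last window
      print c, start, end
    }
  }'
}

make_windows chr1 10 4      # adjacent windows: 0-4, 4-8, 8-10
make_windows chr1 10 4 2    # sliding windows, stepping by 2 bp
```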
Annotate coverage of features from multiple files. (Docs)
$ bedtools annotate -i variants.bed -files genes.bed conserve.bed known_var.bed
chr1 100 200 nasty 1 - 0.500000 1.000000 0.300000
chr2 500 1000 ugly 2 + 0.000000 0.600000 1.000000
$ bedtools annotate [OPTIONS] -i <BED/GFF/VCF> -files FILE1 FILE2 FILE3 ... FILEn
OPTIONS | . |
---|---|
-counts | Report count of features that overlap -i in each file. Default: report fraction of -i covered by each file. |
-both | Report counts & fractions for each file. |
common | strandedness: -s, -S |
Compute the coverage over defined intervals. (Docs)
BED FILE A ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ ▓▓▓▓▓▓
BED File B ████ ████ ██ █████████
████████
Result [ N=3, 10/15 ] [ N=1, 2/15 ] [N=1,6/6]
$ bedtools coverage [OPTIONS] -a <BAM/BED/GFF/VCF> -b <FILE1, FILE2, ..., FILEN>
OPTIONS | . |
---|---|
-d | Report the depth at each position in each A feature. |
common | strandedness: -s, -S; overlap: -f, -F; overlap mode: -r, -e; bam/bed12: -split,-abam |