@ckandoth
Created April 12, 2022 20:34
Install Ensembl's VEP v106 with local cache for running offline

Ensembl's VEP (Variant Effect Predictor) is popular for how it picks a single effect per gene as detailed here, its CLIA-compliant HGVS variant format, and Sequence Ontology nomenclature for variant effects.

Instead of the official instructions, we will use mamba (conda, but faster) to install VEP and its dependencies. If you don't already have mamba, use these steps to download and install it into $HOME/mambaforge, then run a script that adds it to your $PATH:

curl -L https://github.com/conda-forge/miniforge/releases/download/4.12.0-0/Mambaforge-Linux-x86_64.sh -o /tmp/mambaforge.sh
sh /tmp/mambaforge.sh -bfp $HOME/mambaforge && rm -f /tmp/mambaforge.sh
. $HOME/mambaforge/etc/profile.d/conda.sh

You can add the following to your ~/.bashrc file to put mamba and conda on your $PATH whenever you log in:

if [ -f "$HOME/mambaforge/etc/profile.d/conda.sh" ]; then
    . $HOME/mambaforge/etc/profile.d/conda.sh
fi

Use mamba to create and activate a conda environment with VEP, its dependencies, and other related tools:

mamba create -n vep
conda activate vep
mamba install -y -c conda-forge -c bioconda -c defaults ensembl-vep==106.0 htslib==1.14 bcftools==1.14 samtools==1.14 ucsc-liftover==377
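The solver can end up with a different build than the one pinned, so it may be worth confirming what actually landed in the environment. The sketch below assumes the usual whitespace-separated `conda list` layout (name, version, build, channel); `pkg_version` is a hypothetical helper name, not part of conda or mamba.

```shell
# Hypothetical helper: read `conda list`-style text on stdin and print the
# version column for one package.
# Assumes whitespace-separated columns: name, version, build, channel.
pkg_version() {
    awk -v pkg="$1" '$1 == pkg { print $2; exit }'
}

# Intended use (needs the vep environment created above):
#   conda list -n vep | pkg_version ensembl-vep
```

If the version printed is not 106.x, VEP will by default look for a cache matching its own version; its `--cache_version 106` option points it back at the cache downloaded in the next step.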

Download VEP's offline cache for GRCh38, and the reference FASTA:

mkdir -p $HOME/.vep/homo_sapiens/106_GRCh38/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/release-106/variation/indexed_vep_cache/homo_sapiens_vep_106_GRCh38.tar.gz $HOME/.vep/
tar -zxf $HOME/.vep/homo_sapiens_vep_106_GRCh38.tar.gz -C $HOME/.vep/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/release-106/fasta/homo_sapiens/dna_index/ $HOME/.vep/homo_sapiens/106_GRCh38/
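Before running VEP, a quick sanity check on the download can save a confusing failure later. This is a minimal sketch, assuming the `dna_index` directory provides the bgzipped FASTA together with its `.fai` and `.gzi` index files; `fasta_is_indexed` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if a bgzipped FASTA and both of its
# index files (.fai and .gzi) exist and are non-empty.
fasta_is_indexed() {
    for f in "$1" "$1.fai" "$1.gzi"; do
        [ -s "$f" ] || { echo "missing or empty: $f" >&2; return 1; }
    done
}

# Intended use:
#   fasta_is_indexed "$HOME/.vep/homo_sapiens/106_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz" \
#       && echo "FASTA and indexes look complete"
```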

(Optional) Download VEP's offline cache for GRCh37, and the reference FASTA, which we must recompress with bgzip because samtools faidx cannot index a plain gzip file:

mkdir -p $HOME/.vep/homo_sapiens/106_GRCh37/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/release-106/variation/indexed_vep_cache/homo_sapiens_vep_106_GRCh37.tar.gz $HOME/.vep/
tar -zxf $HOME/.vep/homo_sapiens_vep_106_GRCh37.tar.gz -C $HOME/.vep/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/grch37/release-106/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.dna.toplevel.fa.gz $HOME/.vep/homo_sapiens/106_GRCh37/
gzip -d $HOME/.vep/homo_sapiens/106_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz
bgzip -i $HOME/.vep/homo_sapiens/106_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa
samtools faidx $HOME/.vep/homo_sapiens/106_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz
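To see whether a given `.gz` file is already bgzip-compressed or plain gzip, one option is to inspect its first bytes. The sketch below checks for the BGZF block header described in the SAM/BAM specification: gzip magic with the FEXTRA flag set, followed by a `BC` extra subfield. It assumes `BC` is the first extra subfield; `is_bgzf` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the file starts with a BGZF block header.
# Checks gzip magic + deflate + FEXTRA flag (1f 8b 08 04) and the 'BC'
# subfield ID (42 43) at byte offsets 12-13.
# Assumption: 'BC' is the first extra subfield in the header.
is_bgzf() {
    hdr=$(od -An -tx1 -N14 "$1" | tr -d ' \t\n')
    case "$hdr" in
        1f8b0804????????????????4243) return 0 ;;
        *) return 1 ;;
    esac
}

# Intended use:
#   is_bgzf "$HOME/.vep/homo_sapiens/106_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz" \
#       && echo "BGZF" || echo "not BGZF (plain gzip or uncompressed)"
```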

Test running VEP in offline mode on a GRCh38 VCF:

curl -sLO https://raw.githubusercontent.com/Ensembl/ensembl-vep/release/106/examples/homo_sapiens_GRCh38.vcf
vep --species homo_sapiens --assembly GRCh38 --offline --no_progress --no_stats \
    --sift b --ccds --uniprot --hgvs --symbol --numbers --domains --gene_phenotype \
    --canonical --protein --biotype --tsl --pubmed --variant_class --shift_hgvs 1 \
    --check_existing --total_length --allele_number --no_escape --xref_refseq \
    --failed 1 --vcf --minimal --flag_pick_allele \
    --pick_order canonical,tsl,biotype,rank,ccds,length \
    --dir $HOME/.vep \
    --fasta $HOME/.vep/homo_sapiens/106_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz \
    --input_file homo_sapiens_GRCh38.vcf --output_file homo_sapiens_GRCh38.vep.vcf \
    --polyphen b --af --af_1kg --af_esp --regulatory
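Once the test run finishes, a rough way to confirm that annotation happened is to count how many records gained a `CSQ` entry, the INFO key VEP uses by default with `--vcf`. This is a minimal sketch; `count_csq_records` is a hypothetical helper name.

```shell
# Hypothetical helper: print how many non-header VCF records carry a CSQ
# entry in the INFO column (column 8).
count_csq_records() {
    awk -F'\t' '!/^#/ && $8 ~ /(^|;)CSQ=/ { n++ } END { print n + 0 }' "$1"
}

# Intended use:
#   count_csq_records homo_sapiens_GRCh38.vep.vcf
#   grep -vc '^#' homo_sapiens_GRCh38.vep.vcf   # total records, for comparison
```

A count of zero, or one far below the total number of records, suggests the cache or FASTA was not picked up as expected.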
@j-hudecek

Not sure why, but mamba installed v107 of VEP instead, so to use the cache I had to specify --cache_version 106.