Install Ensembl's VEP v100 with local cache for running offline

Ensembl's VEP (Variant Effect Predictor) is popular for three reasons: it can pick a single effect per gene, it reports variants in a CLIA-compliant HGVS format, and it uses Sequence Ontology nomenclature for variant effects.

Instead of following the official installation instructions, we will use conda to install VEP and its dependencies. If you don't already have conda, install it into $HOME/miniconda3 as follows:

curl -sL https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh -o /tmp/miniconda.sh
sh /tmp/miniconda.sh -bfp $HOME/miniconda3

Add the conda bin folder to your $PATH so that all installed tools are accessible from the command line. To make this persist across logins, add the same line to your ~/.bashrc or ~/.profile:

export PATH=$HOME/miniconda3/bin:$PATH
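
If you do persist the export, it helps to avoid appending it twice when the setup is re-run. The following is a minimal sketch, assuming a POSIX shell; `persist_conda_path` is a hypothetical helper name, not something conda provides:

```shell
# Append the PATH export to a shell profile only if it is not already there.
# persist_conda_path is a hypothetical helper, not part of conda.
persist_conda_path() {
    profile="${1:-$HOME/.bashrc}"
    # Single quotes keep $HOME and $PATH unexpanded until login time.
    line='export PATH=$HOME/miniconda3/bin:$PATH'
    touch "$profile"
    # -F fixed string, -x whole-line match, -q no output
    if ! grep -Fxq "$line" "$profile"; then
        printf '%s\n' "$line" >> "$profile"
    fi
}
```

Calling `persist_conda_path` with no argument targets ~/.bashrc; pass ~/.profile instead if that is what your login shell reads.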

Download and install VEP, its dependencies, and also samtools/bcftools/liftOver:

conda install -qy -c conda-forge -c bioconda -c defaults ensembl-vep==100.3 samtools==1.9 bcftools==1.9 ucsc-liftover==377
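
Before moving on, it may be worth confirming that the tools actually landed on your $PATH. This is a small sketch, not part of the original instructions; `check_tools` is a hypothetical helper, and the executable names in the usage comment are assumptions about what the conda packages install:

```shell
# Report any of the given commands that cannot be found on $PATH.
# Returns non-zero if at least one is missing.
# check_tools is a hypothetical helper name.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (executable names assumed):
#   check_tools vep vep_install samtools bcftools liftOver
```

If anything is reported missing, the usual culprit is that $HOME/miniconda3/bin is not on $PATH in the current shell.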

Download VEP's offline cache for GRCh38, and the reference FASTA:

vep_install --AUTO cf --SPECIES homo_sapiens --ASSEMBLY GRCh38 --CACHEDIR $HOME/.vep

The command above is prone to networking issues. If it fails, manually download the GRCh38 cache and FASTA as follows:

mkdir -p $HOME/.vep
rsync -av rsync://ftp.ensembl.org/pub/release-100/variation/indexed_vep_cache/homo_sapiens_vep_100_GRCh38.tar.gz $HOME/.vep/
tar -zxf $HOME/.vep/homo_sapiens_vep_100_GRCh38.tar.gz -C $HOME/.vep
rsync -av rsync://ftp.ensembl.org/pub/release-100/fasta/homo_sapiens/dna_index/ $HOME/.vep/homo_sapiens/100_GRCh38/
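
Whichever download route you used, a quick check that the cache and FASTA are where the offline run expects them can save a confusing failure later. The sketch below assumes the directory layout produced by the commands above; `check_vep_cache` is a hypothetical helper name:

```shell
# Check that the files the offline run relies on are in place.
# check_vep_cache is a hypothetical helper; the paths mirror this guide's layout.
check_vep_cache() {
    vep_dir="${1:-$HOME/.vep}"
    cache="$vep_dir/homo_sapiens/100_GRCh38"
    fasta="$cache/Homo_sapiens.GRCh38.dna.toplevel.fa.gz"
    status=0
    # The cache directory must exist, and the FASTA must exist and be non-empty.
    [ -d "$cache" ] || { echo "no cache directory: $cache" >&2; status=1; }
    [ -s "$fasta" ] || { echo "no FASTA: $fasta" >&2; status=1; }
    return "$status"
}
```

Run `check_vep_cache` with no argument to inspect $HOME/.vep, or pass a different directory if you chose another cache location.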

Test running VEP in offline mode on a GRCh38 VCF:

curl -sLO https://raw.githubusercontent.com/Ensembl/ensembl-vep/release/100/examples/homo_sapiens_GRCh38.vcf
vep --species homo_sapiens --assembly GRCh38 --offline --no_progress --no_stats \
  --sift b --ccds --uniprot --hgvs --symbol --numbers --domains --gene_phenotype \
  --canonical --protein --biotype --tsl --pubmed --variant_class --shift_hgvs 1 \
  --check_existing --total_length --allele_number --no_escape --xref_refseq \
  --failed 1 --vcf --minimal --flag_pick_allele \
  --pick_order canonical,tsl,biotype,rank,ccds,length \
  --dir $HOME/.vep \
  --fasta $HOME/.vep/homo_sapiens/100_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz \
  --input_file homo_sapiens_GRCh38.vcf --output_file homo_sapiens_GRCh38.vep.vcf \
  --polyphen b --af --af_1kg --af_esp --regulatory
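
With --vcf, VEP writes its annotations into the CSQ INFO field and describes the pipe-separated sub-fields in a header line. One way to sanity-check the output is to list those sub-field names. This is a sketch that assumes the header ends with `Format: A|B|C">`, which is how VEP normally writes it; `csq_fields` is a hypothetical helper name:

```shell
# Print the CSQ sub-field names, one per line, from a VEP-annotated VCF.
# csq_fields is a hypothetical helper name.
csq_fields() {
    grep -m1 '^##INFO=<ID=CSQ' "$1" |
        sed -e 's/.*Format: //' -e 's/">.*$//' |
        tr '|' '\n'
}

# Usage:
#   csq_fields homo_sapiens_GRCh38.vep.vcf
```

If the function prints nothing, the file has no CSQ header, which usually means VEP did not complete or did not write VCF output.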