@ckandoth
Last active November 7, 2023 14:32
Install Ensembl's VEP v102 with local cache for running offline

Ensembl's VEP (Variant Effect Predictor) is popular for how it picks a single effect per gene as detailed here, its CLIA-compliant HGVS variant format, and Sequence Ontology nomenclature for variant effects.

Instead of the official instructions, we will use conda to install VEP and its dependencies. If you don't already have conda, install it into $HOME/miniconda3 as follows:

curl -sL https://repo.anaconda.com/miniconda/Miniconda3-py37_4.9.2-Linux-x86_64.sh -o /tmp/miniconda.sh
sh /tmp/miniconda.sh -bfp $HOME/miniconda3
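
If the download fails silently, the installer step stops with an error like "Can't open /tmp/miniconda.sh". A minimal sketch of a guard, assuming a hypothetical helper named run_if_downloaded (not part of conda or the original instructions), that only runs the installer when the file exists and is non-empty:

```shell
# Hypothetical helper: run a downloaded installer script only if the file
# exists and is non-empty; otherwise report the problem and return non-zero.
run_if_downloaded() {
  script=$1
  shift
  if [ -s "$script" ]; then
    sh "$script" "$@"
  else
    echo "Missing or empty installer: $script (did the download fail?)" >&2
    return 1
  fi
}

# Usage, mirroring the install command above:
# run_if_downloaded /tmp/miniconda.sh -bfp "$HOME/miniconda3"
```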

Add the conda bin folder to your $PATH so that all installed tools are accessible from the command line. To make this persist across logins, also add the line to your ~/.bashrc or ~/.profile:

export PATH=$HOME/miniconda3/bin:$PATH
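
If that line ends up in a file that is sourced more than once, $PATH accumulates duplicate entries. One possible variant, assuming a hypothetical helper named prepend_path, adds the folder only when it is not already present:

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already there.
prepend_path() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, nothing to do
    *) PATH="$1:$PATH" ;;     # not present, put it first
  esac
}

prepend_path "$HOME/miniconda3/bin"
export PATH
```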

Download and install VEP, its dependencies, and also samtools/bcftools/liftOver:

conda install -qy -c conda-forge -c bioconda -c defaults ensembl-vep==102.0 htslib==1.10.2 bcftools==1.10.2 samtools==1.10 ucsc-liftover==377
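
After the install finishes, it can help to confirm that the expected executables are on $PATH before downloading the cache. A rough sketch, assuming a hypothetical helper named missing_tools; the tool names are the binaries these packages normally provide (vep, samtools, bcftools, bgzip from htslib, and liftOver):

```shell
# Hypothetical helper: print the name of every tool that is NOT found on PATH.
# Prints nothing when all of the given tools are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
  return 0
}

# Usage:
# missing_tools vep samtools bcftools bgzip liftOver
```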

Download VEP's offline cache for GRCh38, and the reference FASTA:

mkdir -p $HOME/.vep/homo_sapiens/102_GRCh38/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/release-102/variation/indexed_vep_cache/homo_sapiens_vep_102_GRCh38.tar.gz $HOME/.vep/
tar -zxf $HOME/.vep/homo_sapiens_vep_102_GRCh38.tar.gz -C $HOME/.vep/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/release-102/fasta/homo_sapiens/dna_index/ $HOME/.vep/homo_sapiens/102_GRCh38/
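
The cache URL above follows a regular pattern in which the release number appears twice and the assembly once. A small sketch that builds it from those two values, assuming a hypothetical helper named vep_cache_url and assuming the same path layout holds for other releases (this is not guaranteed, and the server path may differ from what is written here):

```shell
# Hypothetical helper: build the rsync URL of the indexed VEP cache tarball
# for a given release and assembly, following the pattern used above.
vep_cache_url() {
  release=$1
  assembly=$2
  printf 'rsync://ftp.ensembl.org/ensembl/pub/release-%s/variation/indexed_vep_cache/homo_sapiens_vep_%s_%s.tar.gz\n' \
    "$release" "$release" "$assembly"
}

# Usage:
# rsync -avr --progress "$(vep_cache_url 102 GRCh38)" "$HOME/.vep/"
```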

(Optional) Download VEP's offline cache for GRCh37, and the reference FASTA, which we must recompress with bgzip because samtools faidx cannot index a plain gzip file:

mkdir -p $HOME/.vep/homo_sapiens/102_GRCh37/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/release-102/variation/indexed_vep_cache/homo_sapiens_vep_102_GRCh37.tar.gz $HOME/.vep/
tar -zxf $HOME/.vep/homo_sapiens_vep_102_GRCh37.tar.gz -C $HOME/.vep/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/grch37/release-102/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.dna.toplevel.fa.gz $HOME/.vep/homo_sapiens/102_GRCh37/
gzip -d $HOME/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz
bgzip -i $HOME/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa
samtools faidx $HOME/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz
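
To tell whether a .fa.gz file is already bgzip-compressed (BGZF) or plain gzip, one option is to inspect its header. The sketch below assumes a hypothetical helper named is_bgzf and relies on the BGZF layout in which the gzip extra-field flag is set and the "BC" subfield comes first, which is what bgzip writes; a file with other extra subfields ahead of it would be misreported:

```shell
# Hypothetical helper: succeed if the file starts with a BGZF block header.
# Bytes 0-3 are 1f 8b 08 04 (gzip magic, deflate, FEXTRA flag set) and
# bytes 12-13 are 'B' 'C' (hex 42 43), the BGZF extra subfield identifier.
is_bgzf() {
  sig=$(head -c 14 "$1" | od -An -tx1 | tr -d ' \n')
  case "$sig" in
    1f8b0804????????????????4243) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
# is_bgzf "$HOME/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz" \
#   && echo "bgzip-compressed" || echo "plain gzip or not compressed"
```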

Test running VEP in offline mode on a GRCh38 VCF:

curl -sLO https://raw.githubusercontent.com/Ensembl/ensembl-vep/release/102/examples/homo_sapiens_GRCh38.vcf
vep --species homo_sapiens --assembly GRCh38 --offline --no_progress --no_stats --sift b --ccds --uniprot --hgvs --symbol --numbers --domains --gene_phenotype --canonical --protein --biotype --tsl --pubmed --variant_class --shift_hgvs 1 --check_existing --total_length --allele_number --no_escape --xref_refseq --failed 1 --vcf --minimal --flag_pick_allele --pick_order canonical,tsl,biotype,rank,ccds,length --dir $HOME/.vep --fasta $HOME/.vep/homo_sapiens/102_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz --input_file homo_sapiens_GRCh38.vcf --output_file homo_sapiens_GRCh38.vep.vcf --polyphen b --af --af_1kg --af_esp --regulatory
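
As a quick sanity check on the result, the sketch below counts how many variant records in the output VCF carry a CSQ annotation in the INFO column, which is where VEP writes consequences by default when run with --vcf. The helper name csq_summary is hypothetical:

```shell
# Hypothetical helper: print "<annotated> <total>" for a VEP-annotated VCF,
# where <annotated> counts records whose INFO column (field 8) has a CSQ tag.
csq_summary() {
  awk -F '\t' '
    !/^#/ { total++; if ($8 ~ /(^|;)CSQ=/) annotated++ }
    END   { printf "%d %d\n", annotated, total }
  ' "$1"
}

# Usage:
# csq_summary homo_sapiens_GRCh38.vep.vcf
```

If the first number is zero, the cache or FASTA paths passed to vep are worth rechecking.
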
@sshenoy-mdsol

Ensembl ftp links did not function as written for me.

https://gist.github.com/sshenoy-mdsol/ad9e248a6a9dbf5c979b3ba005a3b719

@ckandoth
Author

ckandoth commented Nov 7, 2020

Ensembl ftp links did not function as written for me.

https://gist.github.com/sshenoy-mdsol/ad9e248a6a9dbf5c979b3ba005a3b719

@sshenoy-mdsol When using rsync with Ensembl FTP links, they need us to tweak the URL as documented here - https://m.ensembl.org/info/data/ftp/rsync.html

@caaespin

@ckandoth where would the equivalent files for GRCh37 be located? I can't locate the required files by simply changing the 8 to 7. Maybe I'm doing something wrong?

@ckandoth
Author

@caaespin see updated gist with optional steps for GRCh37.

@whtns

whtns commented Feb 4, 2021

Thank you for this outline. Would it be possible to add a link to mskcc/vcf2maf#97 or similar issues in the section on GRCh37? I struggled for quite a while to find an explanation.

@kvn95ss

kvn95ss commented Mar 24, 2021

Hey, my workplace has no access to FTP sites. How can I still download the cache files?

@lincj1994

Hi. When I ran samtools faidx $HOME/.vep/homo_sapiens/104_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz, it returned an error:
samtools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory

@fw1121

fw1121 commented Jul 25, 2021

@lincj1994 This may be an issue with the conda env. My solution was to install samtools/bcftools as standalone software outside the conda env, and then things ran OK.

@zjiang-lji

zjiang-lji commented Aug 3, 2021

Using conda to install VEP and its dependencies is not working. Please help me understand and solve this error:

$ conda install -qy -c conda-forge -c bioconda -c defaults ensembl-vep==102.0

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Solving environment: ...working... 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed                                                                                                                                                                                                                                                                                                                            

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

Package openssl conflicts for:
ensembl-vep==102.0 -> htslib -> openssl[version='>=1.1.1a,<1.1.2a|>=1.1.1g,<1.1.2a|>=1.1.1h,<1.1.2a|>=1.1.1i,<1.1.2a|>=1.1.1j,<1.1.2a|>=1.1.1k,<1.1.2a']
conda-forge/linux-64::python==3.9.6=h49503c6_0_cpython -> openssl[version='>=1.1.1k,<1.1.2a']The following specifications were found to be incompatible with your system:

  - feature:/linux-64::__glibc==2.17=0
  - feature:|@/linux-64::__glibc==2.17=0
  - conda-forge/linux-64::python==3.9.6=h49503c6_0_cpython -> libgcc-ng[version='>=9.3.0'] -> __glibc[version='>=2.17']

Your installed version is: 2.17

@AgustinRamiroDiaz

Great post! I was having issues with the rsync FTP commands and found that the path is wrong. I don't know if they changed it, but the path works without the "ensembl" component (rsync://ftp.ensembl.org/ensembl/pub => rsync://ftp.ensembl.org/pub). So the code would look like this:

Download VEP's offline cache for GRCh38, and the reference FASTA:

mkdir -p $HOME/.vep/homo_sapiens/102_GRCh38/
rsync -avr --progress rsync://ftp.ensembl.org/pub/release-102/variation/indexed_vep_cache/homo_sapiens_vep_102_GRCh38.tar.gz $HOME/.vep/
tar -zxf $HOME/.vep/homo_sapiens_vep_102_GRCh38.tar.gz -C $HOME/.vep/
rsync -avr --progress rsync://ftp.ensembl.org/pub/release-102/fasta/homo_sapiens/dna_index/ $HOME/.vep/homo_sapiens/102_GRCh38/

(Optional) Download VEP's offline cache for GRCh37, and the reference FASTA which we must bgzip instead of gzip:

mkdir -p $HOME/.vep/homo_sapiens/102_GRCh37/
rsync -avr --progress rsync://ftp.ensembl.org/pub/release-102/variation/indexed_vep_cache/homo_sapiens_vep_102_GRCh37.tar.gz $HOME/.vep/
tar -zxf $HOME/.vep/homo_sapiens_vep_102_GRCh37.tar.gz -C $HOME/.vep/
rsync -avr --progress rsync://ftp.ensembl.org/pub/grch37/release-102/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.dna.toplevel.fa.gz $HOME/.vep/homo_sapiens/102_GRCh37/
gzip -d $HOME/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz
bgzip -i $HOME/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa
samtools faidx $HOME/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz

Please update it, because this gist is a great resource and the fix would help everyone who uses it in the future.

Best wishes!

@elenips

elenips commented Feb 26, 2022

Hi,

I am trying to install VEP using conda. When I run this command:

sh /tmp/miniconda.sh -bfp $HOME/miniconda3

I get this error:

sh: 0: Can't open /tmp/miniconda.sh

If you could help me solve my problem, I would be more than thankful!

@MingMingRaoHandsome

perl: symbol lookup error: ~/perl5/lib/perl5/x86_64-linux-thread-multi/auto/DBI/DBI.so: undefined symbol: Perl_xs_apiversion_bootcheck

@gauri-nagavkar

Using conda to install VEP and its dependencies is not working. Please help me understand and solve this error:

$ conda install -qy -c conda-forge -c bioconda -c defaults ensembl-vep==102.0

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Solving environment: ...working... 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed                                                                                                                                                                                                                                                                                                                            

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

Package openssl conflicts for:
ensembl-vep==102.0 -> htslib -> openssl[version='>=1.1.1a,<1.1.2a|>=1.1.1g,<1.1.2a|>=1.1.1h,<1.1.2a|>=1.1.1i,<1.1.2a|>=1.1.1j,<1.1.2a|>=1.1.1k,<1.1.2a']
conda-forge/linux-64::python==3.9.6=h49503c6_0_cpython -> openssl[version='>=1.1.1k,<1.1.2a']The following specifications were found to be incompatible with your system:

  - feature:/linux-64::__glibc==2.17=0
  - feature:|@/linux-64::__glibc==2.17=0
  - conda-forge/linux-64::python==3.9.6=h49503c6_0_cpython -> libgcc-ng[version='>=9.3.0'] -> __glibc[version='>=2.17']

Your installed version is: 2.17

@zjiang-lji were you able to solve this? I'm running into the same error and cannot figure it out

@ckandoth
Author

@zjiang-lji and @gauri-nagavkar - your errors indicate you are using conda with Python 3.9 which needs a newer version of openssl than VEP supports. Note that my instructions recommend installing conda with Python 3.7.

@ndfriedman

ndfriedman commented May 1, 2022

@zjiang-lji and @gauri-nagavkar - your errors indicate you are using conda with Python 3.9 which needs a newer version of openssl than VEP supports. Note that my instructions recommend installing conda with Python 3.7.

Like @zjiang-lji and @gauri-nagavkar, I was stuck on an error with conda install for a very long time, but eventually fixed it. Here's what I did. Environment: Ubuntu with Python 3.9, Anaconda already installed. I created a conda environment with conda create --name annotationEnv python=3.7, then activated it with conda activate annotationEnv. Within that environment, I followed the gist verbatim, including downloading and installing miniconda, adding it to my PATH, running conda install, etc. It seems like a hack, but lots of answers online suggested anacondas within anacondas as the way forward with these sorts of errors.

@harish0201

Always create a new environment for tools with too many dependencies. @gauri-nagavkar

@KitHub-NK

In 2022 (as of today) the links that work for GRCh37 (at least for me) are:

rsync -avr --progress rsync://ftp.ebi.ac.uk/ensemblorg/pub/release-102/variation/indexed_vep_cache/homo_sapiens_vep_102_GRCh37.tar.gz $HOME/.vep/

rsync -avr --progress rsync://ftp.ebi.ac.uk/ensemblorg/pub/grch37/release-102/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.dna.toplevel.fa.gz $HOME/.vep/homo_sapiens/102_GRCh37/

Hope this helps.

@jrhaulung

What would be the right version of Python for installing ensembl-vep==108.1?
Using a separate env with Python 3.7 or 3.8, both attempts return:

Package zlib conflicts for:
python=3.7 -> zlib[version='>=1.2.11,<1.3.0a0|>=1.2.12,<1.3.0a0|>=1.2.13,<1.3.0a0']
ensembl-vep==108.0 -> htslib -> zlib[version='1.2.11.*|>=1.2.11,<1.3.0a0|>=1.2.12,<1.3.0a0']The following specifications were found to be incompatible with your system:

  • feature:/linux-64::__glibc==2.31=0
  • python=3.7 -> libgcc-ng[version='>=11.2.0'] -> __glibc[version='>=2.17']

Your installed version is: 2.31

@YeHW

YeHW commented Nov 30, 2022

Thanks for the guide.
For people who might need a newer version of ensembl-vep, I suggest an alternative installation procedure that uses mamba.

  1. Create a new env for ensembl-vep
    mamba create -n vep
  2. I'd try installing into that env without specifying versions, to let mamba figure out compatible versions among all the tools (then pin versions later)
    mamba install -n vep -c conda-forge -c bioconda -c defaults ensembl-vep htslib bcftools samtools ucsc-liftover
  3. Look for the data of interest at this FTP site, then download it locally; please refer to the original post

@kellyduarte

To test running VEP in offline mode on a GRCh37 VCF (updated 2023):

wget https://raw.githubusercontent.com/Ensembl/ensembl-vep/release/109/examples/homo_sapiens_GRCh37.vcf

The examples are listed on this site, but you need to download the raw file contents rather than the page itself:
https://github.com/Ensembl/ensembl-vep/blob/release/109/examples/homo_sapiens_GRCh38.vcf

@kellyduarte

For samtools to work, it needs to be updated through conda; the current version is 1.17.

Use the link to install the latest version:
https://anaconda.org/bioconda/samtools
