
@crazyhottommy
Last active June 28, 2017 15:37

Install

The latest version of VEP is on GitHub; see the installer instructions at http://www.ensembl.org/info/docs/tools/vep/script/vep_download.html#installer

It was version 89 when this gist was written (bioinformatics tools evolve fast!).

See also this gist: https://gist.github.com/ckandoth/f265ea7c59a880e28b1e533a6e935697

cd /scratch/genomic_med/apps
git clone https://github.com/Ensembl/ensembl-vep.git
cd ensembl-vep
git status
# the Ensembl API will be installed
perl INSTALL.pl
export VEP_DATA="/scratch/genomic_med/apps/ensembl-vep-data"
export VEP_PATH="/scratch/genomic_med/apps/ensembl-vep"

rsync -avhP rsync://ftp.ensembl.org/ensembl/pub/release-89/variation/VEP/homo_sapiens_vep_89_GRCh37.tar.gz $VEP_DATA
tar -xvzf $VEP_DATA/homo_sapiens_vep_89_GRCh37.tar.gz -C $VEP_DATA
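Before going further, it can help to confirm that the cache unpacked where VEP will look for it. This is a minimal sketch that assumes the `<VEP_DATA>/<species>/<version>_<assembly>/` layout implied by the FASTA path used below (`homo_sapiens/89_GRCh37`). `check_vep_cache` is a hypothetical helper, not part of VEP.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the unpacked VEP cache directory exists.
# Assumes the layout <data_dir>/<species>/<version>_<assembly>/ (e.g. homo_sapiens/89_GRCh37).
check_vep_cache() {
  local data_dir="$1" species="$2" version="$3" assembly="$4"
  local cache_dir="${data_dir}/${species}/${version}_${assembly}"
  if [ -d "$cache_dir" ]; then
    echo "cache found: $cache_dir"
    return 0
  else
    echo "cache missing: $cache_dir" >&2
    return 1
  fi
}

# Usage:
#   check_vep_cache "$VEP_DATA" homo_sapiens 89 GRCh37
```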

Install the reference FASTA for GRCh37:

A FASTA file, $VEP_DATA/homo_sapiens/89_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz, will be downloaded.

perl INSTALL.pl --AUTO f --SPECIES homo_sapiens --ASSEMBLY GRCh37 --DESTDIR $VEP_PATH --CACHEDIR $VEP_DATA
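A large download can be truncated silently, so it may be worth checking the FASTA before pointing VEP at it. This minimal sketch assumes the file is the bgzipped `.fa.gz` named above. `fasta_ok` is a hypothetical helper, and `gzip -t` only verifies the compressed stream, not the sequence content.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a FASTA exists, is non-empty and is an intact
# gzip/bgzip stream (bgzip output is valid multi-member gzip, so gzip -t can test it).
fasta_ok() {
  local fa="$1"
  [ -s "$fa" ] || { echo "missing or empty: $fa" >&2; return 1; }
  gzip -t "$fa" 2>/dev/null || { echo "corrupt or not gzip: $fa" >&2; return 1; }
  echo "ok: $fa"
}

# Usage:
#   fasta_ok "$VEP_DATA/homo_sapiens/89_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz"
```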

Convert the offline cache for use with tabix, which significantly speeds up the lookup of known variants:

perl convert_cache.pl --species homo_sapiens --version 89_GRCh37 --dir $VEP_DATA
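A rough way to tell whether the conversion ran is to look for tabix index files in the cache. This is a sketch under an assumption: that the converted cache contains `*.tbi` (or `*.csi`) index files somewhere under the versioned cache directory. The exact file names `convert_cache.pl` writes are not stated here, and `count_tabix_indexes` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count tabix index files (*.tbi or *.csi) under a cache directory.
# Assumption: a tabix-converted VEP cache contains such index files; zero suggests no conversion.
count_tabix_indexes() {
  local cache_dir="$1"
  find "$cache_dir" -type f \( -name '*.tbi' -o -name '*.csi' \) | wc -l | tr -d ' '
}

# Usage:
#   n=$(count_tabix_indexes "$VEP_DATA/homo_sapiens/89_GRCh37")
#   if [ "$n" -gt 0 ]; then echo "converted ($n index files)"; else echo "not converted"; fi
```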

Annotate

vep --species homo_sapiens --assembly GRCh37 --offline --no_stats \
  --sift b --ccds --uniprot --hgvs --symbol --numbers --domains --gene_phenotype \
  --canonical --protein --biotype --tsl --pubmed --variant_class --shift_hgvs 1 \
  --check_existing --total_length --allele_number --no_escape --xref_refseq --failed 1 \
  --vcf --minimal --flag_pick_allele --pick_order canonical,tsl,biotype,rank,ccds,length \
  --dir $VEP_DATA \
  --fasta $VEP_DATA/homo_sapiens/89_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz \
  --input_file example_GRCh37.vcf --output_file example_GRCh37.vep.vcf \
  --polyphen b --af_1kg --af_esp --regulatory
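With `--vcf`, the annotations land in the CSQ INFO field as `|`-separated values, and their order is declared in the output header. A small sketch to list those field names, assuming the header line has the form `##INFO=<ID=CSQ,...,Description="... Format: Allele|Consequence|...">`. `csq_fields` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one CSQ sub-field name per line, in the order VEP declares them.
# Assumes the first ##INFO=<ID=CSQ header line ends with: Format: A|B|C">
csq_fields() {
  grep -m1 '^##INFO=<ID=CSQ' "$1" \
    | sed -e 's/.*Format: //' -e 's/">$//' \
    | tr '|' '\n'
}

# Usage:
#   csq_fields example_GRCh37.vep.vcf | head
```

Knowing the field order makes it easier to split CSQ values with `cut -d'|'` or a similar tool.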

If you get this error message:

-------------------- EXCEPTION --------------------
MSG: ERROR: Cannot index bgzipped FASTA file with Bio::DB::Fasta

Most likely, your Perl version is too old; see the related issue here.

Options:

  1. Update your system's Perl to the suggested version (this was not an option in my case).
  2. Install a local copy of the suggested Perl version; see here.
  3. Since the error is caused by reading a gzipped file, simply decompress the reference.
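The third option can be sketched as follows. It writes the uncompressed FASTA next to the `.gz` (so it needs write access there) and keeps the original. `decompress_fasta` is a hypothetical helper; `gunzip -c` is used instead of `gunzip -k` because older gzip versions lack `-k`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decompress a (b)gzipped FASTA alongside the original and
# print the path of the plain FASTA. The .gz file is left in place.
decompress_fasta() {
  local gz="$1"
  local out="${gz%.gz}"          # strip the trailing .gz
  gunzip -c "$gz" > "$out" && echo "$out"
}

# Usage:
#   decompress_fasta "$VEP_DATA/homo_sapiens/89_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz"
```

Then pass the printed `.fa` path to `vep --fasta` in place of the `.fa.gz`.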

Make sure you have write access to the folder where the FASTA file resides, because the index is written next to it. I placed the FASTA in our department's shared folder.
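A quick pre-flight check for that write access, as a minimal sketch. `can_index_here` is a hypothetical helper; it only tests the directory permission with `[ -w ]` and does not try to build the index itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that the directory holding a FASTA is writable,
# since the index is expected to be created alongside the FASTA on first use.
can_index_here() {
  local dir
  dir=$(dirname "$1")
  if [ -w "$dir" ]; then
    echo "writable: $dir"
  else
    echo "not writable (or missing): $dir" >&2
    return 1
  fi
}

# Usage:
#   can_index_here "$VEP_DATA/homo_sapiens/89_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa"
```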

From http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html, on --fasta:

The first time you run the script with this parameter an index will be built which can take a few minutes. This is required if fetching HGVS annotations (--hgvs) or checking reference sequences (--check_ref) in offline mode (--offline).
