crazyhottommy/install_VEP.md

## install_VEP.md

      
### Install

The latest version of VEP is on GitHub; the installer is documented at
http://www.ensembl.org/info/docs/tools/vep/script/vep_download.html#installer
It was version 89 when this gist was written (bioinformatics tools evolve too fast!).
See also this gist: https://gist.github.com/ckandoth/f265ea7c59a880e28b1e533a6e935697
cd /scratch/genomic_med/apps
git clone https://github.com/Ensembl/ensembl-vep.git
cd ensembl-vep
git status
# the Ensembl API will be installed
perl INSTALL.pl
export VEP_DATA="/scratch/genomic_med/apps/ensembl-vep-data"
export VEP_PATH="/scratch/genomic_med/apps/ensembl-vep"

rsync -avhP rsync://ftp.ensembl.org/ensembl/pub/release-89/variation/VEP/homo_sapiens_vep_89_GRCh37.tar.gz $VEP_DATA
tar -xvzf $VEP_DATA/homo_sapiens_vep_89_GRCh37.tar.gz -C $VEP_DATA
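The release number and assembly appear several times in the two commands above, so moving to a later release means editing each occurrence. A minimal sketch, assuming a hypothetical helper `vep_cache_url` (not part of VEP) and assuming other releases keep the same FTP layout as release 89, builds the URL from two arguments:

```shell
# Hypothetical helper, not part of VEP: build the rsync URL of the human
# cache tarball from a release number and an assembly name.
# Assumption: the FTP layout matches the one used for release 89 above.
vep_cache_url() {
    release="$1"
    assembly="$2"
    printf 'rsync://ftp.ensembl.org/ensembl/pub/release-%s/variation/VEP/homo_sapiens_vep_%s_%s.tar.gz\n' \
        "$release" "$release" "$assembly"
}

# Print the URL for the release used in this gist.
vep_cache_url 89 GRCh37
```

It could then be used as `rsync -avhP "$(vep_cache_url 89 GRCh37)" "$VEP_DATA"`.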
Install the reference FASTA for GRCh37. The following command downloads the FASTA file `$VEP_DATA/homo_sapiens/89_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz`:
perl INSTALL.pl --AUTO f --SPECIES homo_sapiens --ASSEMBLY GRCh37 --DESTDIR $VEP_PATH --CACHEDIR $VEP_DATA

Convert the offline cache for use with tabix, which significantly speeds up the lookup of known variants:
perl convert_cache.pl --species homo_sapiens --version 89_GRCh37 --dir $VEP_DATA
### Annotate

vep --species homo_sapiens --assembly GRCh37 --offline --no_stats --sift b --ccds --uniprot --hgvs --symbol --numbers --domains --gene_phenotype --canonical --protein --biotype --tsl --pubmed --variant_class --shift_hgvs 1 --check_existing --total_length --allele_number --no_escape --xref_refseq --failed 1 --vcf --minimal --flag_pick_allele --pick_order canonical,tsl,biotype,rank,ccds,length --dir $VEP_DATA --fasta $VEP_DATA/homo_sapiens/89_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz --input_file example_GRCh37.vcf --output_file example_GRCh37.vep.vcf --polyphen b --af_1kg --af_esp --regulatory
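With `--vcf`, VEP writes its annotations under a `CSQ` key in the INFO column of the output. A quick sanity check after a run, sketched here with a hypothetical helper `count_csq_records` (not a VEP command), counts how many variant lines carry that key:

```shell
# Hypothetical helper: count non-header VCF lines that contain a CSQ
# annotation. Assumes the default INFO key name (CSQ) was not changed.
count_csq_records() {
    awk '!/^#/ && /CSQ=/ { n++ } END { print n + 0 }' "$1"
}

# Only run the check if the output file from the command above exists.
if [ -f example_GRCh37.vep.vcf ]; then
    count_csq_records example_GRCh37.vep.vcf
fi
```

A count of 0 on a non-empty input suggests the annotation step did not complete.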
If the `vep` command fails with an error message like this:
-------------------- EXCEPTION --------------------
MSG: ERROR: Cannot index bgzipped FASTA file with Bio::DB::Fasta

most likely your Perl version is too old.
See an issue here
Options:

1. Update your system's Perl to the suggested version (which was not suitable in my case).
2. Install a local copy of the correct Perl version. See here
3. As the error is caused by reading a gzipped file, simply decompress the reference.
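For the last option, a minimal sketch, assuming `gzip` is on the PATH and using a hypothetical helper name `decompress_fasta`. It keeps the original `.gz` file and writes the uncompressed copy next to it, then prints the new path:

```shell
# Hypothetical helper: write an uncompressed copy of a gzipped FASTA next
# to the original, leaving the .gz file in place. Prints the new path.
decompress_fasta() {
    gz="$1"
    fa="${gz%.gz}"
    if [ ! -f "$fa" ]; then
        # Write to a temporary name first so a failed run leaves no partial .fa
        gzip -dc "$gz" > "$fa.tmp" && mv "$fa.tmp" "$fa" || return 1
    fi
    printf '%s\n' "$fa"
}
```

For example, `decompress_fasta "$VEP_DATA/homo_sapiens/89_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz"`; the `--fasta` argument of the `vep` command should then point at the resulting `.fa` file instead of the `.fa.gz`.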


Make sure you have write access to the folder where the FASTA file resides. I am placing the FASTA in our department's shared folder.
From the description of `--fasta` at http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html:

The first time you run the script with this parameter an index will be built which can take a few minutes. This is required if fetching HGVS annotations (--hgvs) or checking reference sequences (--check_ref) in offline mode (--offline).
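Since the index is built on first use and the folder holding the FASTA must be writable, it can help to check this before starting a long run. A minimal sketch with a hypothetical helper `fasta_dir_writable` (the name is an illustration, not a VEP tool):

```shell
# Hypothetical helper: succeed only if the directory that holds the given
# FASTA path exists and is writable by the current user.
fasta_dir_writable() {
    dir=$(dirname "$1")
    [ -d "$dir" ] && [ -w "$dir" ]
}
```

For example, `fasta_dir_writable "$VEP_DATA/homo_sapiens/89_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz" || echo "no write access to the FASTA folder"`.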