@seandavi
Last active December 28, 2023 02:46
snpEff on the NIH Biowulf cluster

Usage

To use these scripts:

  • Clone this repository: git clone https://gist.github.com/95a4b2ab3b90f6f0bfd9.git snpEffScript
  • cd snpEffScript
  • make appropriate changes to setup.sh
  • call snpEff.sh like so:
./snpEff.sh input.vcf output.vcf

If you want to use this in a stream, you may use /dev/stdin for the input.vcf filename and /dev/stdout for output.vcf.
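As a rough sketch of the streaming usage described above, the wrapper below decompresses a gzipped VCF, pipes it through the annotation script, and recompresses the result. The function name `annotate_gz` and the file names are hypothetical; the optional third argument exists only so a different annotator can be swapped in, and defaults to `./snpEff.sh`.

```shell
# Hypothetical helper: annotate a gzipped VCF without writing
# uncompressed intermediate files.
#   $1  gzipped input VCF
#   $2  gzipped output VCF
#   $3  annotator command (optional, defaults to ./snpEff.sh)
annotate_gz() {
    gunzip -c "$1" \
        | "${3:-./snpEff.sh}" /dev/stdin /dev/stdout \
        | gzip > "$2"
}

# Example call (file names are placeholders):
#   annotate_gz input.vcf.gz output.vcf.gz
```

Run it from the directory that holds snpEff.sh and setup.sh, since the script sources setup.sh by relative path.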

Comments are welcome. Adding further annotations is relatively easy: create a VCF file with the appropriate INFO fields, and SnpSift annotate can copy that information into your VCF file.
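A minimal sketch of what such a custom annotation file might look like follows. The function name `write_custom_vcf`, the INFO ID `MYLAB_NOTE`, and the single variant record are all made up for illustration; a real file would hold your own sorted records, with chromosome names matching your input VCF.

```shell
# Hypothetical helper: write a tiny tab-delimited VCF that declares one
# custom INFO field and carries one made-up record.
#   $1  path of the VCF file to create
write_custom_vcf() {
    {
        printf '##fileformat=VCFv4.1\n'
        # Header line describing the custom INFO field (ID is an assumption)
        printf '##INFO=<ID=MYLAB_NOTE,Number=1,Type=String,Description="Hypothetical in-house annotation">\n'
        printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
        # Illustrative record only, not real data
        printf 'chr1\t12345\t.\tA\tG\t.\t.\tMYLAB_NOTE=example\n'
    } > "$1"
}

# The file could then be added as one more stage of the pipeline, in the
# same form as the existing annotate steps:
#   java -Xmx1g -jar $SNPEFFHOME/SnpSift.jar annotate custom.vcf /dev/stdin | \
```

The `printf` calls are used instead of a here-document so that the column separators are guaranteed to be real tab characters.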

Make a text file from the VCF file

I have created a set of small tools for working with sequence data here: https://github.com/seandavi/seqtools. The docs are here: https://seqtools.readthedocs.org/en/latest/. The tool has a vcf-to-text converter, seqtool vcf melt. An example is given here: https://seqtools.readthedocs.org/en/latest/seqtool.html#melt-a-vcf-file-to-tab-delimited-text

#!/usr/bin/env bash
# Sean Davis
# January, 2014
# This file is not called by the user
# Instead, it is used to set up the environment
# for snpEff.sh.
# Baseline data
GENOMEFASTA=/data/CCRBioinfo/public/GATK/bundles/2.3/hg19/ucsc.hg19.fasta
#Annotation
CLINVARVCF=/data/ngs/public/clinvar/clinvar_20130905.vcf
DBSNPVCF=/data/ngs/public/dbsnp/00-All.vcf
COSMICVCF=/data/ngs/public/COSMIC/CosmicCodingMuts_v66_20130725.vcf
GWASCATALOG=/data/ngs/public/GWASCatalog/gwascatalog.txt
DBNSFP=/data/ngs/public/dbNSFP/dbNSFP2.0.txt
MSIGDB=/data/ngs/public/msigdb/msigdb.v4.0.symbols.gmt
# SNPEFF
SNPEFFHOME=/data/ngs/usr/local/src/snpEff/3.4
SNPEFFGENOME=GRCh37.74
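The paths in setup.sh above are specific to one cluster, so it may help to confirm that they exist before starting a long annotation run. The following is only a sketch; `check_paths` is a hypothetical helper, not part of the original scripts.

```shell
# Hypothetical helper: report every argument that does not exist on disk.
# Returns 0 if all paths exist, 1 otherwise.
check_paths() {
    missing=0
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible use after sourcing setup.sh:
#   check_paths "$CLINVARVCF" "$DBSNPVCF" "$COSMICVCF" "$GWASCATALOG" \
#       "$DBNSFP" "$MSIGDB" "$SNPEFFHOME/snpEff.jar" "$SNPEFFHOME/SnpSift.jar" \
#       || exit 1
```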
#!/usr/bin/env bash
# Sean Davis
# January, 2014
# $1 is the input VCF file
# $2 is the output VCF file
#
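# NOTE: the ordering of the last two steps is important:
# snpEff must run before geneSets, which relies on the
# gene annotations that snpEff adds.
# The ordering of the other steps is not important.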
# Grab the important environment variables.
# setup.sh is expected to sit next to this script;
# ### change the path on the next line if you ###
# ### keep setup.sh somewhere else            ###
source "$(dirname "${BASH_SOURCE[0]}")/setup.sh"
# and the actual annotation, piped
java -Xmx1g -jar $SNPEFFHOME/SnpSift.jar dbnsfp -f 'aaref,aaalt,Uniprot_acc,Uniprot_id,Uniprot_aapos,Interpro_domain,cds_strand,refcodon,codonpos,fold-degenerate,Ancestral_allele,Ensembl_geneid,Ensembl_transcriptid,aapos,SIFT_score,Polyphen2_HDIV_pred,Polyphen2_HVAR_pred,LRT_score,LRT_pred,MutationTaster_score,MutationTaster_pred,MutationAssessor_score,MutationAssessor_pred,FATHMM_score,GERP++_NR,GERP++_RS,phyloP,29way_pi,29way_logOdds,LRT_Omega,UniSNP_ids,1000Gp1_AC,1000Gp1_AF,1000Gp1_AFR_AC,1000Gp1_AFR_AF,1000Gp1_EUR_AC,1000Gp1_EUR_AF,1000Gp1_AMR_AC,1000Gp1_AMR_AF,1000Gp1_ASN_AC,1000Gp1_ASN_AF,ESP6500_AA_AF,ESP6500_EA_AF' $DBNSFP $1 | \
java -Xmx1g -jar $SNPEFFHOME/SnpSift.jar annotate $COSMICVCF /dev/stdin | \
java -Xmx1g -jar $SNPEFFHOME/SnpSift.jar annotate $DBSNPVCF /dev/stdin | \
java -Xmx1g -jar $SNPEFFHOME/SnpSift.jar annotate $CLINVARVCF /dev/stdin | \
java -Xmx1g -jar $SNPEFFHOME/SnpSift.jar gwasCat $GWASCATALOG /dev/stdin | \
java -Xmx6g -jar $SNPEFFHOME/snpEff.jar -c $SNPEFFHOME/snpEff.config $SNPEFFGENOME | \
java -Xmx1g -jar $SNPEFFHOME/SnpSift.jar geneSets -v $MSIGDB /dev/stdin > $2
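The dbNSFP field list passed to `-f` in the script above is one long comma-separated string, which is awkward to edit. One possible alternative, shown here only as a sketch, is to keep one field name per line in a text file and join the lines when the script runs. The helper `join_fields` and the file name `dbnsfp_fields.txt` are hypothetical.

```shell
# Hypothetical helper: read field names from stdin, one per line,
# drop blank lines and lines starting with '#', and join the rest
# with commas.
join_fields() {
    grep -v -e '^#' -e '^$' | paste -s -d, -
}

# Possible use inside snpEff.sh (dbnsfp_fields.txt is an assumed file):
#   DBNSFP_FIELDS=$(join_fields < dbnsfp_fields.txt)
#   java -Xmx1g -jar $SNPEFFHOME/SnpSift.jar dbnsfp -f "$DBNSFP_FIELDS" $DBNSFP $1 | \
```

Keeping the list in a separate file also makes it possible to comment out individual fields without touching the pipeline itself.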