Skip to content

Instantly share code, notes, and snippets.

@brentp
Created March 11, 2015 21:00
Show Gist options
  • Save brentp/36f23caa13fd481edbc6 to your computer and use it in GitHub Desktop.
Save brentp/36f23caa13fd481edbc6 to your computer and use it in GitHub Desktop.
compare different variant normalization methods
set -eo pipefail
REF=human_g1k_v37_decoy.fasta
VCF=Mills_and_1000G_gold_standard.indels.b37.vcf.gz
VCF=/usr/local/src/gemini/test/test.mult-alts.fb.vep.vcf.gz
java -Xmx4G -jar /usr/local/src/gatk/GenomeAnalysisTK.jar -T LeftAlignAndTrimVariants --splitMultiallelics --trimAlleles \
-R $REF \
--variant $VCF -o gatk.norm.vcf
vt decompose $VCF | vt normalize -r $REF - > vt.norm.vcf
python /usr/local/src/gemini/gemini/vcfutils.py $VCF $REF > gemini.norm.vcf
grep -cv '#' *.norm.vcf
# all come out sorted in this case.
for v in *.norm.vcf; do bgzip -f $v && tabix ${v}.gz; done
paste <(zless vt.norm.vcf.gz | grep -v '^#' | cut -f -5 | sort -k2,2n -k4,5) \
<(zless gemini.norm.vcf.gz | grep -v '^#' | cut -f -5 | sort -k2,2n -k4,5) \
| awk '$4 != $9 || $5 != $10 || $2 != $7' | wc -l
paste <(zless gatk.norm.vcf.gz | grep -v '^#' | cut -f -5 | sort -k2,2n -k4,5) \
<(zless gemini.norm.vcf.gz | grep -v '^#' | cut -f -5 | sort -k2,2n -k4,5) \
| awk '$4 != $9 || $5 != $10 || $2 != $7' | wc -l
# compare the genotype cols
paste <(zless vt.norm.vcf.gz | grep -v '^#' | sort -k2,2n -k4,5 | cut -f 9-11) \
<(zless gatk.norm.vcf.gz | grep -v '^#' | sort -k2,2n -k4,5 | cut -f 9-11) \
<(zless gemini.norm.vcf.gz | grep -v '^#' | sort -k2,2n -k4,5 | cut -f 9-11) \
| less
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment