@RandyHarr
Last active February 26, 2022 20:43
Shell script wrapping the fixFTDNAvcf.py script to fix an FTDNA BigY VCF file and then annotate it with yBrowse SNP names and haplogroups
#!/bin/bash
#
# Fixes an FTDNA BigY VCF file so it can be processed by standard tools that follow the VCF specification,
# then annotates it with the latest yBrowse DB entries for SNP names and yFull and ISOGG haplogroups (HG).
#
# WGS Extract handles all of this behind the scenes (automagically) in its next release.
# This is simply a stand-alone, simple-scenario script for demonstration purposes.
#
# Requires htslib (bgzip) and bcftools, along with wget, python, rm, zip and unzip.
# Requires network access to the yBrowse DB file and the WGS Extract Python utility fixFTDNAvcf.py.
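# Stop at the first failing command instead of carrying on with missing files
set -e
shopt -s nullglob
if [ $# -ne 1 ] ; then
  echo "Usage: $0 FTDNA_variant_zip_file or variants.vcf" >&2
  exit 1
elif [ ! -f "$1" ] ; then
  echo "$1 is not a file" >&2
  exit 1
elif [ "$1" != "variants.vcf" ]; then
  # String comparison (!=), not numeric (-ne); extract variants.vcf from the zip,
  # overwriting any stale copy left by an earlier run
  unzip -o "$1" variants.vcf
fi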
file="variants.vcf"
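# Download the yBrowse hg38 SNP file and merge records that share a position
# into single multiallelic records (bcftools norm -m +any), then index the result
wget -O snps_hg38.vcf.gz https://ybrowse.org/gbrowse2/gff/snps_hg38.vcf.gz
bcftools norm -m +any -Oz -o snps_hg38m.vcf.gz snps_hg38.vcf.gz
bcftools index -f snps_hg38m.vcf.gz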
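# Fix the FTDNA VCF so it follows the VCF standard, then annotate it.
# Fetch the raw script: the github.com/.../blob/... URL returns an HTML page, not the Python file.
wget -O fixFTDNAvcf.py https://raw.githubusercontent.com/WGSExtract/WGSExtract-Dev/master/program/fixFTDNAvcf.py
python fixFTDNAvcf.py < "$file" > variants_n.vcf
bgzip -f variants_n.vcf
bcftools index -f variants_n.vcf.gz
# -Ov writes plain (uncompressed) VCF, matching the .vcf name of the file it replaces
bcftools annotate -a snps_hg38m.vcf.gz -c +ID,INFO/HG,INFO/ISOGG -Ov -o variants_a.vcf variants_n.vcf.gz
mv -f variants_a.vcf "$file"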
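# If the input was a zip, put the annotated variants.vcf back into it
if [ "$1" != "$file" ]; then
  zip -f "$1" "$file"
fi
# Clean up intermediates (add fixFTDNAvcf.py to the list to also delete the downloaded script)
rm -f snps_hg38m.vcf* snps_hg38.vcf* variants_n*
# Only delete variants.vcf when it was extracted from the zip; if it was the input, it now holds the result
if [ "$1" != "$file" ]; then
  rm -f "$file"
fi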
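The script treats any argument not literally named `variants.vcf` as a zip archive, so a VCF passed as `./variants.vcf` or under another name would be handed to unzip. A minimal sketch of a content-based check, assuming you would rather detect the archive by its leading bytes: a typical zip archive begins with a local file header whose signature is the four bytes `PK\x03\x04`. The helper name `is_zip` is hypothetical and not part of WGS Extract.

```shell
#!/bin/bash
# Hypothetical helper (not part of WGS Extract): succeed if the file starts
# with the zip local-file-header signature "PK\x03\x04".
is_zip() {
  local magic
  # Read the first 4 bytes and print them as hex with no spaces, e.g. 504b0304
  magic=$(head -c 4 -- "$1" | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "504b0304" ]
}

# Example use in place of the name-based test:
# if is_zip "$1"; then unzip -o "$1" variants.vcf; fi
```

Checking content rather than the name also lets the script accept the FTDNA export under whatever file name the browser saved it as.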
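A URL of the form `https://github.com/<owner>/<repo>/blob/<branch>/<path>` returns GitHub's HTML page for the file, not the file itself; the raw contents are served from `https://raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>`. A script downloaded from the wrong URL only fails later, when Python tries to run an HTML page. A minimal sketch of a guard, assuming a case-insensitive text match near the start of the file is good enough; `looks_like_html` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the first 2 KiB of a file contain an HTML
# doctype or <html> tag, case-insensitively. Web pages may begin with blank
# lines, so a byte window is used rather than only the first line.
looks_like_html() {
  head -c 2048 -- "$1" | grep -qi -e '<!doctype html' -e '<html'
}

# Example guard after downloading fixFTDNAvcf.py:
# if looks_like_html fixFTDNAvcf.py; then
#   echo "fixFTDNAvcf.py is an HTML page; use the raw.githubusercontent.com URL" >&2
#   exit 1
# fi
```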
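The final cleanup removes intermediates by glob pattern in the current directory, so a run that stops early leaves files behind, and the patterns can also match unrelated files that happen to be there. A common alternative, sketched here under the assumption that the steps can run in a throwaway directory, is to create one with `mktemp -d` and delete it from an `EXIT` trap, which runs when the shell exits whether the command succeeded or failed. `run_in_scratch_dir` is a hypothetical helper; any input path handed to the wrapped steps would need to be absolute, since they run from inside the temporary directory.

```shell
#!/bin/bash
# Hypothetical helper: run a command inside a fresh temporary directory and
# remove that directory when the command finishes, whatever its exit status.
run_in_scratch_dir() {
  local workdir
  workdir=$(mktemp -d) || return 1
  (
    # The EXIT trap runs when this subshell exits, deleting the scratch dir
    trap 'rm -rf -- "$workdir"' EXIT
    cd -- "$workdir" || exit 1
    "$@"
    status=$?
    # Propagate the command's exit status to the caller
    exit "$status"
  )
}
```

Because the directory is private to one run, no glob can touch files the user keeps next to the FTDNA export.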