@darencard
Created July 17, 2017 16:16
Calculating population genetic statistics from VCF files using BCFtools

Useful Oneliners for Calculating Population Genetic Statistics from VCF files

The following commands require BCFtools and VCFtools, which are not part of a standard system install.

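Thin variants so that no two retained sites are within 10,000 bp of each other (to reduce linkage bias), then output, for each site, the number of sampled alleles and the frequency of the reference allele. The genotype matching assumes unphased, diploid, biallelic calls (`./.`, `0/0`, `0/1`, `1/0`, `1/1`); any other genotype string is ignored.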

vcftools --thin 10000 --recode --recode-INFO-all --stdout --gzvcf <my_variants.vcf.gz> | \
  bcftools query -f '%CHROM\t%POS[\t%GT]\n' - | \
  awk -v OFS="\t" '{ miss=0; hom_ref=0; hom_alt=0; het=0; \
    for (i=3; i<=NF; i++) \
    if ($i == "./.") miss += 1; \
    else if ($i == "0/0") hom_ref += 1; \
    else if ($i == "1/1") hom_alt += 1; \
    else if ($i == "0/1" || $i == "1/0") het += 1; \
    pres=hom_ref+hom_alt+het; \
    print $1, $2, pres*2, (pres > 0 ? ((2*hom_ref)+het)/(2*pres) : "NA") }'
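
To check the counting logic without running VCFtools or BCFtools, the awk step can be exercised on hand-written genotype rows. The sketch below wraps that step in a shell function, `ref_allele_freq` (a hypothetical name, not part of either tool), and prints `NA` for sites where every genotype is missing. The toy input is made up for illustration and mimics the tab-separated `CHROM`, `POS`, `GT...` layout that the `bcftools query` format string above produces.

```shell
# Hypothetical helper: reads CHROM, POS, and one GT column per sample from stdin.
# Prints CHROM, POS, number of sampled alleles, and reference allele frequency.
ref_allele_freq() {
  awk -v OFS="\t" '{
    hom_ref = 0; hom_alt = 0; het = 0
    for (i = 3; i <= NF; i++) {
      if ($i == "0/0") hom_ref++
      else if ($i == "1/1") hom_alt++
      else if ($i == "0/1" || $i == "1/0") het++
    }
    # Genotyped (non-missing) samples at this site
    pres = hom_ref + hom_alt + het
    # Avoid dividing by zero when no sample was genotyped
    freq = (pres > 0) ? (2 * hom_ref + het) / (2 * pres) : "NA"
    print $1, $2, 2 * pres, freq
  }'
}

# Toy input: four samples at two sites (the second site is entirely missing)
printf 'chr1\t100\t0/0\t0/1\t1/1\t./.\nchr1\t20100\t./.\t./.\t./.\t./.\n' | ref_allele_freq
```

At the first toy site, three of four samples are genotyped, so six alleles are sampled; three of them are reference alleles (two from `0/0`, one from `0/1`), giving a frequency of 0.5.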