@darencard
Last active January 12, 2017 16:26
Extract the proportion of missing data per sample in a VCF/BCF file

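Simply replace <<FILE>> with the name of your properly formatted VCF/BCF file (2 places). Requires bcftools v1.2+. The output has one line per sample: the sample name, then the proportion of sites where that sample's genotype is missing (./.).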

paste \
<(bcftools query -f '[%SAMPLE\t]\n' <<FILE>> | head -1 | tr '\t' '\n') \
<(bcftools query -f '[%GT\t]\n' <<FILE>> | awk '{n=NF; for (i=1;i<=NF;i++) if ($i == "./.") sum[i]+=1 } END {for (i=1;i<=n;i++) print sum[i] / NR }')
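
To see what the awk step computes without needing bcftools, here is a minimal sketch on a made-up genotype matrix laid out like the output of `bcftools query -f '[%GT\t]\n'` (one row per site, one tab-terminated column per sample). The function name `missing_per_sample` and the toy data are assumptions for illustration only. The END block walks column indices 1..n rather than the keys of `sum`, so a sample with no missing calls is still reported, with a proportion of 0.

```shell
# Hypothetical helper: read a tab-separated genotype matrix on stdin and
# print "column index <TAB> proportion of ./. calls" for every column.
missing_per_sample() {
  awk -v OFS="\t" '
    # Count missing genotypes per column; remember how many columns we saw.
    { n = NF; for (i = 1; i <= NF; i++) if ($i == "./.") sum[i] += 1 }
    # Columns that never matched have an unset sum[i], which evaluates to 0.
    END { for (i = 1; i <= n; i++) print i, sum[i] / NR }
  '
}

# Made-up data: 4 sites (rows) x 3 samples (columns).
toy_gt=$(printf '0/0\t./.\t0/1\t\n./.\t./.\t1/1\t\n0/1\t./.\t0/0\t\n0/0\t0/1\t0/0\t\n')

result=$(printf '%s\n' "$toy_gt" | missing_per_sample)
printf '%s\n' "$result"
```

In this toy matrix the first sample is missing at 1 of 4 sites, the second at 3 of 4, and the third at none, so the third column is the case that only shows up when the loop runs over every column index.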