Skip to content

Instantly share code, notes, and snippets.

@xiaojay
Created July 12, 2016 14:49
Show Gist options
  • Save xiaojay/191c83aa62eec04aef32e1ab9c0498e7 to your computer and use it in GitHub Desktop.
Save xiaojay/191c83aa62eec04aef32e1ab9c0498e7 to your computer and use it in GitHub Desktop.
stats in chinese from 1000 genome
#coding=utf-8
import sys,json,re
import vcf
vcf_reader = vcf.Reader(open(sys.argv[1]))
ch = json.load(open('chb.json')) + json.load(open('chs.json'))
pattern = r'^([0,1])[\|\/]([0,1])$'
for record in vcf_reader:
if record.var_type == 'snp' and len(record.ALT) == 1:
stats = {0:0, 1:0, 2:0}
for s in record.samples:
if s.sample in ch and re.match(pattern, s.gt_nums):
g = sum([int(i) for i in re.match(pattern, s.gt_nums).groups()])
stats[g] = stats[g] + 1
print record.ID, record.REF, record.ALT[0], stats[0], stats[1], stats[2]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment