Created
October 19, 2011 08:49
-
-
Save jamescasbon/1297761 to your computer and use it in GitHub Desktop.
Melt VCF files into a reasonable tab delimited format, one genotype per line
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/usr/bin/env python | |
""" Melt a VCF file into a tab delimited set of calls, one per line | |
VCF files have all the calls from different samples on one line. This | |
script reads vcf on stdin and writes all calls to stdout in tab delimited | |
format with one call in one sample per line. This makes it easy to find | |
a given sample's genotype with, say, grep. | |
""" | |
import sys | |
import csv | |
out = csv.writer(sys.stdout, delimiter='\t') | |
fixed = None | |
samples = None | |
format = None | |
def parse_samples(line): | |
toks = line[1:].split('\t') | |
fixed, samples = toks[:9], toks[9:] | |
return fixed, samples | |
def parse_format(line): | |
return line.split('\t')[8].split(':') | |
def parse_variant(line): | |
return line.split('\t')[:8] | |
def parse_calls(line): | |
for call in line.split('\t')[9:]: | |
yield call.split(':') | |
for line in sys.stdin: | |
line = line.rstrip() | |
if line.startswith('#'): | |
if not line.startswith('##'): | |
fixed, samples = parse_samples(line) | |
else: | |
if not format: | |
format = parse_format(line) | |
out.writerow(['SAMPLE'] + format + fixed) | |
variant = parse_variant(line) | |
for sample, call in zip(samples, parse_calls(line)): | |
while len(call) < len(format): | |
call.append('.') | |
out.writerow([sample] + call + variant) | |
It turned out not to work on the full length file. I think the problem had to do with line breaks; there were only line breaks between each variant in the vcf file, but the output contained lines that didn't start with reference IDs (i.e. chr1).
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
This was fantastically useful. Thanks!