Skip to content

Instantly share code, notes, and snippets.

@jamescasbon
Created October 19, 2011 08:49
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jamescasbon/1297761 to your computer and use it in GitHub Desktop.
Save jamescasbon/1297761 to your computer and use it in GitHub Desktop.
Melt VCF files into a reasonable tab delimited format, one genotype per line
#!/usr/bin/env python
""" Melt a VCF file into a tab delimited set of calls, one per line
VCF files have all the calls from different samples on one line. This
script reads vcf on stdin and writes all calls to stdout in tab delimited
format with one call in one sample per line. This makes it easy to find
a given sample's genotype with, say, grep.
"""
import sys
import csv
out = csv.writer(sys.stdout, delimiter='\t')
fixed = None
samples = None
format = None
def parse_samples(line):
toks = line[1:].split('\t')
fixed, samples = toks[:9], toks[9:]
return fixed, samples
def parse_format(line):
return line.split('\t')[8].split(':')
def parse_variant(line):
return line.split('\t')[:8]
def parse_calls(line):
for call in line.split('\t')[9:]:
yield call.split(':')
for line in sys.stdin:
line = line.rstrip()
if line.startswith('#'):
if not line.startswith('##'):
fixed, samples = parse_samples(line)
else:
if not format:
format = parse_format(line)
out.writerow(['SAMPLE'] + format + fixed)
variant = parse_variant(line)
for sample, call in zip(samples, parse_calls(line)):
while len(call) < len(format):
call.append('.')
out.writerow([sample] + call + variant)
@hbeale
Copy link

hbeale commented Sep 4, 2012

This was fantastically useful. Thanks!

@hbeale
Copy link

hbeale commented Sep 5, 2012

It turned out not to work on the full length file. I think the problem had to do with line breaks; there were only line breaks between each variant in the vcf file, but the output contained lines that didn't start with reference IDs (i.e. chr1).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment