@morungos
Last active December 27, 2015 02:39
Quick script to convert DCC output data to VCF. It's a bit grubby to say the least (only uses core dependencies) and only handles SNPs.
#!/usr/bin/env perl
use strict;
use warnings;
# Basic script to convert DCC output for SNPs to a VCF file capable of being
# used for later analysis. More complex variants won't yet work, due to the need
# for reference genome information which we don't get from the DCC output yet.
#
# Written to use minimal Perl dependencies.
# Stuart Watt; 31st October 2013. Thanks to the ghosties for helping.
# Ontario Institute for Cancer Research
use Carp;
use Getopt::Long;
use Pod::Usage;
my $opt_help;
GetOptions("help" => \$opt_help) || pod2usage(2);
pod2usage(1) if ($opt_help);
my $in_file = shift @ARGV;
my $out_file = shift @ARGV;
my $in_fh;
my $is_stdin = 0;
if (defined($in_file)) {
    open $in_fh, "<", $in_file or die "Cannot open '$in_file' for reading: $!";
} else {
    $in_fh = *STDIN;
    $is_stdin++;
}
my $out_fh;
my $is_stdout = 0;
if (defined($out_file)) {
    open $out_fh, ">", $out_file or die "Cannot open '$out_file' for writing: $!";
} else {
    $out_fh = *STDOUT;
    $is_stdout++;
}
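my $header = <$in_fh>;
croak("Input is empty: expected a header line") unless defined($header);
chomp($header);
my @header = split(/\t/, $header);
# The change description is expected in the second column.
if (!defined($header[1]) || $header[1] ne 'Genomic DNA Change') {
    carp("Expected to find header 'Genomic DNA Change' in the second column: this could be a problem");
}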
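# Stamp the output with today's date (YYYYMMDD) rather than a fixed one.
my @now = localtime();
my $file_date = sprintf("%04d%02d%02d", $now[5] + 1900, $now[4] + 1, $now[3]);
print $out_fh <<"__HEADER__";
##fileformat=VCFv4.0
##fileDate=$file_date
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
__HEADER__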
while (<$in_fh>) {
    chomp;
    my @data = split(/\t/);
    my $change = $data[1];
    # Only simple substitutions such as chr7:g.12345A>T are handled.
    if (defined($change) && $change =~ m{^chr(\w{1,2}):g\.(\d+)([ACGT])>([ACGT])$}) {
        my $chromosome = $1;
        my $position = $2;
        my $reference = $3;
        my $variant = $4;
        print $out_fh "$chromosome\t$position\t.\t$reference\t$variant\t.\tPASS\t.\n";
    } else {
        carp("Invalid change: " . (defined($change) ? $change : "(missing)"));
    }
}
close($in_fh) unless ($is_stdin);
close($out_fh) unless ($is_stdout);
__END__

=head1 NAME

dcc_to_vcf.pl - Convert DCC output to VCF files

=head1 SYNOPSIS

dcc_to_vcf.pl [options] [input] [output]

Options:

=over 4

=item -help

Prints a brief help message.

=back

Input and output are both optional, defaulting to
standard input and standard output respectively.

=head1 OPTIONS

=over 8

=item B<-help>

Prints a brief help message and exits.

=back

=head1 DESCRIPTION

This program reads a DCC export file and converts it to
VCF format.

Currently only SNP variants are converted; any other change is
reported with a warning and skipped.

=cut
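
The core of the conversion is the single regular expression applied to the 'Genomic DNA Change' column. The sketch below isolates that step so it can be tried on its own. The helper name `change_to_vcf_line` and both sample values are made up for illustration and are not part of the script above; the pattern and the output columns are copied from it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): turn one DCC
# "Genomic DNA Change" value into a tab-separated VCF data line.
# Returns undef when the value is not a simple single-base substitution.
sub change_to_vcf_line {
    my ($change) = @_;
    return unless defined $change;
    if ($change =~ m{^chr(\w{1,2}):g\.(\d+)([ACGT])>([ACGT])$}) {
        my ($chromosome, $position, $reference, $variant) = ($1, $2, $3, $4);
        # Columns: CHROM POS ID REF ALT QUAL FILTER INFO
        return join("\t", $chromosome, $position, '.', $reference, $variant, '.', 'PASS', '.');
    }
    return;
}

# Made-up sample values: one SNP, and one deletion that the pattern rejects.
for my $change ('chr7:g.140453136A>T', 'chr1:g.1000_1001delAG') {
    my $line = change_to_vcf_line($change);
    print defined $line ? "$line\n" : "skipped: $change\n";
}
```

The deletion is skipped because the pattern only accepts one reference base followed by `>` and one alternate base, which matches the script's stated SNP-only scope.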