Skip to content

Instantly share code, notes, and snippets.

@andrewyatz
Created February 6, 2018 14:18
Show Gist options
  • Save andrewyatz/fe8f4478d9127696d90bc4804c292056 to your computer and use it in GitHub Desktop.
Save andrewyatz/fe8f4478d9127696d90bc4804c292056 to your computer and use it in GitHub Desktop.
Create a GC wig file from a FASTA file for a given window size. Algorithm is to chunk a sequence into non-overlapping windows of the specified size and calculating GC content. Output is expressed as a % with 2 decimal point precision
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;
my ($window_size, $fasta, $out) = @ARGV;
die "No window size given" if ! $window_size;
die "No fasta input" if ! $fasta;
die "No output given" if ! $out;
my $seqio_object;
if ($fasta eq '-') {
$seqio_object = Bio::SeqIO->new(-fh => \*STDIN);
}
else {
$seqio_object = Bio::SeqIO->new(-file => $fasta);
}
open my $fh, '>', $out or die "Cannot open $out for writing to: $!";
while(my $seq_object = $seqio_object->next_seq()) {
my $length = $seq_object->length();
my $name = $seq_object->id();
my $windows = int($length / $window_size);
my $remaining = $length % $window_size;
$windows++ if $remaining != 0;
print "${name} ==> ${length} ==> ${windows} ==> ${remaining}\n";
my $start = 1;
my $end = $window_size;
while($start <= $length) {
$end = $length if $end > $length;
my $subseq = $seq_object->subseq($start, $end);
my $a = $subseq =~ tr/Aa//;
my $c = $subseq =~ tr/Cc//;
my $t = $subseq =~ tr/Tt//;
my $g = $subseq =~ tr/Gg//;
my $actg = length($subseq);
my $gc_content = ( ( $g + $c )/$actg )*100;
my $line = sprintf("%s\t%d\t%d\t%1.2f\n", $name, ($start-1), $end, $gc_content);
print $fh $line;
$start += $window_size;
$end += $window_size;
}
}
exit 0;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment