
@eemp
Last active October 14, 2015 19:18
#!/usr/bin/perl
use strict;
use warnings;

die "Usage: $0 FASTA_FILE\n" unless @ARGV;

# read the fasta file line by line and print each record (id line followed
# by its sequence joined onto a single line)
open(my $fh, '<', $ARGV[0]) or die "Cannot open '$ARGV[0]': $!\n";

my $curr_id;
my $curr_seq;
while (my $l = <$fh>) {
    chomp $l;
    $l =~ s/\r$//;    # also tolerate Windows (CRLF) line endings
    # the regex below can be updated along with the logic:
    # the list of possible codes in a seq can be expanded based on https://en.wikipedia.org/wiki/FASTA_format
    # or the test can match ids instead if they will always follow a particular format
    if ($l =~ m{^[ACGTU]+$}) {
        $curr_seq .= $l;    # sequence line: append it to the current record
    }
    else {
        # any other line starts a new record; at this point we have an id and
        # the entire seq to go with that id, so print the previous record
        print "$curr_id\n$curr_seq\n" if ($curr_id && $curr_seq);
        $curr_id  = $l;
        $curr_seq = "";
    }
}
close($fh);
print "$curr_id\n$curr_seq\n" if ($curr_id && $curr_seq);    # last one
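The comments in the script mention two ways to make the id/sequence test more robust: widening the set of accepted sequence codes, or recognising ids by their format. In FASTA files, header lines conventionally start with '>', so one option is to key on that instead. The sketch below assumes that convention holds for the input and treats every other non-blank line as sequence, which keeps lowercase bases and IUPAC codes such as N or R that ^[ACGTU]+$ would otherwise mistake for an id. The helper name join_fasta_lines is hypothetical, and skipping blank lines is an assumption rather than something the original script does; headers with no sequence are dropped, as in the original.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original gist): takes raw FASTA lines
# and returns a flat list of (header, joined_sequence) pairs. Assumes header
# lines start with '>'; every other non-blank line counts as sequence.
sub join_fasta_lines {
    my @lines = @_;
    my (@out, $id, $seq);
    for my $line (@lines) {
        $line =~ s/\r?\n\z//;        # strip LF or CRLF line ending
        next if $line =~ /^\s*$/;    # assumption: blank lines are ignored
        if ($line =~ /^>/) {
            # new header: emit the previous record if it had any sequence
            push @out, $id, $seq if defined $id && length $seq;
            ($id, $seq) = ($line, '');
        }
        else {
            $seq .= $line;           # continuation of the current sequence
        }
    }
    push @out, $id, $seq if defined $id && length $seq;    # last record
    return @out;
}

if (@ARGV) {
    # process the file named on the command line
    open(my $fh, '<', $ARGV[0]) or die "Cannot open '$ARGV[0]': $!\n";
    print "$_\n" for join_fasta_lines(<$fh>);
    close($fh);
}
else {
    # small in-memory demo: lowercase bases, N and CRLF are all handled
    my @demo = (">seq1 example\n", "ACGTN\n", "acgu\r\n", ">seq2\n", "RYKM\n");
    print "$_\n" for join_fasta_lines(@demo);
    # prints:
    # >seq1 example
    # ACGTNacgu
    # >seq2
    # RYKM
}
```

Run it with a file path to process that file; with no argument it prints the built-in demo records. Passing the whole file to the helper keeps it easy to test, but for very large files, processing one line at a time avoids holding everything in memory.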