@slavailn
Created April 26, 2016 06:03
#! /usr/bin/perl
use strict; use warnings;
# This script finds identical FASTA sequences that have different
# identifiers and prints them to standard output as a single record
# whose header joins the identifiers (>id1|id2). All other entries
# keep their own identifier.
# Usage: perl <script> input.fasta > output.fasta
# Example:
# >id1
# ATTCGGTCC
# >id2
# AAAGGGTTTCCC
# >id3
# ATTCGGTCC
# Will be output as:
# >id1|id3
# ATTCGGTCC
# >id2
# AAAGGGTTTCCC
my $fasta = shift or die "Please provide valid fasta file!\n";
my %fasta_hash = ();
open( my $fasta_fh, "<", $fasta ) or die "Cannot open file: $!\n";
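my $current_id;    # identifier of the record currently being read
while ( my $line = <$fasta_fh> )
{
    chomp $line;
    next if $line =~ m/^\s*$/;    # skip blank lines
    if ( $line =~ m/^>(.+)/ )
    {
        # Header line: start a new record with an empty sequence.
        # The id is copied into a lexical instead of relying on $1
        # surviving until a later, non-matching line.
        $current_id = $1;
        $fasta_hash{$current_id} = '';
    }
    elsif ( defined $current_id )
    {
        # Sequence line: append, so multi-line sequences are kept whole
        # rather than overwritten by their last line.
        $fasta_hash{$current_id} .= $line;
    }
}
close $fasta_fh;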
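# Invert the hash: group identifiers by the sequence they share.
# Keys are sorted so the output order is the same on every run.
my %reverse = ();
for my $id ( sort keys %fasta_hash )
{
    push( @{ $reverse{ $fasta_hash{$id} } }, $id );
}
# Print one record per unique sequence, ordered by its first identifier.
for my $seq ( sort { $reverse{$a}[0] cmp $reverse{$b}[0] } keys %reverse )
{
    print ">", join( '|', @{ $reverse{$seq} } ), "\n";
    print "$seq\n";
}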