Jason Stajich hyphaltip

hyphaltip / barcode.pl
Created December 5, 2012 23:27 — forked from ChemicalJames/barcode.pl
Project Scripts
#!/bin/perl -w
#sort sequences into two files according to 5' barcode
use strict;
use warnings;
use Bio::SeqIO;
my $file = 'trimmed_seq.fa';
my $in = Bio::SeqIO->new(-format => 'Fasta',
hyphaltip / Week2 #5-7
Created October 26, 2012 04:09 — forked from mhan008/Week2 #5-7
Week2 #5-7
#!/usr/bin/perl
use strict;
use warnings;
my @seqnames = ("AAC35278", "AnCSMA", "AfCHSF", "AAF19257", "P30573-1");
my @seqs = ("LLIAITYYNEDKVLTARTLHGVMQNPAWQKIVVCLVFDGIDPVLATIGV-VMKKDVDGKE","AMCLVTCYSEGEEGIRTTLDSIALTPN-SHKSIVVICDGIIKVLRMMRD-TGSKRHNMAK", "ALCLVTCYSEGEEGIRTTLDSIAMTPN$
for ( my $i = 0; $i <= 4 ; $i++) {
    print "Sequence name is $seqnames[$i]\n";
    my @residues = split('-',$seqs[$i]);
Hi all,
Clueless newbie here (first time touching Perl 48 hours ago...), for which apologies.
I'm trying to take a GenBank file (.gb) and create a FASTA file with a specific identifier line for each sequence. Specifically, I want the "host" tag as the identifier. With the help of the BioPerl beginner readme and the HOWTOs (which are great!) I've worked out how to loop through my sequences and get the 'host' tag for each one. For some reason, I get two identifier lines for each sequence. I guess the problem is in the 'for' loop--it's running the stuff below it twice, once with the actual 'host' tag data and once with...nothing? Not sure.
I think I can work out how to use s/ and a regex just to delete the second identifier line, but that feels like I'm avoiding the problem instead of fixing it. Any help appreciated!
Many thanks,
haywardjeremya@gmail.com
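
The script itself isn't shown above, so this is only a guess at the cause: if the print or write call sits inside the loop over features, it runs once per feature, and any feature without a 'host' tag would produce an identifier line with nothing in it. A minimal sketch of one way around that, assuming the 'host' tag lives on the 'source' feature (as it usually does in GenBank records), is to look up the host first and write each record exactly once afterwards. The sub names `host_of` and `genbank_to_fasta` and the command-line file arguments are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Return the first 'host' value found on a 'source' feature, or undef
# if the record has none. (Assumes 'host' is stored on 'source'.)
sub host_of {
    my ($seq) = @_;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'source';
        next unless $feat->has_tag('host');
        my ($host) = $feat->get_tag_values('host');
        return $host;
    }
    return;
}

# Read GenBank records and write one FASTA entry per record.
# The write happens outside the feature loop, so each sequence
# gets exactly one identifier line.
sub genbank_to_fasta {
    my ($infile, $outfile) = @_;
    my $in  = Bio::SeqIO->new(-file => $infile,     -format => 'genbank');
    my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'fasta');
    while (my $seq = $in->next_seq) {
        my $host = host_of($seq);
        if (defined $host) {
            $host =~ s/\s+/_/g;       # FASTA IDs end at the first space
            $seq->display_id($host);
            $seq->desc('');           # drop the old description text
        }
        # Records with no host tag keep their original ID.
        $out->write_seq($seq);
    }
    return;
}

# Hypothetical usage: perl gb2fasta.pl input.gb output.fa
genbank_to_fasta(@ARGV) if @ARGV == 2;
```

Because the lookup returns as soon as it finds a host, a record whose features include several without that tag still ends up with a single header, so there should be no stray identifier line to delete with a regex afterwards.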