Jason Stajich (hyphaltip)

@hyphaltip
hyphaltip / README
Created October 18, 2011 05:24
BLAST+ output: comparing long to compact output to try and guess groupings?
The .blast file is tblastn output from BLAST+ run with -max_intron_length 300; both the text and the -outfmt 6 (tabular) output are shown.
The WU-BLAST output is from tblastn run with -links and hspsepsmax; you can see the two HSPs grouped as one HSP group (hits 3 and 4 of the set).
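To compare the compact output against the grouping WU-BLAST reports with -links, one option is to chain the tabular HSPs yourself. The Perl below is a minimal sketch under stated assumptions, not what BLAST+ or WU-BLAST does internally: it groups -outfmt 6 rows that share query, subject and subject strand whenever the gap between consecutive HSPs on the subject is at most $max_gap nt (300 here, simply mirroring -max_intron_length 300). group_hsps is a hypothetical helper, it does not check that query coordinates are colinear, and the demo rows are invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Chain tabular (-outfmt 6) tblastn HSPs into candidate groups.
# Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore.
# HSPs are chained when they share query, subject and subject strand and the
# gap on the subject to the group so far is <= $max_gap nt.
# Query colinearity is NOT checked in this sketch.
sub group_hsps {
    my ($lines, $max_gap) = @_;
    $max_gap //= 300;    # assumption: mirrors -max_intron_length 300
    my %by_pair;
    for my $raw (@$lines) {
        my $line = $raw;
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;    # skip comment and blank lines
        my ($q, $s, $qs, $qe, $ss, $se, $evalue) = (split /\t/, $line)[0, 1, 6 .. 10];
        my $strand = $ss <= $se ? '+' : '-';
        my ($lo, $hi) = $ss <= $se ? ($ss, $se) : ($se, $ss);
        push @{ $by_pair{"$q\t$s\t$strand"} },
            { qstart => $qs, qend => $qe, lo => $lo, hi => $hi, evalue => $evalue };
    }
    my @groups;
    for my $key (sort keys %by_pair) {
        my @hsps    = sort { $a->{lo} <=> $b->{lo} } @{ $by_pair{$key} };
        my @current = (shift @hsps);
        my $end     = $current[0]{hi};    # rightmost subject coordinate in the group
        for my $h (@hsps) {
            if ($h->{lo} - $end <= $max_gap) {
                push @current, $h;
                $end = $h->{hi} if $h->{hi} > $end;
            } else {
                push @groups, [ $key, [@current] ];
                @current = ($h);
                $end     = $h->{hi};
            }
        }
        push @groups, [ $key, [@current] ];
    }
    return \@groups;
}

# Toy rows, invented for illustration (not the output from this gist).
my @demo = (
    join("\t", qw(prot1 contig1 80.0 50 10 0 1 50 1000 1149 1e-20 90.0)),
    join("\t", qw(prot1 contig1 75.0 40 10 0 51 90 1300 1419 1e-15 70.0)),
    join("\t", qw(prot1 contig1 70.0 30 9 0 1 30 9000 9089 1e-05 40.0)),
);

my $groups = group_hsps(\@demo, 300);
for my $g (@$groups) {
    my ($key, $hsps) = @$g;
    my ($q, $s, $strand) = split /\t/, $key;
    printf "%s vs %s (%s): %d HSP(s) at %s\n", $q, $s, $strand, scalar @$hsps,
        join(',', map { "$_->{lo}-$_->{hi}" } @$hsps);
}
```

With the toy rows, the first two HSPs (subject 1000-1149 and 1300-1419, 151 nt apart) chain into one group and the third, thousands of bases away, stands alone. For real data, pass the lines of the -outfmt 6 file instead of @demo.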
Hi all,
Clueless newbie here (I first touched Perl 48 hours ago...), so apologies in advance.
I'm trying to take a GenBank file (.gb) and create a FASTA file with a specific identifier line for each sequence. Specifically, I want the "host" tag as the identifier. With the help of the BioPerl beginner readme and the HOWTOs (which are great!) I've worked out how to loop through my sequences and get the 'host' tag for each one. For some reason, I get two identifier lines for each sequence. I guess the problem is in the 'for' loop: it's running the code below it twice, once with the actual 'host' tag data and once with... nothing? Not sure.
I think I can work out how to use s/ and a regex just to delete the second identifier line, but that feels like I'm avoiding the problem instead of fixing it. Any help appreciated!
Many thanks,
haywardjeremya@gmail.com
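The post doesn't include the loop itself, so the cause of the duplicate header can't be confirmed from here; a common one is printing the header inside a loop that runs once per feature (or per tag value) rather than once per record. The sketch below is a dependency-free stand-in for BioPerl, not the BioPerl approach: it parses GenBank text with plain regexes, takes the first /host="..." qualifier, and prints exactly one header per record. In BioPerl the same idea is to look only at features where has_tag('host') is true and set the ID once per sequence before writing. genbank_to_fasta is a hypothetical helper, the toy record is invented, and replacing spaces with underscores in the ID is a design choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert GenBank flat-file text to FASTA, using the first /host="..."
# qualifier of each record as the identifier. The header is emitted once per
# record, after the record has been scanned, never inside a per-feature loop.
sub genbank_to_fasta {
    my ($text) = @_;
    my $fasta = '';
    # Records are terminated by a line containing only "//".
    for my $record (split m{^//[ \t]*\n?}m, $text) {
        next unless $record =~ /^LOCUS\s+(\S+)/m;
        my $locus = $1;
        # First host qualifier; fall back to the LOCUS name when it is absent.
        my ($host) = $record =~ m{/host="([^"]*)"};
        $host = $locus unless defined $host;
        $host =~ s/\s+/_/g;    # keep the whole host name in the FASTA ID
        # Sequence lines after ORIGIN look like "        1 acgtacgtac gtac...".
        my ($origin) = $record =~ /^ORIGIN[^\n]*\n(.*)\z/ms;
        next unless defined $origin;
        (my $seq = $origin) =~ s/[^A-Za-z]//g;    # drop positions and spaces
        $fasta .= ">$host\n";
        $fasta .= "$_\n" for $seq =~ /(.{1,60})/g;    # 60 residues per line
    }
    return $fasta;
}

# Toy record, invented for illustration.
my $demo = <<'GB';
LOCUS       TOY0001                   20 bp    DNA     linear   VRL 01-JAN-2000
FEATURES             Location/Qualifiers
     source          1..20
                     /host="Homo sapiens"
ORIGIN
        1 acgtacgtac gtacgtacgt
//
GB

my $fasta = genbank_to_fasta($demo);
print $fasta;
```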
hyphaltip / sequence_ORF_finding.pl
Created October 8, 2012 00:57
Perl code for the sequence as part of problem 1, weeks 1-2
#!/usr/bin/perl
use warnings;
use strict;
my $seq ="AGACAAGTCGGACGTTTCATCTGAGGGTTCTTCTGCCTCCGCACTTGGTGCACATCAGACAAGGCAATCA
TGGGGGACGCTCAGATGGCAGAGTTTGGAGCAGCAGCTTCTTACCTGCGAAAGTCAGATCGAGAGCGTCT
GGAAGCACAAACCCGTCCCTTTGATATGAAAAAGGAGTGTTTTGTGCCTGATCCAGATGAAGAGTATGTA
AAAGCTTCAATCGTCAGTCGTGAAGGTGACAAAGTCACTGTACAGACTGAGAAAAGAAAGACTGTAACTG
TAAAGGAAGCTGACATTCACCCCCAGAACCCTCCAAAGTTTGATAAAATTGAAGACATGGCAATGTTCAC
CTTCCTTCATGAGCCAGCCGTGCTGTTCAACCTCAAAGAGCGCTATGCAGCATGGATGATCTATACCTAC
TCAGGACTGTTTTGTGTCACTGTCAACCCCTACAAGTGGCTGCCGGTGTACAATCAGGAGGTGGTTGTAG
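The preview stops inside the sequence literal, so the ORF-finding logic itself isn't shown. As a minimal sketch of one common approach (an assumption, not necessarily what problem 1 asked for): scan the three forward frames, open an ORF at the first ATG and close it at the next in-frame TAA, TAG or TGA. find_orfs and the toy sequence are hypothetical; the reverse strand is not scanned.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Forward-strand ORF scan: in each of the three frames, open an ORF at the
# first ATG and close it at the next in-frame stop (TAA, TAG, TGA).
# ATGs nested inside an open ORF are not reported separately, and an ORF
# with no stop before the end of the sequence is dropped.
sub find_orfs {
    my ($seq, $min_codons) = @_;
    $min_codons //= 1;
    ($seq = uc $seq) =~ s/\s+//g;    # the $seq literal spans several lines
    my @orfs;
    for my $frame (0 .. 2) {
        my $start;                    # 0-based index of the open ATG, or undef
        for (my $i = $frame; $i + 3 <= length $seq; $i += 3) {
            my $codon = substr $seq, $i, 3;
            if (!defined $start) {
                $start = $i if $codon eq 'ATG';
            } elsif ($codon =~ /^(?:TAA|TAG|TGA)$/) {
                my $len = $i + 3 - $start;    # includes the stop codon
                push @orfs, {
                    frame => $frame + 1,      # 1-based frame
                    start => $start + 1,      # 1-based, inclusive
                    end   => $i + 3,          # 1-based, inclusive
                    seq   => substr($seq, $start, $len),
                } if $len / 3 >= $min_codons;
                undef $start;
            }
        }
    }
    return @orfs;
}

# Toy input (hypothetical); the embedded newline mimics a multi-line literal.
my @orfs = find_orfs("CCATGAAATTTTAGCC\nATGCCCTGA", 2);
printf "frame %d  %d-%d  %s\n", @{$_}{qw(frame start end seq)} for @orfs;
```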
hyphaltip / mRNASeq.pl
Created October 22, 2012 04:14
James Wong - hw1-2
#!/usr/bin/perl
use warnings;
use strict;
my $seq ="AGACAAGTCGGACGTTTCATCTGAGGGTTCTTCTGCCTCCGCACTTGGTGCACATCAGACAAGGCAATCA
TGGGGGACGCTCAGATGGCAGAGTTTGGAGCAGCAGCTTCTTACCTGCGAAAGTCAGATCGAGAGCGTCT
GGAAGCACAAACCCGTCCCTTTGATATGAAAAAGGAGTGTTTTGTGCCTGATCCAGATGAAGAGTATGTA
AAAGCTTCAATCGTCAGTCGTGAAGGTGACAAAGTCACTGTACAGACTGAGAAAAGAAAGACTGTAACTG
TAAAGGAAGCTGACATTCACCCCCAGAACCCTCCAAAGTTTGATAAAATTGAAGACATGGCAATGTTCAC
CTTCCTTCATGAGCCAGCCGTGCTGTTCAACCTCAAAGAGCGCTATGCAGCATGGATGATCTATACCTAC
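This preview also ends inside the sequence literal, so what hw1-2 actually computes isn't visible. Assuming from the file name that it turns the DNA string into mRNA, a minimal transcription sketch looks like this. to_mrna is a hypothetical helper and the input is the first 14 bases of $seq above, split across two lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Transcribe a coding-strand DNA string to mRNA: remove whitespace left over
# from a multi-line string literal, upper-case it, and replace T with U.
sub to_mrna {
    my ($dna) = @_;
    (my $rna = uc $dna) =~ s/\s+//g;
    $rna =~ tr/T/U/;
    return $rna;
}

my $mrna = to_mrna("AGACAAGTCG\nGACG");
print "$mrna\n";    # prints AGACAAGUCGGACG
```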
hyphaltip / problem1_1.pl
Created October 26, 2012 03:50
JollyWeek1
#!/usr/bin/perl -w
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Seq;
use Bio::AlignIO;
my $sequence;
my $seq_obj;
hyphaltip / Week2 #5-7
Created October 26, 2012 04:09 — forked from mhan008/Week2 #5-7
Week2 #5-7
#!/usr/bin/perl
use strict;
use warnings;
my @seqnames = ("AAC35278", "AnCSMA", "AfCHSF", "AAF19257", "P30573-1");
my @seqs = ("LLIAITYYNEDKVLTARTLHGVMQNPAWQKIVVCLVFDGIDPVLATIGV-VMKKDVDGKE","AMCLVTCYSEGEEGIRTTLDSIALTPN-SHKSIVVICDGIIKVLRMMRD-TGSKRHNMAK", "ALCLVTCYSEGEEGIRTTLDSIAMTPN$
for my $i (0 .. $#seqnames) {
print "Sequence name is $seqnames[$i]\n";
my @residues = split('-',$seqs[$i]);
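Splitting on '-' gives the ungapped stretches of each aligned sequence. Assuming the exercise is about gaps (the rest of the loop isn't shown), a small sketch: count gap characters with tr/-//, drop the empty strings split leaves between consecutive gaps, and report the ungapped length. gap_summary is a hypothetical helper; the two sequences are the first two from @seqs above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Summarise gaps in one aligned sequence.
sub gap_summary {
    my ($aligned) = @_;
    my $gaps = ($aligned =~ tr/-//);    # counts '-' without modifying the string
    # split keeps a leading empty field and drops trailing ones; runs of gaps
    # also produce empty fields, so filter them all out.
    my @segments = grep { length } split /-/, $aligned;
    my $ungapped = join '', @segments;
    return {
        gaps     => $gaps,
        segments => scalar @segments,
        length   => length $ungapped,
        ungapped => $ungapped,
    };
}

my %aln = (
    AAC35278 => "LLIAITYYNEDKVLTARTLHGVMQNPAWQKIVVCLVFDGIDPVLATIGV-VMKKDVDGKE",
    AnCSMA   => "AMCLVTCYSEGEEGIRTTLDSIALTPN-SHKSIVVICDGIIKVLRMMRD-TGSKRHNMAK",
);
for my $name (sort keys %aln) {
    my $s = gap_summary($aln{$name});
    printf "%s: %d gap(s), %d segment(s), %d residues\n", $name, @{$s}{qw(gaps segments length)};
}
```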
hyphaltip / barcode.pl
Created December 5, 2012 23:27 — forked from ChemicalJames/barcode.pl
Project Scripts
#!/usr/bin/perl -w
# Sort sequences into two files according to their 5' barcode
use strict;
use warnings;
use Bio::SeqIO;
my $file = 'trimmed_seq.fa';
my $in = Bio::SeqIO->new(-format => 'Fasta',
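The barcode script is cut off right after opening the input, so the sorting step isn't shown. A minimal pure-Perl sketch of that step, assuming a read belongs to a bin when its barcode is an exact 5' prefix (no mismatches allowed). The barcodes, bin names, assign_bin and read_fasta are all hypothetical; the gist itself reads the file with Bio::SeqIO instead of read_fasta.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# HYPOTHETICAL barcodes and bin names; the gist does not show the real ones.
my %barcode_to_bin = (ACGT => 'sample_A', TGCA => 'sample_B');

# Return the bin whose barcode is an exact 5' prefix of $seq, or 'unassigned'.
# Longer barcodes are tried first so a barcode that is a prefix of another
# cannot capture its reads.
sub assign_bin {
    my ($seq, $barcodes) = @_;
    my $s = uc $seq;
    for my $bc (sort { length $b <=> length $a or $a cmp $b } keys %$barcodes) {
        return $barcodes->{$bc} if index($s, $bc) == 0;
    }
    return 'unassigned';
}

# Minimal FASTA reader (hypothetical helper): returns [id, sequence] pairs.
sub read_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        if ($line =~ /^>(\S+)/) {
            push @records, [ $1, '' ];
        } elsif (@records) {
            (my $chunk = $line) =~ s/\s+//g;
            $records[-1][1] .= $chunk;
        }
    }
    return @records;
}

# Toy input; in practice each bin would be written to its own output file.
my $fasta = ">r1\nACGTTTGGCC\n>r2\ntgcaaaaccc\n>r3\nGGGGCCCC\n";
my %binned;
for my $rec (read_fasta($fasta)) {
    my ($id, $seq) = @$rec;
    push @{ $binned{ assign_bin($seq, \%barcode_to_bin) } }, ">$id\n$seq\n";
}
print "== $_ ==\n", @{ $binned{$_} } for sort keys %binned;
```

Reads with no matching barcode go to an 'unassigned' bin rather than being silently dropped, which makes it easy to check how many reads the barcodes actually account for.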
hyphaltip / genome_size_gene_stat.pdf
Last active November 18, 2015 18:24
genome size, coding genes in fungi
# module load emboss (requires the EMBOSS tools; on a Mac, install with Homebrew or another method)
# download
curl -C - -O http://microsporidiadb.org/common/downloads/Current_Release/NparisiiERTm1/fasta/data/MicrosporidiaDB-26_NparisiiERTm1_AnnotatedCDSs.fasta
curl -C - -O http://microsporidiadb.org/common/downloads/Current_Release/NematocidaSp1ERTm2/fasta/data/MicrosporidiaDB-26_NematocidaSp1ERTm2_AnnotatedCDSs.fasta
curl -C - -O http://microsporidiadb.org/common/downloads/Current_Release/NparisiiERTm3/fasta/data/MicrosporidiaDB-26_NparisiiERTm3_AnnotatedCDSs.fasta
geecee MicrosporidiaDB-26_NematocidaSp1ERTm2_AnnotatedCDSs.fasta Nsp1ERT2.geecee
geecee MicrosporidiaDB-26_NparisiiERTm1_AnnotatedCDSs.fasta NparERT1.geecee
geecee MicrosporidiaDB-26_NparisiiERTm3_AnnotatedCDSs.fasta NparERT3.geecee
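geecee reports GC content per sequence. As a rough cross-check, or to get one summary number per CDS file, here is a pure-Perl sketch that computes the GC fraction of each FASTA record and the mean across records. Counting only A, C, G and T (so N and other ambiguity codes are ignored) is an assumption of this sketch, not a statement about how geecee handles them; gc_fraction, gc_by_record and the toy records are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# GC fraction of one sequence: (G + C) / (A + C + G + T).
# Ambiguity codes such as N are excluded from both counts (sketch assumption).
sub gc_fraction {
    my ($seq) = @_;
    my $s    = uc $seq;
    my $gc   = ($s =~ tr/GC//);      # tr with an empty replacement just counts
    my $acgt = ($s =~ tr/ACGT//);
    return $acgt ? $gc / $acgt : 0;
}

# Per-record GC for FASTA text, returned as id => fraction pairs.
sub gc_by_record {
    my ($text) = @_;
    my (%seq, $id);
    for my $line (split /\n/, $text) {
        if ($line =~ /^>(\S+)/) { $id = $1; $seq{$id} = ''; }
        elsif (defined $id)     { $seq{$id} .= $line; }
    }
    return map { ($_ => gc_fraction($seq{$_})) } keys %seq;
}

# Toy records, invented for illustration.
my %gc = gc_by_record(">cds1\nGGCC\n>cds2\nATAT\nGCNN\n");
my $mean = 0;
$mean += $_ for values %gc;
$mean /= keys %gc if %gc;
printf "%s\t%.4f\n", $_, $gc{$_} for sort keys %gc;
printf "mean\t%.4f\n", $mean;
```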
hyphaltip / gist:5843924
Created June 23, 2013 05:39
gh61-family alignment
CLUSTAL FORMAT for T-COFFEE Version_8.97_101117 [http://www.tcoffee.org] [MODE: ], CPU=0.01 sec, SCORE=88, Nseq=4, Len=381
NCU05969 MPSFTSKSLLAVLAGAASVAAHGHVSNIVINGEYYRGFDS-SLNYMANPP
NCU07898 MKTF-----ATLLASIGLVAAHGFVDNATIGGQFYQPYQ---DPYMGSPP
NCU07760 MARM---SILTALAGASLVAAHGHVSKVIVNGVEYQNYDPTSFPYNSNPP
TRIREDRAFT_73643 MIQKLSNLLVTALAVATGVVGHGHINDIVINGVWYQAYDPTTFPYESNPP
* : ** *..**.:.. :.* *: :: * ..**
NCU05969 AVVGWKANNQDNGFVGPDAFSSPDIICHKDATNAKGHAVVKAGDKISIQW
NCU07898 DRISRKIP--GNGPV--EDVTSLAIQCNADSAPAKLHASAAAGSTVTLRW
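To work with an alignment like this one programmatically, the CLUSTAL blocks first have to be stitched back into one aligned string per sequence. A minimal sketch: parse_clustal skips the header, blank lines and the conservation line, and concatenates each name's chunks in file order; percent_identity uses one common definition (identical columns over columns where neither sequence has a gap), which is not the SCORE value T-COFFEE reports in the header. Both helpers are hypothetical, and the demo uses only the first 50-column block shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse CLUSTAL-format text into (\@names_in_order, \%name_to_aligned_string).
# Skips the CLUSTAL header, blank lines and the conservation line (which
# starts with whitespace); tolerates an optional trailing residue count.
sub parse_clustal {
    my ($text) = @_;
    my (%aln, @order);
    for my $line (split /\n/, $text) {
        next if $line =~ /^CLUSTAL/ || $line =~ /^\s/;
        next unless $line =~ /^(\S+)\s+(\S+)(?:\s+\d+)?\s*$/;
        my ($name, $chunk) = ($1, $2);
        push @order, $name unless exists $aln{$name};
        $aln{$name} .= $chunk;
    }
    return (\@order, \%aln);
}

# Percent identity over columns where neither sequence has a gap.
# Assumes both strings come from the same alignment (equal length).
sub percent_identity {
    my ($x, $y) = @_;
    my ($same, $compared) = (0, 0);
    for my $i (0 .. length($x) - 1) {
        my ($p, $q) = (substr($x, $i, 1), substr($y, $i, 1));
        next if $p eq '-' || $q eq '-';
        $compared++;
        $same++ if $p eq $q;
    }
    return $compared ? 100 * $same / $compared : 0;
}

my $text = <<'ALN';
CLUSTAL FORMAT for T-COFFEE Version_8.97_101117 [http://www.tcoffee.org] [MODE: ], CPU=0.01 sec, SCORE=88, Nseq=4, Len=381

NCU05969         MPSFTSKSLLAVLAGAASVAAHGHVSNIVINGEYYRGFDS-SLNYMANPP
NCU07898         MKTF-----ATLLASIGLVAAHGFVDNATIGGQFYQPYQ---DPYMGSPP
NCU07760         MARM---SILTALAGASLVAAHGHVSKVIVNGVEYQNYDPTSFPYNSNPP
TRIREDRAFT_73643 MIQKLSNLLVTALAVATGVVGHGHINDIVINGVWYQAYDPTTFPYESNPP
                 * : ** *..**.:.. :.* *: :: * ..**
ALN

my ($order, $aln) = parse_clustal($text);
for my $i (0 .. $#$order - 1) {
    for my $j ($i + 1 .. $#$order) {
        printf "%s vs %s: %.1f%% identity\n", $order->[$i], $order->[$j],
            percent_identity($aln->{ $order->[$i] }, $aln->{ $order->[$j] });
    }
}
```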