Daniel Standage standage
@standage
standage / ncbi-fetch.pl
Last active December 16, 2015 00:09
Script for retrieving sequences from NCBI databases, based on examples written by Eric Sayers.
#!/usr/bin/env perl
# Copyright (c) 2013, Daniel S. Standage <daniel.standage@gmail.com>
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
standage / select-random.pl
Last active December 16, 2015 05:59
Select sequence(s) at random from a Fasta file.
#!/usr/bin/env perl
# Copyright (c) 2013, Daniel S. Standage <daniel.standage@gmail.com>
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
standage / xgdbvm-add-tsa.sh
Last active December 16, 2015 07:09
Script for loading TSAs into xGDBvm
#!/usr/bin/env bash
# Copyright (c) 2013, Daniel S. Standage <daniel.standage@gmail.com>
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
standage / ctgrz.pl
Created April 18, 2013 17:52
Given CSV output from ParsEval, categorize each comparison as a perfect match, a CDS match, an exon structure match, a UTR structure match, or a non-match.
#!/usr/bin/env perl
use strict;
while(my $line = <STDIN>)
{
chomp($line);
my @values = split(/,/, $line);
my $page = sprintf("%s/%d-%d.html", $values[0], $values[1], $values[2]);
my $cds_match_coef = $values[33];
standage / augustus-gff3-groom.sh
Created April 20, 2013 14:18
Several scripts/commands for converting the output of gene prediction programs to variants of GFF3 compatible with ParsEval.
#!/usr/bin/env bash
# Usage: bash augustus-gff3-groom.sh genes-before.gff3 > genes-after.gff3
sed $'s/\ttranscript\t/\tmRNA\t/' < "$1" | grep -v $'\tintron\t'
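To illustrate what the grooming pipeline above does, here is a minimal sketch that applies the same two steps to a made-up three-line GFF3 fragment: `transcript` features are renamed to `mRNA`, and `intron` features are dropped. The function name `groom_gff3` and the sample records are assumptions for illustration only; the tab character is built with `printf` so the sketch also runs in shells without `$'...'` quoting.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the same sed/grep steps; reads GFF3 on stdin.
groom_gff3() {
  tab=$(printf '\t')
  # Rename the feature type "transcript" to "mRNA", then drop intron features.
  sed "s/${tab}transcript${tab}/${tab}mRNA${tab}/" | grep -v "${tab}intron${tab}"
}

# Made-up sample records (not real AUGUSTUS output).
printf 'chr1\tAUGUSTUS\tgene\t100\t900\t.\t+\t.\tID=g1\n'                    >  /tmp/groom-demo.gff3
printf 'chr1\tAUGUSTUS\ttranscript\t100\t900\t.\t+\t.\tID=g1.t1;Parent=g1\n' >> /tmp/groom-demo.gff3
printf 'chr1\tAUGUSTUS\tintron\t301\t399\t.\t+\t.\tParent=g1.t1\n'           >> /tmp/groom-demo.gff3

# Prints the gene line and the renamed mRNA line; the intron line is removed.
groom_gff3 < /tmp/groom-demo.gff3
rm -f /tmp/groom-demo.gff3
```

Matching on tab-delimited feature types keeps the substitution from touching the word "transcript" inside attribute values.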
standage / xtractore.pl
Last active December 16, 2015 11:19
Script for extracting sequences from a Fasta file given an annotation file in GFF3 format.
#!/usr/bin/env perl
# Copyright (c) 2010-2011, Daniel S. Standage <daniel.standage@gmail.com>
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
standage / cegma-missing-kogs.sh
Last active December 16, 2015 11:29
After running CEGMA on your genome assembly, this script will identify the KOGs (if any) that are not mapped in your genome.
#!/usr/bin/env bash
# Copyright (c) 2013, Daniel S. Standage <daniel.standage@gmail.com>
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
standage / pe-uniq.pl
Created April 27, 2013 18:47
A script to process ParsEval output and determine unmatched genes; i.e., reference genes for which there are no overlapping prediction genes, and vice versa; this script expects ParsEval output in text format. Usage: perl pe-uniq.pl < pe-out.txt > uniq-genes.txt
#!/usr/bin/env perl
# pe-uniq.pl: a script to process ParsEval output and determine unmatched genes;
# i.e., reference genes for which there are no overlapping prediction genes, and
# vice versa; this script expects ParsEval output in text format
#
# Usage: perl pe-uniq.pl < pe-out.txt > uniq-genes.txt
use strict;
my $locusmatch = quotemeta("|---- Locus:");
while(my $line = <STDIN>)
standage / simple-subseq.pl
Created April 30, 2013 17:50
Given a set of DNA sequences (in Fasta format) and a set of coordinates (in "seqid,start,end" format), extract the given subsequences.
#!/usr/bin/env perl
use strict;
use Bio::SeqIO;
my $usage = "perl $0 seqs.fasta < coords.csv > subseqs.fasta # coords.csv file is 3 comma-delimited values: seqid, start, and end";
my $seqfile = shift(@ARGV) or die("Usage: $usage");
# Load sequences into memory
my %seqs;
my $seqinput = Bio::SeqIO->new( "-file" => $seqfile, "-format" => "Fasta" );
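The preview above is truncated, so the following is not the gist's code. It is a rough shell/awk sketch of the same idea (load the Fasta sequences into memory, then cut out one subsequence per "seqid,start,end" line read from standard input). It assumes 1-based, inclusive coordinates; the function name `subseq` and the `>seqid_start-end` header format are hypothetical choices made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper. $1 is a Fasta file; coordinates arrive on stdin.
subseq() {
  awk -F, '
    NR == FNR {                          # first input: the Fasta file
      if ($0 ~ /^>/) {
        id = substr($0, 2)               # drop the leading ">"
        sub(/[ \t].*/, "", id)           # keep only the first word as the ID
      } else {
        seqs[id] = seqs[id] $0           # join wrapped sequence lines
      }
      next
    }
    ($1 in seqs) {                       # second input: seqid,start,end
      # Assumes 1-based inclusive coordinates.
      printf ">%s_%d-%d\n%s\n", $1, $2, $3, substr(seqs[$1], $2, $3 - $2 + 1)
    }
  ' "$1" -
}
```

A call might look like `subseq seqs.fasta < coords.csv > subseqs.fasta`, mirroring the usage string in the Perl preview. Coordinate lines naming an unknown sequence ID are silently skipped in this sketch.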
standage / asmbleval.pl
Last active February 15, 2020 17:46
Calculate summary statistics for a genome/transcriptome assembly.
#!/usr/bin/env perl
use strict;
$/ = ">";
<STDIN>; # Discard "junk", if any, at beginning of the file.
my @sequencelengths;
my $gccontent = 0;
my $atcontent = 0;
my $combinedlength = 0;
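The preview above stops after the variable declarations. As a hedged sketch of the kind of summary those variables suggest (sequence count, combined length, and GC content), the following shell/awk function reads Fasta from standard input. The function name `asm_stats` and the output labels are assumptions; the original script may well report different or additional statistics.

```shell
#!/usr/bin/env bash
# Hypothetical helper: basic assembly statistics from Fasta on stdin.
asm_stats() {
  awk '
    /^>/ { n++; next }                   # header line: count a new sequence
    {
      seq = toupper($0)
      total += length(seq)               # combined length of all sequences
      gc += gsub(/[GC]/, "", seq)        # gsub returns the number of matches
      at += gsub(/[AT]/, "", seq)
    }
    END {
      printf "sequences\t%d\n", n
      printf "combined_length\t%d\n", total
      # GC content is taken over A/C/G/T only, ignoring N and other symbols.
      if (gc + at > 0) printf "gc_percent\t%.2f\n", 100 * gc / (gc + at)
    }
  '
}
```

A call might look like `asm_stats < assembly.fasta`, where `assembly.fasta` is a placeholder file name.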