@heikkil
Created April 21, 2014 07:06
grep for obo ontology files
#!/usr/bin/env perl

=head1 NAME

obogrep - grep obo ontology entries for a pattern

=head1 SYNOPSIS

B<obogrep> [B<--version>] [B<-?|-h|--help>] [B<-g|--debug>]
[B<-v|--invert-match>] [B<-c|--count>] I<query> [I<obofile>]

=head1 DESCRIPTION

Grep-like command line program to explore ontology files in the obo
format (L<http://www.geneontology.org/GO.format.obo-1_4.shtml>). Each
C<[Term]> stanza is treated as one entry, and the whole entry is
printed when it matches the query.

The input source is detected automatically. By default the file is
read as plain obo text. If the filename ends in F<.gz>, the file is
decompressed on the fly. If no filename is given, input is read from
STDIN, so that commands can be piped together.
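
The input selection above can be sketched as a small helper, for
illustration only. The name C<open_input> is hypothetical, and the
sketch swaps in the core module IO::Uncompress::Gunzip for the
PerlIO::gzip layer that obogrep itself uses; in both cases the caller
gets a readable handle.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper, not part of obogrep: pick the input source the
# way the DESCRIPTION says. No name -> STDIN; a name ending in ".gz" ->
# decompressed on the fly (IO::Uncompress::Gunzip here, instead of the
# PerlIO::gzip layer the script uses); anything else -> plain file.
sub open_input {
    my ($name) = @_;
    return \*STDIN unless defined $name && length $name;
    if ($name =~ /\.gz\z/) {
        my $fh = IO::Uncompress::Gunzip->new($name)
            or die "Can't open file $name: $GunzipError\n";
        return $fh;
    }
    open my $fh, '<', $name or die "Can't open file $name: $!\n";
    return $fh;
}
```

Assuming this helper, the script's if/elsif chain would reduce to
C<my $F = open_input($file);>.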
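
Matching is case-insensitive, and the query is interpreted as a Perl
regular expression, so metacharacters such as C<.> or C<(> must be
escaped to match literally.

Examples:

    obogrep adherens go.obo.txt.gz | obogrep cellular_component | less
    obogrep junction go.obo.txt.gz | obogrep lar_comp | obogrep -v musc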

=head2 Options

=over 4

=item B<-v>, B<--invert-match>

Invert the sense of matching, to select non-matching entries. (B<-v>
is specified by POSIX.)

=item B<-c>, B<--count>

Print only the number of selected entries (with B<-v>, the number of
non-matching entries).

=back
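
The entry-by-entry matching behind these options can be sketched on an
in-memory string. The helper C<select_stanzas> is hypothetical; it
follows the script's approach of setting C<$/> to C<[Term]>, but unlike
the script (see TODO) it skips the header text before the first
C<[Term]>.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of obogrep: return the '[Term]' stanzas
# of $text selected by $pattern, a case-insensitive Perl regex.
# With $invert true, the non-matching stanzas are returned instead.
sub select_stanzas {
    my ($text, $pattern, $invert) = @_;
    open my $fh, '<', \$text or die "Can't read string: $!";
    local $/ = '[Term]';    # each read stops just after the next '[Term]'
    my @selected;
    my $is_header = 1;      # the first chunk is the header (possibly empty)
    while (my $record = <$fh>) {
        chomp $record;      # strip the trailing '[Term]' separator, if present
        if ($is_header) { $is_header = 0; next }
        my $hit = $record =~ /$pattern/i;
        push @selected, "[Term]$record" if ($invert ? !$hit : $hit);
    }
    close $fh;
    return @selected;
}
```

In scalar context the helper returns the number of selected stanzas,
which corresponds to what B<-c> prints.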

=head1 TODO

Does not take the obo file header into account: the text before the
first C<[Term]> is handled as if it were a term.

Sometimes prints spurious C<[Term]> lines.

=head1 VERSION HISTORY

    0.0, 15 Dec 2009, start of the project
    0.1, 15 Dec 2009, basic functionality
    0.2, 15 Dec 2009, simple rewrite
    0.3, 20 Apr 2014, open compressed files automatically
    0.4, 20 Apr 2014, input from STDIN
    0.5, 20 Apr 2014, the count option: -c/--count

=head1 LICENSE

You may distribute this program under the same terms as Perl itself.

=head1 AUTHOR

Heikki Lehvaslaiho, heikki lehvaslaiho a gmail com

=cut

use strict;
use warnings;
use PerlIO::gzip;    # provides the :gzip layer used for .gz input (CPAN)
use Getopt::Long;

use constant PROGRAMME_NAME => 'obogrep';
use constant VERSION        => '0.5';

our $DEBUG  = '';
our $INVERT = '';
our $COUNT  = '';
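
# -V (version) and -v (invert) differ only in case, so turn off
# Getopt::Long's default case-insensitive matching of option names.
Getopt::Long::Configure('no_ignore_case');
GetOptions(
    'V|version' => sub {
        print PROGRAMME_NAME, ", version ", VERSION, "\n";
        exit(0);
    },
    'g|debug'               => \$DEBUG,
    'v|invert-match|invert' => \$INVERT,
    'c|count'               => \$COUNT,
    'h|help|?'              => sub { exec('perldoc', $0) or die "Can't run perldoc: $!\n" },
) or die "Usage: ", PROGRAMME_NAME, " [options] query [obofile]\n";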
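
# The first argument is the query (a Perl regex); the optional second
# one is the obo file. Without a file, input is read from STDIN.
my $string = shift;
die "Usage: ", PROGRAMME_NAME, " [options] query [obofile]\n"
    unless defined $string;
my $file = shift;

my $F;
if (not $file) {
    $F = \*STDIN;
} elsif ($file =~ /\.gz$/) {
    open $F, "<:gzip", $file or die "Can't open file $file: $!";
} else {
    open $F, "<", $file or die "Can't open file $file: $!";
}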
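
# obo stanzas start with '[Term]', so read one stanza per record.
# Each record ends with the '[Term]' that opens the *next* stanza.
local $/ = '[Term]';
my $first = "\n";   # the first record (the header) lacks a leading newline
my $c = 0;          # number of selected entries, for --count
while (<$F>) {
    chomp;          # drop the trailing '[Term]'; it is re-added on output
    my $match = /$string/i;
    # select matching entries, or non-matching ones with --invert-match
    if ($INVERT ? !$match : $match) {
        if ($COUNT) {
            $c++;
        } else {
            print "[Term]$first$_";
        }
    }
    $first = '';
}
print "$c\n" if $COUNT;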