@llamasoft
Created December 6, 2016 18:14
Princeton WordNet Parser Short Example
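#!/usr/bin/perl
use strict;
use warnings;

# Reads WordNet data files (e.g. data.noun) from the files named on the
# command line, or from STDIN, and prints every word of every synset.
while (my $line = <>) {
    # Skip lines that do not start with an 8-digit byte offset
    # (such as the license header at the top of each data file)
    if ( $line !~ /^[0-9]{8}\s/ ) { next; }

    chomp($line);
    my @tokens = split(/ /, $line);

    shift(@tokens); # Byte offset
    shift(@tokens); # File number
    shift(@tokens); # Part of speech

    # The word count is stored as a two-digit hexadecimal number
    my $word_count = hex(shift(@tokens));

    foreach ( 1 .. $word_count ) {
        my $word = shift(@tokens);
        $word =~ tr/_/ /;      # Underscores stand in for spaces
        $word =~ s/\(.*\)//;   # Drop a syntactic marker such as "(p)"
        print $word, "\n";

        shift(@tokens); # Lexical ID
    }
}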
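
To see what the loop does to a single line, here is a minimal sketch that wraps the same steps in a helper and runs it on one sample. The name `synset_words` and the sample line are assumptions made for illustration; the line is made up to follow the data-file layout and is not copied from WordNet.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the gist): return the words of one
# data-file line, or an empty list for lines without a byte offset.
sub synset_words {
    my ($line) = @_;
    return () if $line !~ /^[0-9]{8}\s/;

    chomp($line);
    my @tokens = split(/ /, $line);
    splice(@tokens, 0, 3);                 # Byte offset, file number, part of speech
    my $word_count = hex(shift(@tokens));  # Two-digit hexadecimal count

    my @words;
    foreach ( 1 .. $word_count ) {
        my $word = shift(@tokens);
        $word =~ tr/_/ /;                  # Underscores stand in for spaces
        $word =~ s/\(.*\)//;               # Strip a marker such as "(ip)"
        push(@words, $word);
        shift(@tokens);                    # Lexical ID
    }
    return @words;
}

# Made-up line in the data-file layout, for illustration only.
my $sample = "00000001 00 a 02 big_deal 0 galore(ip) 0 000 | illustrative gloss\n";
print join(", ", synset_words($sample)), "\n";   # prints "big deal, galore"
```

Returning a list instead of printing inside the loop makes the parsing step easy to reuse, for example to build a word list with duplicates removed.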