@jimregan
Created February 27, 2017 23:30
Search for homophones
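#!/usr/bin/perl
# Group dictionary entries by pronunciation to find homophones.
# Input: tab-separated lines of word, part of speech, pronunciation (Latin-1).
# Output: one line per pronunciation, then a tab, then the word__pos entries sharing it.
use warnings;
use strict;
use utf8;

die "Usage: $0 DICTIONARY\n" unless @ARGV;
open(my $in, "<", $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
binmode($in, ":encoding(latin-1)");
#open(OUT, ">", "collisions.txt");
#binmode(OUT, ":utf8");
binmode(STDOUT, ":utf8");
binmode(STDERR, ":utf8");

my %p2g = ();     # pronunciation => "word__pos, word__pos, ..."
my %seen = ();    # "pronunciation\tword__pos" => 1, for exact duplicate checks
while (<$in>) {
    chomp;
    s/^\(//;
    my ($word, $pos, $pronraw) = split /\t/;
    # Skip malformed lines (fewer than three fields) and the header row.
    next unless defined $pronraw;
    next if ($pos eq 'pos');
    my $pron = $pronraw;
    $pron =~ s/[0-9]\)//g;    # remove a digit followed by a closing parenthesis
    $pron =~ s/[()]//g;       # remove any remaining parentheses
    $pron =~ s/^ //;
    $pron =~ s/\r//;          # remove a carriage return left by CRLF input
    $pron =~ s/ $//;
    my $value = $word . '__' . $pos;
    # An exact-match lookup replaces the earlier regex test, which treated
    # $value as a pattern and also matched substrings of other entries.
    if ($seen{"$pron\t$value"}++) {
        #print STDERR "DUP: $p2g{$pron} $value\n";
        next;
    }
    if (exists $p2g{$pron}) {
        $p2g{$pron} .= ", $value";
    } else {
        $p2g{$pron} = $value;
    }
}
close($in);

# Sort the keys so the output order is reproducible between runs.
for my $pron (sort keys %p2g) {
    print $pron . "\t" . $p2g{$pron} . "\n";
}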