Skip to content

Instantly share code, notes, and snippets.

@Konfekt
Created November 15, 2016 14:43
Show Gist options
  • Save Konfekt/2328e1ae5f8305199d46fbcbe9dec01b to your computer and use it in GitHub Desktop.
Save Konfekt/2328e1ae5f8305199d46fbcbe9dec01b to your computer and use it in GitHub Desktop.
script that takes a word list and groups into words that differ by a single letter
#!/usr/bin/perl
use utf8;
my (@list, @paronyms);
my $file = shift;
die "Usage: $0 LIST" unless $file;
open(LIST, "+<:encoding(UTF-8)", $file) or die "$0: Can't open $file $!";
push(@list, $_) while <LIST>; # Read whole list
# Process whole list
WORD:
foreach my $word (@list) {
if( $word ~~ @paronyms ){
next WORD;
}
foreach my $line (@list) {
if (length($word) == length($line) && hd($word, $line) <= 1) {
push(@paronyms, $line);
}
}
push(@paronyms, "\n");
}
# Overwrite original list with purged list
truncate(LIST, 0);
seek(LIST, 0, 0);
print LIST @paronyms;
close(LIST);
# See http://stackoverflow.com/questions/8459585/how-do-i-make-inexact-string-comparisons-with-perl
sub hd{ length( $_[ 0 ] ) - ( ( $_[ 0 ] ^ $_[ 1 ] ) =~ tr[\0][\0] ) }  
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment