@diego898
Last active June 1, 2017 22:32
#!/usr/bin/env perl
# Finds duplicate adjacent words (case-insensitive), including a word
# repeated across a line break.
# Run in a directory: perl dupe_words *.tex
# Note: it does not recursively check subdirectories.
use strict;
use warnings;

my $DupCount = 0;

if (!@ARGV) {
    print "usage: $0 <file> ...\n";
    exit;
}

foreach my $FileName (@ARGV) {
    open(my $fh, '<', $FileName) or die "Cannot open $FileName: $!\n";
    # $LastWord is reset per file, not per line, so a word repeated
    # across a line break is still reported.
    my $LastWord = "";
    my $LineNum  = 0;
    while (my $line = <$fh>) {
        chomp $line;
        $LineNum++;
        # The capture group makes split return the separators as well.
        my @words = split(/(\W+)/, $line);
        foreach my $word (@words) {
            # Skip spaces:
            next if $word =~ /^\s*$/;
            # Punctuation breaks up a pair, so forget the last word:
            if ($word =~ /^\W+$/) {
                $LastWord = "";
                next;
            }
            # Found a dup?
            if (lc($word) eq lc($LastWord)) {
                print "$FileName:$LineNum $word\n";
                $DupCount++;
            } # Thanks to Sean Cronin for tip on case.
            # Mark this as the last word:
            $LastWord = $word;
        }
    }
    close $fh;
}

# Exit code = number of duplicates found, capped at 255 because exit
# statuses are truncated to 8 bits on POSIX systems.
exit($DupCount > 255 ? 255 : $DupCount);
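
The core check in the script above can also be tried on in-memory strings instead of files. The following is a minimal sketch under that assumption: `find_adjacent_dups` is a hypothetical helper name, not part of the original script, and it returns `line:word` strings rather than printing file names. It keeps the same rules as the script: whitespace is ignored, punctuation breaks up a pair, comparison is case-insensitive, and the last word carries over from one line to the next.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): returns one
# "line:word" string for each adjacent duplicate in the given lines.
sub find_adjacent_dups {
    my @lines = @_;
    my @found;
    my $last_word = "";    # carried across lines, as in the script
    my $line_num  = 0;
    for my $line (@lines) {
        $line_num++;
        # The capture group keeps the separators in the token list.
        for my $token (split /(\W+)/, $line) {
            next if $token =~ /^\s*$/;    # whitespace or empty: ignore
            if ($token =~ /^\W+$/) {      # punctuation breaks up a pair
                $last_word = "";
                next;
            }
            push @found, "$line_num:$token"
                if lc($token) eq lc($last_word);
            $last_word = $token;
        }
    }
    return @found;
}

my @dups = find_adjacent_dups(
    "This is is a test.",
    "The line ends with the",
    "The same word, word after a comma.",
);
print "$_\n" for @dups;
```

With these three sample lines the sketch reports `1:is` and `3:The`. The second hit comes from "the" ending line 2 and "The" starting line 3, while "word, word" is not reported because the comma resets the last word.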