@kylebgorman
Created March 19, 2020 21:33
Converts printable ASCII text to a 27-character vocabulary (the uppercase letters A-Z plus space)
#!/usr/bin/perl
use strict;
use warnings;
use open ":encoding(ascii)";
binmode STDIN, ":encoding(ascii)";
binmode STDOUT, ":encoding(ascii)";
binmode STDERR, ":encoding(ascii)";
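while (<>) {
    $_ = uc;                    # Case-folds to uppercase.
    s/'//g;                     # Deletes apostrophes and single quotes (the same ASCII character).
    s/[\d\p{PosixPunct}]/ /g;   # Replaces each digit or punctuation character with a space.
    s/\s+/ /g;                  # Collapses each whitespace run to a single space.
    s/^\s+|\s+$//g;             # Removes leading and trailing whitespace, including the newline.
    print "$_\n";               # Restores the newline stripped above.
}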
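To make the effect of the substitutions concrete, here is a minimal sketch that applies the same steps to a single string instead of reading from STDIN. The `normalize` subroutine and the sample sentence are hypothetical and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not in the original gist): runs the
# same substitutions as the loop above on one string and returns the result.
sub normalize {
    my ($line) = @_;
    $line = uc $line;                    # Case-folds to uppercase.
    $line =~ s/'//g;                     # Deletes apostrophes.
    $line =~ s/[\d\p{PosixPunct}]/ /g;   # Digits and punctuation become spaces.
    $line =~ s/\s+/ /g;                  # Collapses whitespace runs.
    $line =~ s/^\s+|\s+$//g;             # Trims both edges.
    return $line;
}

# Illustrative input only.
print normalize("Don't panic: it's 42, Arthur!\n"), "\n";
# prints "DONT PANIC ITS ARTHUR"
```

Deleting the apostrophe before the general punctuation rule keeps contractions together as one token ("DONT") instead of splitting them ("DON T").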