@andrewheiss
Created July 2, 2012 06:08
Gutenberg Ipsum
#!/usr/bin/perl -w
# Modified from Dr. Drang's original script at http://www.leancrew.com/all-this/2011/02/dissociated-darwin/
use Games::Dissociate;
# Choose the corpus file: the first command-line argument if given, otherwise totc.txt.
$corpus = @ARGV ? $ARGV[0] : "totc.txt";
# Slurp in the given corpus as a single string.
open(my $fh, '<', "$ENV{HOME}/bin/gutenberg_ipsum/words/$corpus")
    or die "Can't open $corpus: $!";
{ local $/; $corpus = <$fh>; }
close($fh);
# Dissociate the corpus using word pairs, returning 15-50 pairs.
# rand(36) is always below 36, so int() yields 0-35 and $length ranges over 15-50.
$length = 15 + int(rand(36));
$dis = dissociate($corpus, -2, $length);
# Remove double quotes and other paired characters (brackets, parentheses, and the underscores
# Gutenberg uses to mark italics), since the dissociated text may contain unmatched ones.
# This is an admittedly clunky fix. With more time or better Perl chops, I'd write something
# that finds the unmatched quotes or parentheses and inserts their partners at random points
# in the text. But that's hard :)
$dis =~ s/["\[\]_()]//g;
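# Capitalize the first character, then replace any trailing punctuation, quotes, or spaces
# with a single period. The * (rather than +) makes sure a period is added even when the
# text ends in a plain word.
$dis =~ s/^(.)/\u$1/;
$dis =~ s/[.!);:?'", -]*$/./;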
print $dis;
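The script leaves the actual dissociation to Games::Dissociate. For anyone curious what "dissociating with word pairs" means in practice, here is a minimal sketch of the general idea: a word-level Markov chain keyed on pairs of consecutive words, in the spirit of the Dissociated Press algorithm. It is not the module's implementation, and `word_pair_babble` is a hypothetical name that the script above does not use.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT Games::Dissociate: an order-2 word Markov chain.
# Each key is a pair of consecutive words; its value lists every word that
# follows that pair somewhere in the corpus.
sub word_pair_babble {
    my ($text, $n_words) = @_;
    my @words = split ' ', $text;
    return join ' ', @words if @words < 3;    # too short to build any pairs

    my %next;
    for my $i (0 .. $#words - 2) {
        push @{ $next{"$words[$i] $words[$i + 1]"} }, $words[$i + 2];
    }

    # Start from a random pair, then repeatedly pick a random successor
    # of the last two words emitted.
    my $start = int rand(@words - 2);
    my @out   = @words[$start, $start + 1];
    while (@out < $n_words) {
        my $choices = $next{"$out[-2] $out[-1]"} or last;    # dead end: stop early
        push @out, $choices->[ int rand @$choices ];
    }
    return join ' ', @out;
}
```

Because every emitted pair occurs somewhere in the source, short stretches read naturally while the whole drifts off topic, which is the effect the script relies on.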
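The comment about unmatched quotes and parentheses describes a less clunky fix: finding the unmatched characters and inserting their partners. A simpler variant is to delete only the unmatched characters and keep the balanced pairs. Below is a minimal sketch of that variant, which pairs brackets with a stack. `strip_unmatched` is a hypothetical helper, and handling straight double quotes by parity (dropping the last one when their count is odd) is an assumption, since a straight quote doesn't say which side it opens.

```perl
use strict;
use warnings;

# Hypothetical helper: remove only unpaired ( ) [ ] and an odd trailing ",
# keeping balanced pairs such as "(like this)".
sub strip_unmatched {
    my ($text) = @_;
    my @chars    = split //, $text;
    my %open_for = (')' => '(', ']' => '[');
    my @stack;    # indexes of openers still waiting for a closer
    my %drop;     # indexes of characters to delete

    for my $i (0 .. $#chars) {
        my $c = $chars[$i];
        if ($c eq '(' || $c eq '[') {
            push @stack, $i;
        } elsif (exists $open_for{$c}) {
            if (@stack && $chars[ $stack[-1] ] eq $open_for{$c}) {
                pop @stack;       # matched pair: keep both characters
            } else {
                $drop{$i} = 1;    # closer with no matching opener
            }
        }
    }
    $drop{$_} = 1 for @stack;     # openers that were never closed

    # An odd number of double quotes means one is unmatched; drop the last.
    my @quotes = grep { $chars[$_] eq '"' } 0 .. $#chars;
    $drop{ $quotes[-1] } = 1 if @quotes % 2;

    return join '', map { $drop{$_} ? () : $chars[$_] } 0 .. $#chars;
}
```

In the script, `$dis = strip_unmatched($dis);` could stand in for the character-stripping regex, with `$dis =~ tr/_//d;` still removing Gutenberg's italics underscores.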