Navigation Menu

Skip to content

Instantly share code, notes, and snippets.

@jan-g
Last active February 5, 2020 14:56
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jan-g/451122cae6e13427c0c1ea93ff1e5df0 to your computer and use it in GitHub Desktop.
Save jan-g/451122cae6e13427c0c1ea93ff1e5df0 to your computer and use it in GitHub Desktop.
travesty, naive version and then an approach undreamt of back in 1984
#!/usr/bin/env python3
# Invoke with
# ./v1.py {source-file} {n} {m}
# where {n} is the size of the n-gram used (3 for trigrams)
# {m} is the amount of output text required.
from collections import Counter
import os
from random import choice
import sys
# Very simply, scan the text for n-grams each time
with open(sys.argv[1]) as f:
book = " ".join(l.strip() for l in f)
n = int(sys.argv[2])
book = book.lower()
prefix = book[:n]
book += prefix
print(prefix, end="")
for _ in range(int(sys.argv[3])):
following = Counter()
# This approach was beyond the reach of systems back in 1984
for i in range(len(book) - n):
if book[i:i+n] == prefix:
following[book[i+n]] += 1
# Weighted selection of following character
items = "".join(k * following[k] for k in following)
next = choice(items)
prefix = prefix[1:] + next
print(next, flush=True, end="")
print()
"""
Sample run:
% time ./v1.py ulysses-75.txt 3 1000
stainstately: --coment. a preachering morning sleepy, pleaned at the bearly bearful and loose from the bowl smile the paused in a loose whis arts little about and at ble off the lips.stains. a long in a yellow white current. a mild made rounted, hered gathe coarsely, pale towards head, he bearing gunrest. he teeth glistle off the coarsely: --come folds of sternly belover him, equietly and the bowl off the shadown, he he air, gravely: --forward awhite call, the looked the chrill do nice towed at the pleasant untoned at and made rountonsurrough the bristening shut you? he bearly attent. he tonsured at the currountair, thriskly. buck to barracks! head, will do nice the peere dark witch a rapid chrysostomos. a prelathe stairhead. a rapt and a preacher the skipped dower, gurgling stainstaircase about cover then came from the briskly. shadown, unted long instair, gavely. ther's top of stephen, unding her's the paused him, equined gunressed and bles arts instate, alted, behind broke quietly
./v1.py ulysses-75.txt 3 1000 0.71s user 0.04s system 95% cpu 0.778 total
Sample run over the whole of Ulysses:
% time ./v1.py ulysses.txt 4 1000
stamps or a miration a meal. crate can suddened ther abbed time _sonny sunshapel on no apex of his alway any of card pries quizzed, tonight to knows as leopoldiers to him. for teache, stering what he? mrs many branted with distice. we’ve and her valism with me always in shot. is the was ends street, seconnelly! who for so to mervyn talmud his purchyard backward as trick of pose her phans cert to burning her her carried by a little put hee. the out the toward, you know to the cows, bitted tir. jamjam lyons or of frated to till your not. street the greator cowley for luck. two could gradual and our was full being beheld after of the flowed to the liberal inity (a new backay throwing such of two picker’s chequentleman elm in shiverbeer’s goim not sun has record on walking my watering she killa. sland piper show he curlie, the greet. harry and the ossibly dislike the leath of old for a nannyman in his had batement keving out read ocular dedalus homas, cruelocitizen buried with and lease.
./v1.py ulysses.txt 4 1000 461.61s user 2.26s system 98% cpu 7:52.75 total
(This is plausible, albeit laborious: the naive approach produces a little over a character per second.)
"""
#!/usr/bin/env python3
# Invoke with
# ./v2.py {source-file} {n} {m}
# where {n} is the size of the n-gram used (3 for trigrams)
# {m} is the amount of output text required.
from collections import Counter, defaultdict
import os
from random import choice
import sys
# Memory is cheap and abundent. Scan the whole text for n-grams and build the follower information once
with open(sys.argv[1]) as f:
book = " ".join(l.strip() for l in f)
n = int(sys.argv[2])
book = book.lower()
# print(book, n)
prefix = book[:n]
book += prefix
# Scan the whole thing once
following = defaultdict(Counter)
print(prefix, flush=True, end="")
# There'll be a brief pause whilst this bit runs...
for i in range(len(book) - n):
following[book[i:i+n]][book[i+n]] += 1
# ... then this will go like the clappers
for _ in range(int(sys.argv[3])):
items = "".join(k * following[prefix][k] for k in following[prefix])
next = choice(items)
prefix = prefix[1:] + next
print(next, flush=True, end="")
print()
"""
Sample run:
% time ./v2.py ulysses-75.txt 3 1000
state, and sleeped a yellow whistine current, watching airs and the middled then, he and and gent. he peepy, len white towards his and him, equine: --coment, was sured dowed and at will you? he sternly. --come up and loose faced grainstately: --that his an corpuscles. a prelate, aloft and solemnly he shaking a looked blooked about answer, genuined: --_intoned hued his thistlessed and briskly. the teeth gown then off the mirror an coved, he bent, was sustaircase. sleeped histlessing gunrest at the sustained a rapid chrysostomos. switching moment smile of staine in its legs tonsured and ounted, he tone: --forwards of lathe patroublessed gurgling a pleaned gravely. he body at and bristlessed dowed off the skipped gunrest. held thrountairhead. stron whis then, he sternly the said sustair, those a yellow dressed. slow music, patroibo ad about and made rouns. shadown the mirrountonsured him, equine mild pointoned out and his, displeaned gravely overe and shriskly. two stair, thrice and b
./v2.py ulysses-75.txt 3 1000 0.07s user 0.02s system 92% cpu 0.101 total
Sample run over the whole of Ulysses:
% time ./v2.py ulysses.txt 4 1000
stages of anxious usuallyant some. of frequence. remine night, he irish for the favourite of the jew, they hair churn, queen. and a corporation. —as ring gripping fellows your buggest of tea anythin they said. —the prudes and yet the called id looked yellow! sunshave behalf bass aim growly find shill. shark quietly and the foundrum, benjaming then of a strong over doctory of joined thing! clamous called to irely not? _corporary. other face. what of it. put on the next. —is it. not an over foldeth under seems and watch our blood for my choir simple them, tonehead.)_ altogether, stephen said. i’m notebook, guardiner? here and and the fixed, and them me.) let mr bloom thand dublin vario, cover dunsing to it away. my sking, he had for fineluctantlemanly stephen’s in cunning down before hold? mr black with of could doubtablist of dublin a commodious elleopold suit was perhaps to ther. _(he murmuring. he door crows.)_ mr bloom! you know him of ering, says gimless and quare of his m
./v2.py ulysses.txt 4 1000 1.63s user 0.06s system 98% cpu 1.713 total
(There's about a second-and-a-half pause before that runs.)
"""
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment