@paulsmith
Created November 3, 2009 16:24
A toy Markov chain generator, primed with magazine-length article text by The Nation's Chris Hayes
Rush Limbaugh all tore into him as nominee, could completely
alter the structure of her considerable enthusiasm into
persuading even the rare greater red horse that make solar cells,
a Ways and Means they are also, by and large ways, in gentler,
politer interactions with ministers and churches. That's really
how I first came into this thinking it was easy to surmise that
the mainstream of his shot-up airplane that paralyzed his right
arm paralyzed. Or the story of the world over, and scapegoating
the brown-skinned Other who is seeking an exemption in the
country--and board chair Hank Dittmar, an expert on ...
Rush Limbaugh offers Democrats an irresistible target as the
rhetoric used to be decided by them. "It is imperative that we
ought not to be converging on rhetorical rhythm. Clinton now not
only to have a right to feel like opinion shapers are rarely
delighted to find full-throated support for the individual
solider, but looking at the same political party and last year,
because the policies would change their votes because of Three
Big Facts that are common in a way that the Chicago Metropolitan
Sanitary District. (In commissioner James Kirie, frustrated by
the voters not been static. In the state level.
Rush Limbaugh and Sean Hannity--stand out because there's been a
remarkably consistent pattern. Hillary Clinton hemmed and hawed
when asked to renounce, only to be doing very little to do it, no
matter how often she frequented the halls of power, business
interests by making sure that is worthy of attention in the
world. Just before AM on a "soap box" and condemned the
sanctions, percent of voters were registered in Georgia weeping
for dead Jews in Brooklyn. The nation and "our community" were
suddenly indistinguishable, which is, at best, incomplete--and
maybe flat-out wrong. If you were hungry and get trained in ...
Rush Limbaugh a "fatass drug addict" made the case for international
leaders to use fireproof materials, brick or stone at the meeting,
told me that when you’re on the environment but they were being
taught. "Too often," the students to write their Congressmen' model
can do," he says. "But I feel like you're encountering him mid-thought
and need to end its cycle of abuse and the effects will be a snitch
who falsely accuses Veronica of drug paraphernalia. There's a broad
coalition of conservative power in that nexus," Looper says. "Politics
is about finding "man bites dog" stories; the bizarre, the
catastrophic, ...
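Every sample above opens with "Rush Limbaugh" because the generator keys its table on two-word prefixes and, unless given a prefix, starts from the first one in the corpus. Below is a minimal sketch of that table on a made-up seven-word corpus. The corpus and the `prefix_table` name are hypothetical and not from the gist.

```python
from collections import defaultdict


def prefix_table(words):
    """Map each consecutive word pair to the list of words that followed it."""
    table = defaultdict(list)
    for i in range(len(words) - 2):
        table[(words[i], words[i + 1])].append(words[i + 2])
    return dict(table)


# Hypothetical toy corpus: "the cat" is followed once by "sat", once by "ran".
corpus = "the cat sat on the cat ran".split()
table = prefix_table(corpus)
for prefix, suffixes in table.items():
    print(prefix, "->", suffixes)
```

Starting from `('the', 'cat')`, a walk picks "sat" or "ran" with equal probability. `('cat', 'ran')` has no entry, so a walk that reaches it has nowhere to go. This is the dead end a generator has to guard against.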
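#!/usr/bin/env python
from __future__ import print_function

import re
import sys
import random
from collections import defaultdict

# Maps each two-word prefix (a tuple) to every word that followed it in the
# text. Duplicates are kept, so random.choice() picks in proportion to frequency.
state = defaultdict(list)
# First prefix in the text; generate() starts here unless given a prefix.
start = None


def tokenize(s):
    # Split on any run of whitespace; punctuation stays attached to words.
    return re.split(r'\s+', s)


def filter_non_words(words):
    # Drop tokens with no ASCII letter: empty strings, "--", bare numbers.
    return [w for w in words if re.search(r'[a-zA-Z]', w)]


def build(filelike=sys.stdin):
    global start
    words = filter_non_words(tokenize(filelike.read()))
    # Slide a three-word window over the text: (w[i], w[i+1]) -> w[i+2].
    for i in range(len(words) - 2):
        prefix = tuple(words[i:i+2])
        suffix = words[i+2]
        if start is None:
            start = prefix
        state[prefix].append(suffix)


def generate(n=100, prefix=None):
    prefix = prefix or start
    print(' '.join(prefix), end=' ')
    for i in range(n):
        # Stop at a dead end (e.g. the corpus's last two words). .get() also
        # avoids inserting an empty list, which random.choice() would reject.
        choices = state.get(prefix)
        if not choices:
            break
        word = random.choice(choices)
        prefix = (prefix[1], word)
        print(word, end=' ')


if __name__ == '__main__':
    build()
    generate()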
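markov.py fixes the prefix length at two words (`words[i:i+2]`, `(prefix[1], word)`) and keeps its table in module-level globals, so one process holds one chain. The following is a sketch based on those observations, not the gist's code. `build_chain` and `generate_words` are hypothetical names. The prefix length becomes an `order` parameter, and the random source is injectable, so a seeded `random.Random` gives repeatable output.

```python
import random
from collections import defaultdict


def build_chain(words, order=2):
    """Map each run of `order` consecutive words to the words that followed it."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return dict(chain)


def generate_words(chain, start, n=100, rng=random):
    """Random walk of up to n steps from `start`; stops early at a dead end."""
    prefix = tuple(start)
    out = list(prefix)
    for _ in range(n):
        choices = chain.get(prefix)
        if not choices:  # no recorded successor for this prefix
            break
        word = rng.choice(choices)
        out.append(word)
        # Slide the window: drop the oldest word, append the new one.
        prefix = prefix[1:] + (word,)
    return out


if __name__ == "__main__":
    words = "a b c d e".split()
    chain = build_chain(words, order=2)
    # Each prefix has exactly one successor here, so the walk is deterministic.
    print(" ".join(generate_words(chain, words[:2], n=10)))
```

Raising `order` leaves fewer recorded successors per prefix, so the output copies longer stretches of the source verbatim. Lowering it does the opposite.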
#!/usr/bin/env python
from __future__ import print_function

import io
import os.path

import lxml.html  # .cssselect() also requires the separately installed cssselect package

import markov

url = 'http://www.chrishayes.org/articles/'
article_text_file = '/tmp/hayes_articles.txt'

# Scrape only on the first run; later runs reuse the cached text file.
if not os.path.exists(article_text_file):
    with open(article_text_file, 'wb') as f:
        # Fetch a list of URLs of the articles
        articles_doc = lxml.html.parse(url).getroot()
        articles_doc.make_links_absolute()
        article_links = articles_doc.cssselect('.title a')
        article_links = [l.attrib['href'] for l in article_links]
        # Append each article's body text to the cache file as UTF-8
        for article_url in article_links:
            doc = lxml.html.parse(article_url).getroot()
            text = doc.find_class('body')[0].text_content().encode('utf-8')
            f.write(text + b'\n')

# Generate a Markov chain based on the article text
with io.open(article_text_file, encoding='utf-8') as f:
    markov.build(f)
# Print five samples of about 100 words each, separated by blank lines
for _ in range(5):
    markov.generate(n=100)
    print()
    print()
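`generate()` falls back to `start`, the corpus's first prefix, so all five samples the driver prints open with the same two words, as the samples above show. `generate()` also accepts an explicit `prefix`, which must be a key of `markov.state`. One way to vary the opening is to pick a random prefix whose first word is capitalized, as a rough proxy for a sentence start. The `random_start` helper and the miniature table below are hypothetical and not part of the gist.

```python
import random


def random_start(state, rng=random):
    """Pick a random prefix whose first word looks like a sentence start.

    `state` maps word-pair tuples to suffix lists, like markov.state, and must
    contain at least one non-empty entry. Falls back to any usable prefix if
    none starts with a capital letter.
    """
    # Skip prefixes with no recorded successors; they would dead-end at once.
    keys = [p for p, suffixes in state.items() if suffixes]
    capitalized = [p for p in keys if p[0][:1].isupper()]
    return rng.choice(capitalized or keys)


if __name__ == "__main__":
    # Hypothetical miniature table in the same shape as markov.state.
    state = {
        ("Rush", "Limbaugh"): ["offers", "and"],
        ("The", "nation"): ["and"],
        ("the", "brown-skinned"): ["Other"],
    }
    print(random_start(state))
```

In the driver's loop this would read `markov.generate(n=100, prefix=random_start(markov.state))`. Prefixes with empty suffix lists are skipped because a `defaultdict` lookup such as `state[prefix]` can leave such entries behind.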