Created November 3, 2009 16:24
A toy Markov chain generator, primed with magazine-length article text by The Nation's Chris Hayes
Rush Limbaugh all tore into him as nominee, could completely
alter the structure of her considerable enthusiasm into
persuading even the rare greater red horse that make solar cells,
a Ways and Means they are also, by and large ways, in gentler,
politer interactions with ministers and churches. That's really
how I first came into this thinking it was easy to surmise that
the mainstream of his shot-up airplane that paralyzed his right
arm paralyzed. Or the story of the world over, and scapegoating
the brown-skinned Other who is seeking an exemption in the
country--and board chair Hank Dittmar, an expert on ...
Rush Limbaugh offers Democrats an irresistible target as the
rhetoric used to be decided by them. "It is imperative that we
ought not to be converging on rhetorical rhythm. Clinton now not
only to have a right to feel like opinion shapers are rarely
delighted to find full-throated support for the individual
solider, but looking at the same political party and last year,
because the policies would change their votes because of Three
Big Facts that are common in a way that the Chicago Metropolitan
Sanitary District. (In commissioner James Kirie, frustrated by
the voters not been static. In the state level.
Rush Limbaugh and Sean Hannity--stand out because there's been a
remarkably consistent pattern. Hillary Clinton hemmed and hawed
when asked to renounce, only to be doing very little to do it, no
matter how often she frequented the halls of power, business
interests by making sure that is worthy of attention in the
world. Just before AM on a "soap box" and condemned the
sanctions, percent of voters were registered in Georgia weeping
for dead Jews in Brooklyn. The nation and "our community" were
suddenly indistinguishable, which is, at best, incomplete--and
maybe flat-out wrong. If you were hungry and get trained in ...
Rush Limbaugh a "fatass drug addict" made the case for international
leaders to use fireproof materials, brick or stone at the meeting,
told me that when you’re on the environment but they were being
taught. "Too often," the students to write their Congressmen' model
can do," he says. "But I feel like you're encountering him mid-thought
and need to end its cycle of abuse and the effects will be a snitch
who falsely accuses Veronica of drug paraphernalia. There's a broad
coalition of conservative power in that nexus," Looper says. "Politics
is about finding "man bites dog" stories; the bizarre, the
catastrophic, ...
#!/usr/bin/env python
import re
import sys
import random
from collections import defaultdict

# Maps each two-word prefix to the list of words observed right after it.
state = defaultdict(list)
# First prefix seen in the input; the default starting point for generate().
start = None

def tokenize(s):
    return re.split(r'\s+', s)

def filter_non_words(words):
    # Keep only tokens that contain at least one ASCII letter.
    return [w for w in words if re.search(r'[a-zA-Z]', w)]

def build(filelike=sys.stdin):
    global start
    words = filter_non_words(tokenize(filelike.read()))
    for i in xrange(len(words)):
        try:
            prefix = tuple(words[i:i+2])
            suffix = words[i+2]
        except IndexError:
            break
        if start is None:
            start = prefix
        state[prefix].append(suffix)

def generate(n=100, prefix=None):
    prefix = prefix or start
    print ' '.join(prefix),
    for i in xrange(n):
        choices = state.get(prefix)
        if not choices:
            # Dead end: this prefix appeared only at the very end of the input.
            break
        word = random.choice(choices)
        prefix = (prefix[1], word)
        print word,

if __name__ == '__main__':
    build()
    generate()
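The script above targets Python 2 (`xrange`, the `print` statement). What follows is a minimal Python 3 sketch of the same two-word-prefix idea, not the gist author's code: `build_table`, `generate_text`, and the `order` and `rng` parameters are names introduced here for illustration. It returns the text instead of printing it, and it stops early if it reaches a prefix with no recorded successor.

```python
import random
import re
from collections import defaultdict


def build_table(text, order=2):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    # Same tokenizing and filtering idea as the script: split on whitespace,
    # then keep only tokens containing at least one ASCII letter.
    words = [w for w in re.split(r'\s+', text) if re.search(r'[a-zA-Z]', w)]
    table = defaultdict(list)
    for i in range(len(words) - order):
        prefix = tuple(words[i:i + order])
        table[prefix].append(words[i + order])
    return table


def generate_text(table, start, n=100, rng=random):
    """Walk the table from `start`, emitting at most `n` further words."""
    prefix = tuple(start)
    out = list(prefix)
    for _ in range(n):
        choices = table.get(prefix)
        if not choices:
            # Dead end: no word was ever observed after this prefix.
            break
        word = rng.choice(choices)
        out.append(word)
        prefix = prefix[1:] + (word,)
    return ' '.join(out)


# Tiny hypothetical corpus, chosen so one prefix has two possible successors.
sample = "the cat sat on the mat and the cat ran off"
table = build_table(sample)
print(table[('the', 'cat')])  # ['sat', 'ran']
print(generate_text(table, ('the', 'cat'), n=10, rng=random.Random(0)))
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which helps when comparing output across corpus changes; the default falls back to the module-level generator, as the script does.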
#!/usr/bin/env python
import os.path
from StringIO import StringIO

import lxml.html

import markov

url = 'http://www.chrishayes.org/articles/'
article_text_file = '/tmp/hayes_articles.txt'

if not os.path.exists(article_text_file):
    with open(article_text_file, 'w') as f:
        # Fetch a list of URLs of the articles
        articles_doc = lxml.html.parse(url).getroot()
        articles_doc.make_links_absolute()
        article_links = articles_doc.cssselect('.title a')
        article_links = [l.attrib['href'] for l in article_links]
        # Build up a list of article text
        for url in article_links:
            doc = lxml.html.parse(url).getroot()
            text = doc.find_class('body')[0].text_content().encode('utf-8')
            f.write(text + '\n')

# Generate a Markov chain based on the article text
markov.build(open(article_text_file))
for _ in range(5):
    markov.generate(n=100)
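The script above caches the scraped text in a file and only fetches when that file is missing. One consequence is that a fetch interrupted partway leaves a partial file that later runs would treat as complete. The Python 3 sketch below shows one possible way to keep the fetch-once behavior while avoiding that, by writing to a temporary file and renaming it at the end. `cached_corpus` and its `fetch_texts` callback are hypothetical names introduced here; the callback stands in for the lxml scraping loop and is not part of the gist.

```python
import os
import tempfile


def cached_corpus(path, fetch_texts):
    """Return the text stored at `path`, calling `fetch_texts()` only if it is missing."""
    if not os.path.exists(path):
        # Write to a temporary file in the same directory, then rename it,
        # so an interrupted fetch never leaves a partial cache at `path`.
        directory = os.path.dirname(path) or '.'
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
        try:
            with os.fdopen(fd, 'w', encoding='utf-8') as f:
                for text in fetch_texts():
                    f.write(text + '\n')
            os.replace(tmp_path, path)
        except BaseException:
            os.remove(tmp_path)
            raise
    with open(path, encoding='utf-8') as f:
        return f.read()


def fake_fetch():
    # Stand-in for the network scrape; yields one string per article.
    yield 'first article text'
    yield 'second article text'


with tempfile.TemporaryDirectory() as d:
    cache_path = os.path.join(d, 'articles.txt')
    print(cached_corpus(cache_path, fake_fetch))  # fetches and writes the cache
    print(cached_corpus(cache_path, fake_fetch))  # served from the cached file
```

The returned string can be wrapped in `io.StringIO` and handed to anything that expects a file-like object with a `read()` method.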