Skip to content

Instantly share code, notes, and snippets.

@esommer
Last active December 21, 2015 22:09
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save esommer/6373580 to your computer and use it in GitHub Desktop.
Save esommer/6373580 to your computer and use it in GitHub Desktop.
Small, messy start to a Markov Chain problem that I tackled to play with and begin learning Python. Goal: Script reads text file and outputs sentences using vocabulary of that text and probabilistic determination of next words. Notes: (Ideas for where to improve / what I want to tackle next) - BUG: Sometimes it can't find a word in the dictionar…
import random
def text_to_paragraphs(file_name):
try:
file_data = open(file_name)
data = file_data.readlines()
data = [line.strip() for line in data]
data = filter(None, data)
return data
except IOError as err:
print('File error: ' + str(err))
def make_words_list(full_text):
words_list = []
for each_line in full_text:
words = each_line.split(' ')
words_list.extend(words)
return words_list
def gather_word_dict(words_list):
word_dict = {}
for i, each_word in enumerate(words_list):
if each_word in word_dict:
word_dict[each_word].append(words_list[i+1])
elif ((len(words_list) - 1) > i):
word_dict[each_word] = [words_list[i+1]]
else:
pass
return word_dict
def get_list_indexes(search_word, search_list):
indices = []
index_tally = 0
while search_word in search_list:
word_index = search_list.index(search_word)
index_tally = index_tally + word_index
if len(indices):
index_tally += 1
#question: do you need an else: pass here?
indices.append(index_tally)
search_list = search_list[word_index+1:]
return indices
def gen_new_text(num_sentences, words_list):
sentence_count = 0
text = ''
text_stack = []
start_word = ''
while (sentence_count < (num_sentences)):
if start_word == '':
sentence_stack = gen_new_sentence(words_list)
else:
sentence_stack = gen_new_sentence(words_list,start_word)
start_word = sentence_stack[-1]
sentence_count += 1
sentence = ' '.join(sentence_stack) + '. '
tall_sentence = sentence.capitalize()
text += tall_sentence
return text
def gen_new_sentence(words_list, first_word=''):
sentence = ''
sentence_stack = []
if first_word == '':
first_word = words_list[random.randint(0,(len(words_list)-1))]
second_word = gen_next_word(first_word, word_dict)
sentence_stack = [first_word,second_word]
else:
second_word = gen_next_word(first_word, word_dict)
if ((second_word == '//END//') or (len(word_dict[second_word]) == 1) or (len(word_dict.get(second_word + '.', [])) == 1)):
second_word = words_list[random.randint(0,(len(words_list)-1))]
sentence_stack = [second_word]
next_word = second_word
while ((next_word != '//END//') and (len(sentence_stack) < 20)):
next_word = gen_next_word(next_word, word_dict)
sentence_stack.append(next_word)
sentence_stack.pop()
return sentence_stack
def gen_next_word(this_word, word_dict):
next_word = ''
if this_word in word_dict:
if len(word_dict[this_word]) <= 1:
next_word_index = 0
else:
next_word_index = random.randint(0,(len(word_dict[this_word])-1))
next_word = word_dict[this_word][next_word_index]
elif this_word + '.' in word_dict:
if len(word_dict[this_word + '.']) <= 1:
next_word_index = 0
else:
next_word_index = random.randint(0,(len(word_dict[this_word + '.'])-1))
next_word = word_dict[this_word + '.'][next_word_index]
clean_word = next_word.strip('.')
next_word = clean_word
return next_word
file_name = 'testing.txt'
paragraphs = text_to_paragraphs(file_name)
words_list = make_words_list(paragraphs)
word_dict = gather_word_dict(words_list)
print(gen_new_text(3,words_list))
#print(word_dict)
Here is a bunch of text that I'm just using so that I can save a file and open it with Python. So there. Is this enough? Sure. For now, this is enough.
Here is a new paragraph. Let's see what this does! I'm adding another sentence or two. What was that? What happens if I add two questions in a row? How about three? Ah, I see. Well, alright. That's all I have to say. It's hard to concentrate here.
Elephants are large mammals of the family Elephantidae and the order Proboscidea. Traditionally, two species are recognised, the African elephant (Loxodonta africana) and the Asian elephant (Elephas maximus), although some evidence suggests that African bush elephants and African forest elephants are separate species (L. africana and L. cyclotis respectively). Elephants are scattered throughout sub-Saharan Africa, and South and Southeast Asia. They are the only surviving proboscideans; extinct species include mammoths and mastodons. The largest living terrestrial animals, male African elephants can reach a height of 4 m (13 ft) and weigh 7,000 kg (15,000 lb). These animals have several distinctive features, including a long proboscis or trunk used for many purposes, particularly for grasping objects. Their incisors grow into tusks, which serve as tools for moving objects and digging and as weapons for fighting. The elephant's large ear flaps help to control the temperature of its body. African elephants have larger ears and concave backs while Asian elephants have smaller ears and convex or level backs.
Elephants are herbivorous and can be found in different habitats including savannahs, forests, deserts and marshes. They prefer to stay near water. They are considered to be keystone species due to their impact on their environments. Other animals tend to keep their distance, and predators such as lions, tigers, hyenas and wild dogs usually target only the young elephants (or "calves"). Females ("cows") tend to live in family groups, which can consist of one female with her calves or several related females with offspring. The groups are led by an individual known as the matriarch, often the oldest cow. Elephants have a fission-fusion society in which multiple family groups come together to socialise. Males ("bulls") leave their family groups when they reach puberty, and may live alone or with other males. Adult bulls mostly interact with family groups when looking for a mate and enter a state of increased testosterone and aggression known as musth, which helps them gain dominance and reproductive success. Calves are the centre of attention in their family groups and rely on their mothers for as long as three years. Elephants can live up to 70 years in the wild. They communicate by touch, sight, smell and sound; elephants use infrasound, and seismic communication over long distances. Elephant intelligence has been compared with that of primates and cetaceans. They appear to have self-awareness and show empathy for dying or dead individuals of their kind.
African elephants are listed as vulnerable by the International Union for Conservation of Nature (IUCN), while the Asian elephant is classed as endangered. One of the biggest threats to elephant populations is the ivory trade, as the animals are poached for their ivory tusks. Other threats to wild elephants include habitat destruction and conflicts with local people. Elephants are used as working animals in Asia. In the past they were used in war; today, they are often put on display in zoos and circuses. Elephants are highly recognisable and have been featured in art, folklore, religion, literature and popular culture.
Thanks, Wikipedia, for your article on Elephants. Elephants are my favorite animals.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment