Created
November 6, 2014 04:10
-
-
Save mgill25/699fbd2c5f0ef59d0be5 to your computer and use it in GitHub Desktop.
Playing around with Markov chains
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Markov chain implementation in Ruby | |
# We take some text as input, apply the Markov chain algorithm to it, | |
# and produce some other text as output. Lets see how it is done! | |
# Markov chain means running an FSM with the states having unequal probability | |
# of transition. | |
# References: | |
# A Mathematical theory of Communication, by C.E. Shannon | |
# Accuracy can be adjusted by increasing the order (n-gram length) | |
# IRC Bot setting: Frequency (in percentage) | |
# So how would Markov chaining work for text? | |
# We analyse the input stream to get a better measure of the frequency | |
# and then use that frequency to generate a new stream via an FSM. | |
# The FMS will switch states on probabilities based on those frequencies. | |
# In order to be more accurate, we can consider more than one token from | |
# stream collectively (2 or 3 gram) and take that into account. | |
def get_random_float (min, max) | |
rand * (max - min) + min | |
end | |
def get_weighted_item items | |
# key:val => item:probability | |
n = get_random_float(0, 1) | |
current_item = nil | |
items.each_with_index do |(item, probability), index| | |
current_item = item | |
if n < probability | |
break | |
end | |
n = n - probability | |
end | |
current_item | |
end | |
def get_next_state(current_state) | |
# if current_state == '0' | |
# frequency_map = { '0' => 0.47, '1' => 0.53} | |
# elsif current_state == '1' | |
# frequency_map = { '0' => 0.55, '1' => 0.45} | |
# end | |
if current_state == 'A' | |
frequency_map = {'A' => 0.35, 'B' => 0.05, 'C' => 0.60} | |
elsif current_state == 'B' | |
frequency_map = {'A' => 0.50, 'B' => 0.10, 'C' => 0.40} | |
elsif current_state == 'C' | |
frequency_map = {'A' => 0.01, 'B' => 0.47, 'C' => 0.52} | |
end | |
get_weighted_item frequency_map | |
end | |
def state_machine (start_state, end_states) | |
current = get_next_state(start_state) | |
print(current) | |
# sleep(1) | |
if end_states.include? current | |
# if next_state is an ending state, we can chose to end | |
# and if there is an outgoing edge from this next_state as well, we have to decide | |
# whether to end right now, or follow the path. Lets make it a 50-50 coin-toss for now | |
end_now = rand(0..1) | |
if end_now == 0 | |
# We stop the machine and return the current state | |
puts 'Ending...' | |
current | |
else | |
# We keep going recursively! | |
state_machine(current, end_states) | |
end | |
else | |
# we can go to any other state | |
# The chain can just keep on going... | |
state_machine(current, end_states) | |
end | |
end | |
def state_machine_iter (n, start_state, end_states) | |
# Because recursion is slooowwwwwwwwww. :( | |
n.times.each do | |
current = get_next_state(start_state) | |
print(current) | |
if end_states.include? current | |
break | |
end | |
start_state = get_next_state(current) | |
end | |
puts "\n" | |
end | |
# state_machine('0', []) | |
state_machine_iter(200, 'A', []) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment