@tokoroten
Created October 12, 2014 15:30
ε-Greedy based multi-armed bandit
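#coding:utf-8
# ε-Greedy based multi-armed bandit (Python 2: uses xrange and the print statement)
import random

# Maximum payout of each slot machine; a pull returns a uniform value in [0, rate)
slotmachine_rate = [3, 5, 10, 20, 16, 15, 21, 22, 6]
# Per machine: [total score so far, number of pulls]
score_map = [[0.0, 0] for i in xrange(len(slotmachine_rate))]
# ε: probability of exploring a random machine instead of the current best
search_rate = 0.1

def try_slot(i):
    return random.random() * slotmachine_rate[i]

# first try: pull every machine once so each average is defined
for i in xrange(len(slotmachine_rate)):
    score = try_slot(i)
    score_map[i][0] += score
    score_map[i][1] += 1

def get_most_good_slot():
    # Return the index of the machine with the highest average score so far
    score = -1
    ret = -1
    for i in xrange(len(slotmachine_rate)):
        t_score = score_map[i][0] / score_map[i][1]
        if score < t_score:
            score = t_score
            ret = i
    return ret

# trials
total_score = 0
for i in xrange(1000):
    target_slot = get_most_good_slot()
    if random.random() < search_rate:
        # explore: override the greedy choice with a random machine
        target_slot = random.randrange(0, len(slotmachine_rate))
    score = try_slot(target_slot)
    score_map[target_slot][0] += score
    score_map[target_slot][1] += 1
    total_score += score

print total_score / 1000 # average score per trial (estimate of the expected value)
print score_map
print [a[0]/a[1] for a in score_map]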
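
The script above is Python 2. For readers on Python 3, the following is a minimal sketch of the same loop wrapped in a function. The function name `run_epsilon_greedy` and its `epsilon`, `trials`, and `seed` parameters are assumptions introduced here for illustration and are not part of the original gist.

```python
import random


def run_epsilon_greedy(rates, epsilon=0.1, trials=1000, seed=None):
    """Hypothetical Python 3 port of the gist's epsilon-greedy loop."""
    rng = random.Random(seed)  # private RNG so a seed makes runs repeatable
    totals = [0.0] * len(rates)  # summed reward per machine
    counts = [0] * len(rates)    # number of pulls per machine

    def pull(i):
        # Same payout model as try_slot: uniform in [0, rates[i])
        return rng.random() * rates[i]

    # Pull every machine once so each average is defined
    for i in range(len(rates)):
        totals[i] += pull(i)
        counts[i] += 1

    total_reward = 0.0
    for _ in range(trials):
        if rng.random() < epsilon:
            # Explore: pick a machine uniformly at random
            arm = rng.randrange(len(rates))
        else:
            # Exploit: pick the machine with the best average so far
            arm = max(range(len(rates)), key=lambda i: totals[i] / counts[i])
        reward = pull(arm)
        totals[arm] += reward
        counts[arm] += 1
        total_reward += reward

    return total_reward / trials, totals, counts


mean_reward, totals, counts = run_epsilon_greedy(
    [3, 5, 10, 20, 16, 15, 21, 22, 6], seed=0
)
print(mean_reward)
print(counts)
```

Returning the totals and counts separately, rather than as the `[total, count]` pairs of `score_map`, is only a readability choice; the selection rule is the same.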
@tokoroten
Author

An implementation of reinforcement learning with ε-greedy: the multi-armed bandit algorithm.

10.4599702786
[[10.582463343707063, 7], [25.134890913305536, 8], [51.183655244464155, 9], [121.497503175949, 13], [102.19258069525073, 12], [84.93549150800752, 10], [56.60589599103875, 8], [10016.860569443725, 929], [41.67253281857592, 13]]
[1.5117804776724377, 3.141861364163192, 5.6870728049404615, 9.345961782765308, 8.516048391270894, 8.493549150800751, 7.075736998879844, 10.782411807797336, 3.205579447582763]

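The best machine pays out uniformly in [0, 22), so its expected value is 22/2 = 11; an average of about 10.46 per trial is, well, reasonably good.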
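
As a rough cross-check of that figure, here is a short sketch of the arithmetic in Python 3. It assumes the greedy choice has already settled on the best machine, so it ignores the initial pulls and any stretch where a worse machine looked best. The variable names are illustrative and not from the gist.

```python
rates = [3, 5, 10, 20, 16, 15, 21, 22, 6]
epsilon = 0.1  # same value as search_rate in the script

# A pull returns a uniform value in [0, rate), so each machine's mean is rate / 2
arm_means = [r / 2 for r in rates]

best = max(arm_means)                      # 11.0, the machine with rate 22
uniform = sum(arm_means) / len(arm_means)  # mean reward of a random pull, about 6.56

# With probability 1 - epsilon the best machine is pulled, otherwise a random one
steady_state = (1 - epsilon) * best + epsilon * uniform

print(best)          # 11.0
print(steady_state)  # about 10.56
```

Under that assumption the per-trial average cannot reach 11 while ε stays at 0.1; the ceiling is about 10.56, which puts the observed 10.46 close to the best this setting allows.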