@se4u
Created December 22, 2019 22:39
Simulate the effective sample size computation to check the example given in "Lessons from Contextual Bandit Learning in a Customer Support Bot" by Karampatziakis et al.
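import numpy as np

k = 20       # number of actions
eps = 0.1    # exploration rate of the logging (epsilon-greedy) policy
eps2 = 0.5   # exploration rate of the second policy, pi
w = []       # importance weights pi/mu, one per simulated round
n = 100000   # number of simulated rounds
for i in range(n):
    # The rule picks an action uniformly at random; the logging policy follows
    # it with probability 1 - eps and otherwise picks uniformly at random.
    rule = np.random.choice(k)
    eps_greedy_action = rule if np.random.rand() > eps else np.random.choice(k)
    # Probability the logging policy assigns to the action it took.
    mu = (1 - eps + eps/k) if rule == eps_greedy_action else eps/k
    # pi is epsilon-greedy (rate eps2) around the logged action.
    pi_action = eps_greedy_action if np.random.rand() > eps2 else np.random.choice(k)
    pi = (1 - eps2 + eps2/k) if pi_action == eps_greedy_action else eps2/k
    w.append(pi/mu)
w = np.array(w)
# Effective sample size (sum w)^2 / sum(w^2), reported as a fraction of n.
print((w.sum() ** 2) / (w * w).sum() / n)
# 0.0594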
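
As a sanity check on the simulated figure, the same quantity can be worked out without sampling. In the script above each weight takes one of four values (two possible values of mu times two possible values of pi), so the large-n limit of the printed ratio is E[w]^2 / E[w^2]. The sketch below enumerates those four cases. The function name `ess_fraction_closed_form` is hypothetical and not part of the gist, and it assumes the two "match" events are independent, which is how the script draws them.

```python
from itertools import product


def ess_fraction_closed_form(k=20, eps=0.1, eps2=0.5):
    """Large-n limit of (sum w)^2 / (n * sum w^2), i.e. E[w]^2 / E[w^2]."""
    # Logging policy: (probability of the case, value of mu in that case).
    p_match_mu = 1 - eps + eps / k  # logged action equals the rule's action
    mu_cases = [(p_match_mu, p_match_mu), (1 - p_match_mu, eps / k)]
    # Second policy: (probability of the case, value of pi in that case).
    p_match_pi = 1 - eps2 + eps2 / k  # pi's action equals the logged action
    pi_cases = [(p_match_pi, p_match_pi), (1 - p_match_pi, eps2 / k)]

    first_moment = 0.0
    second_moment = 0.0
    for (p_mu, mu), (p_pi, pi) in product(mu_cases, pi_cases):
        weight = pi / mu
        prob = p_mu * p_pi  # assumes the two match events are independent
        first_moment += prob * weight
        second_moment += prob * weight * weight
    return first_moment ** 2 / second_moment


ratio = ess_fraction_closed_form()
print(f"{ratio:.4f}")
```

With the default parameters this should land close to the simulated value; a single unseeded run of 100000 rounds will differ slightly from the limit because the rare, very large weights dominate the sum of squares.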