@travisbrady
Last active November 20, 2020 16:30
Check differences in results given different cb_types

Testing cb_type parameter

Summary

This is a very simple test of Vowpal Wabbit's contextual bandits using the dataset from Tony Jebara's ML for Personalization class at Columbia. Specifically, I wanted to see what varying the cb_type parameter would do.

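The results here are surprising (to me) in that the take rate (cumulative reward divided by the number of rows where the prediction matched the logged action) varies so much by cb_type: about 0.23 for dm versus roughly 0.80 to 0.85 for dr and ips. Is that expected behavior?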
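For readers unfamiliar with the three options, here is a rough sketch of the textbook per-action cost estimates that the names ips, dm and dr usually refer to. This is not taken from Vowpal Wabbit's source; the function names and the `predicted_costs` mapping (standing in for a learned cost regressor) are assumptions made for illustration.

```python
def ips_cost(target_action, logged_action, cost, prob):
    """Inverse propensity scoring: observed cost reweighted by 1/prob.

    Actions other than the logged one get an estimate of 0.
    """
    return cost / prob if target_action == logged_action else 0.0


def dm_cost(target_action, predicted_costs):
    """Direct method: use the regression model's predicted cost as-is."""
    return predicted_costs[target_action]


def dr_cost(target_action, logged_action, cost, prob, predicted_costs):
    """Doubly robust: model prediction plus an IPS-weighted correction."""
    baseline = predicted_costs[target_action]
    if target_action != logged_action:
        return baseline
    return baseline + (cost - baseline) / prob


# Hypothetical model output: every one of the 10 actions predicted at cost 0.9
predicted = {a: 0.9 for a in range(1, 11)}

# One logged row: action 3 was shown with probability 0.1 and incurred cost 1.0
for a in (3, 4):
    print(a,
          ips_cost(a, 3, 1.0, 0.1),
          dm_cost(a, predicted),
          dr_cost(a, 3, 1.0, 0.1, predicted))
```

Under this reading, dm depends entirely on how good the cost model is, while ips and dr both use the logged probability to correct for it, which may be one reason dm behaves so differently here.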

Script output

$ python vw_cb_type_test.py
============================================================
cb_type = dr
total_rows = 10000
cumulative_reward = 816.0
same_count = 1026
take_rate = 0.7953216374269005

============================================================
cb_type = dm
total_rows = 10000
cumulative_reward = 247.0
same_count = 1059
take_rate = 0.2332389046270066

============================================================
cb_type = ips
total_rows = 10000
cumulative_reward = 878.0
same_count = 1034
take_rate = 0.8491295938104448
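
The numbers above come from a replay-style evaluation: a row only counts when the policy's prediction equals the logged action. The sketch below shows that bookkeeping on its own, without Vowpal Wabbit and without any learning step. The synthetic log, `replay_evaluate` and the constant policy are all made up for illustration. With 10 actions logged uniformly, about 1 row in 10 matches, which is in line with same_count being close to 1000 out of 10000 in the output.

```python
import random


def replay_evaluate(logged_rows, policy):
    """Score a policy on logged (context, action, reward) rows.

    Only rows where the policy picks the logged action are counted.
    """
    matches, total_reward = 0, 0.0
    for context, action, reward in logged_rows:
        if policy(context) == action:
            matches += 1
            total_reward += reward
    take_rate = total_reward / matches if matches else float('nan')
    return matches, total_reward, take_rate


rng = random.Random(0)

# Synthetic log: uniform random action out of 10; only action 1 ever pays off.
rows = []
for _ in range(10000):
    action = rng.randint(1, 10)
    rows.append((None, action, 1.0 if action == 1 else 0.0))

# A constant policy that always plays action 1
matches, total, rate = replay_evaluate(rows, lambda ctx: 1)
print(matches, total, rate)
```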

Environment

vw 8.9.0, installed yesterday (2020.11.19) via pip, on a 5-year-old MacBook Pro.

$ uname -a
Darwin AUSC02QRF6FG8WP 19.4.0 Darwin Kernel Version 19.4.0: Wed Mar  4 22:28:40 PST 2020; root:xnu-6153.101.6~15/RELEASE_X86_64 x86_64

from vowpalwabbit import pyvw


def run_loop(cb_type):
    fn = 'dataset.txt'
    # 10 actions; cb_type selects the cost estimator (dr, dm or ips)
    vw = pyvw.vw('--cb 10 --cb_type {}'.format(cb_type), quiet=True)
    total_rows, same_count = 0, 0
    cumulative_reward = 0.0
    with open(fn) as f:
        for row in f:
            total_rows += 1
            # Each row: logged action, reward, then the context feature values
            chunks = row.split()
            action, reward = int(chunks[0]), float(chunks[1])
            context = ' '.join(['f{}:{}.0'.format(i+1, x) for i, x in enumerate(chunks[2:])])
            cost = 1 - reward
            # Label is action:cost:probability; the probability is hard-coded to 0.1
            train_string = '{}:{:.4f}:{} | {}'.format(action, cost, '0.1', context)
            test_string = '| {}'.format(context)
            pred = vw.predict(test_string)
            # Only learn from and score rows where the prediction matches the logged action
            if pred == action:
                same_count += 1
                vw.learn(train_string)
                cumulative_reward += reward
    print('='*60)
    print('cb_type = {}'.format(cb_type))
    print(f"total_rows = {total_rows}")
    print(f"cumulative_reward = {cumulative_reward}")
    print(f"same_count = {same_count}")
    print('take_rate = {}'.format((cumulative_reward/same_count)))
    print()


def main():
    for cb_type in ('dr', 'dm', 'ips'):
        run_loop(cb_type)


if __name__ == '__main__':
    main()
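
To make the input format concrete, here is a small sketch that pulls the string-building part of the script into a standalone helper and applies it to one row. The helper name `to_vw_strings` and the sample row are hypothetical; the row layout (action, reward, then feature values) is inferred from how the script parses dataset.txt.

```python
def to_vw_strings(row, prob=0.1):
    """Build the (train, test) strings the script feeds to Vowpal Wabbit."""
    chunks = row.split()
    action, reward = int(chunks[0]), float(chunks[1])
    # Features are named f1, f2, ... and get a ".0" suffix, as in the script
    context = ' '.join('f{}:{}.0'.format(i + 1, x) for i, x in enumerate(chunks[2:]))
    cost = 1 - reward
    # Training label is action:cost:probability
    train = '{}:{:.4f}:{} | {}'.format(action, cost, prob, context)
    # Prediction input carries the context only, with no label
    test = '| {}'.format(context)
    return train, test


# Hypothetical row: action 2 was shown, reward 1, three feature values
train, test = to_vw_strings('2 1 5 0 12')
print(train)  # 2:0.0000:0.1 | f1:5.0 f2:0.0 f3:12.0
print(test)   # | f1:5.0 f2:0.0 f3:12.0
```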