travisbrady/results.md

## results.md

      
Testing cb_type parameter

Summary

This is a very simple test of Vowpal Wabbit's contextual bandits using the dataset from Tony Jebara's ML for Personalization class at Columbia. Specifically, I wanted to see what effect varying the cb_type parameter would have.

Instructions: http://www.cs.columbia.edu/~jebara/6998/hw2.pdf
Dataset link: http://www.cs.columbia.edu/~jebara/6998/dataset.txt

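The results here are surprising (to me) in that the take rate (cumulative_reward / same_count, i.e. the average reward over the rows where the prediction matched the logged action) varies so much by cb_type: roughly 0.80 for dr and 0.85 for ips, but only 0.23 for dm. Is that expected behavior?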
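
For context on what the three cb_type values mean, here is a minimal sketch of the textbook cost estimators usually associated with them: inverse propensity scoring (ips), the direct method (dm), and doubly robust (dr). The function names and the `predicted_costs` input are hypothetical and only for illustration; this is not VW's code, and VW's internal implementation may differ in its details.

```python
def ips_costs(num_actions, logged_action, cost, prob):
    """Inverse propensity scoring: only the logged action gets a
    non-zero estimate, scaled up by 1 / logging probability."""
    return [cost / prob if a == logged_action else 0.0
            for a in range(1, num_actions + 1)]


def dm_costs(predicted_costs):
    """Direct method: trust a regression model's predicted cost for
    every action (predicted_costs is a hypothetical model output)."""
    return list(predicted_costs)


def dr_costs(predicted_costs, logged_action, cost, prob):
    """Doubly robust: start from the model's predictions and correct the
    logged action with the importance-weighted prediction error."""
    estimates = list(predicted_costs)
    idx = logged_action - 1  # actions are 1-indexed
    estimates[idx] += (cost - predicted_costs[idx]) / prob
    return estimates
```

In this textbook form, dm depends entirely on how well the cost regressor fits, while ips and dr use the observed cost of the logged action directly. That may or may not be what drives the gap seen here.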

Script output

$ python vw_cb_type_test.py
============================================================
cb_type = dr
total_rows = 10000
cumulative_reward = 816.0
same_count = 1026
take_rate = 0.7953216374269005

============================================================
cb_type = dm
total_rows = 10000
cumulative_reward = 247.0
same_count = 1059
take_rate = 0.2332389046270066

============================================================
cb_type = ips
total_rows = 10000
cumulative_reward = 878.0
same_count = 1034
take_rate = 0.8491295938104448
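
As a rough sanity check on the numbers above, the sketch below recomputes each take rate from the printed cumulative_reward and same_count and attaches a binomial standard error. The standard error assumes every matched row carries a 0/1 reward, which the whole-number cumulative rewards suggest but the output does not prove; the helper names are made up for this example.

```python
import math

# (cumulative_reward, same_count) copied from the script output
results = {
    "dr": (816.0, 1026),
    "dm": (247.0, 1059),
    "ips": (878.0, 1034),
}


def take_rate(cumulative_reward, same_count):
    """Average reward over the rows where prediction == logged action."""
    return cumulative_reward / same_count


def binomial_stderr(rate, n):
    """Standard error of a proportion; assumes rewards are 0 or 1."""
    return math.sqrt(rate * (1.0 - rate) / n)


for cb_type, (reward, n) in results.items():
    rate = take_rate(reward, n)
    err = binomial_stderr(rate, n)
    print(f"{cb_type}: take_rate = {rate:.4f} +/- {err:.4f}")
```

Under that assumption, each estimate rests on about a thousand matched rows, so the standard errors are small relative to the gap between dm and the other two settings.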

Environment

vw 8.9.0, installed via pip on 2020.11.19
Running on a 5-year-old MacBook Pro
$ uname -a
Darwin AUSC02QRF6FG8WP 19.4.0 Darwin Kernel Version 19.4.0: Wed Mar  4 22:28:40 PST 2020; root:xnu-6153.101.6~15/RELEASE_X86_64 x86_64


## vw_cb_type_test.py
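from vowpalwabbit import pyvw


def run_loop(cb_type):
    fn = 'dataset.txt'
    # 10 actions; cb_type selects how VW estimates costs (dr, dm, or ips)
    vw = pyvw.vw('--cb 10 --cb_type {}'.format(cb_type), quiet=True)
    total_rows, same_count = 0, 0
    cumulative_reward = 0.0
    with open(fn) as f:
        for row in f:
            total_rows += 1
            # Row layout: logged action, reward, then the context features
            chunks = row.split()
            action, reward = int(chunks[0]), float(chunks[1])
            context = ' '.join(['f{}:{}.0'.format(i+1, x) for i, x in enumerate(chunks[2:])])
            cost = 1 - reward
            # VW cb label format is action:cost:probability; the probability is
            # hard-coded to 0.1, i.e. 1/10, as if logging were uniform over 10 arms
            train_string = '{}:{:.4f}:{} | {}'.format(action, cost, '0.1', context)
            test_string = '| {}'.format(context)
            pred = vw.predict(test_string)
            # Replay-style evaluation: only rows where the prediction matches
            # the logged action are scored and learned from
            if pred == action:
                same_count += 1
                vw.learn(train_string)
                cumulative_reward += reward
    vw.finish()

    # Guard against division by zero if no prediction ever matched
    take_rate = cumulative_reward / same_count if same_count else float('nan')

    print('='*60)
    print('cb_type = {}'.format(cb_type))
    print(f"total_rows = {total_rows}")
    print(f"cumulative_reward = {cumulative_reward}")
    print(f"same_count = {same_count}")
    print('take_rate = {}'.format(take_rate))
    print()


def main():
    for cb_type in ('dr', 'dm', 'ips'):
        run_loop(cb_type)


if __name__ == '__main__':
    main()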
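
To make the string formatting in the script easier to inspect without installing vowpalwabbit, here is a small standalone sketch of the same conversion. The helper name `to_vw_strings` and the three-feature sample row are made up for illustration; real rows in dataset.txt carry the full feature list.

```python
def to_vw_strings(row, prob='0.1'):
    """Turn one 'action reward feat1 feat2 ...' row into the
    (train, test) strings the script passes to VW."""
    chunks = row.split()
    action, reward = int(chunks[0]), float(chunks[1])
    context = ' '.join('f{}:{}.0'.format(i + 1, x)
                       for i, x in enumerate(chunks[2:]))
    cost = 1 - reward  # VW minimizes cost, so reward 1 becomes cost 0
    train = '{}:{:.4f}:{} | {}'.format(action, cost, prob, context)
    test = '| {}'.format(context)
    return train, test


# Hypothetical row: logged action 3, reward 1, three binary features
train, test = to_vw_strings('3 1 0 1 1')
print(train)  # 3:0.0000:0.1 | f1:0.0 f2:1.0 f3:1.0
print(test)   # | f1:0.0 f2:1.0 f3:1.0
```

The training string carries the action:cost:probability label before the pipe, while the prediction string has features only.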