Last active October 27, 2021 19:37
--cb_dro demo for Vowpal Wabbit using covertype. To see the lift, compare the "since last acc" column with and without --cb_dro.

pmineiro commented Dec 5, 2020

In this gist I:

  • pre-train a logging policy using 10% of covertype, and then fix the logging policy thereafter
  • off-policy train another policy using data from the logging policy, either with or without the --cb_dro flag
  • observe that --cb_dro improves the trained policy's accuracy from 71.8% to 73.3%.
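
The gist's own script is not displayed here, so the following is only a minimal sketch of the off-policy setup described above, not the code that produced these numbers. It assumes an epsilon-greedy logging policy and synthetic labels standing in for covertype; `log_interactions`, `ips_accuracy`, and both example policies are hypothetical names introduced for illustration, and policies are indexed by example number rather than by features to keep the sketch short. It shows how a fixed logging policy turns labelled data into bandit feedback (action, reward, probability), and how inverse propensity scoring estimates another policy's accuracy from those logs. It does not implement the distributionally robust objective behind `--cb_dro`.

```python
import random

NUM_ACTIONS = 7  # covertype has seven cover-type classes


def log_interactions(labels, logging_policy, epsilon, rng):
    """Simulate bandit feedback from a fixed epsilon-greedy logging policy.

    Returns one (example index, action, reward, probability) record per label.
    """
    logged = []
    for index, label in enumerate(labels):
        greedy = logging_policy(index)
        # Spread epsilon uniformly over all actions, give the rest to the greedy one.
        probs = [epsilon / NUM_ACTIONS] * NUM_ACTIONS
        probs[greedy] += 1.0 - epsilon
        action = rng.choices(range(NUM_ACTIONS), weights=probs)[0]
        # Only the chosen action's outcome is observed: 1 if it matches the label.
        reward = 1.0 if action == label else 0.0
        logged.append((index, action, reward, probs[action]))
    return logged


def ips_accuracy(logged, target_policy):
    """Inverse-propensity-scored estimate of the target policy's accuracy."""
    total = 0.0
    for index, action, reward, prob in logged:
        # Count a record only when the target policy agrees with the logged
        # action, reweighted by how likely the logger was to take that action.
        if target_policy(index) == action:
            total += reward / prob
    return total / len(logged)


# Synthetic stand-in for covertype labels (assumption: not the real dataset).
rng = random.Random(0)
labels = [rng.randrange(NUM_ACTIONS) for _ in range(20000)]


def logging_policy(index):
    # Hypothetical fixed logger: correct on 70% of examples, off by one otherwise.
    if index % 10 < 7:
        return labels[index]
    return (labels[index] + 1) % NUM_ACTIONS


def oracle_policy(index):
    # Hypothetical target policy that always picks the true class.
    return labels[index]


logged = log_interactions(labels, logging_policy, epsilon=0.5, rng=rng)
logger_estimate = ips_accuracy(logged, logging_policy)
oracle_estimate = ips_accuracy(logged, oracle_policy)
print(f"logger: {logger_estimate:.3f}  oracle: {oracle_estimate:.3f}")
```

Because the logger explores with known probabilities, the same logs can score any candidate policy. The estimate gets noisier when the candidate often picks actions the logger rarely took.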
