Last active Dec 7, 2020
--cb_dro demo for Vowpal Wabbit using covertype. To see the lift, compare the "since last acc" column with and without --cb_dro.

Owner Author

@pmineiro commented Dec 5, 2020

In this gist I:

  • pre-train a logging policy using 10% of covertype, then keep that logging policy fixed
  • off-policy train another policy using data from the logging policy, either with or without the --cb_dro flag
  • find that --cb_dro improves the trained policy's accuracy from 71.8% to 73.3%
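
To make the logging step above concrete, here is a minimal Python sketch of how a fixed logging policy could turn a labelled covertype row into a logged bandit example. It is not the gist's code: the epsilon-greedy exploration, the 0/1 cost, and the helper names `log_interaction` and `to_vw_cb_line` are all assumptions made for illustration. The output line uses Vowpal Wabbit's `action:cost:probability | features` text format for `--cb`; the gist itself may use a different input format.

```python
import random

def log_interaction(true_label, greedy_action, num_actions, epsilon, rng):
    """Sample one action from an epsilon-greedy logging policy (an assumption).

    Returns (action, cost, probability), with actions numbered from 1.
    """
    if rng.random() < epsilon:
        # Explore: pick uniformly among all actions.
        action = rng.randrange(1, num_actions + 1)
    else:
        # Exploit: take the logging policy's preferred action.
        action = greedy_action
    # Probability the logging policy assigned to the action it took.
    prob = epsilon / num_actions
    if action == greedy_action:
        prob += 1.0 - epsilon
    # Assumed cost: 0 when the action matches the true class, 1 otherwise.
    cost = 0.0 if action == true_label else 1.0
    return action, cost, prob

def to_vw_cb_line(action, cost, prob, features):
    """Format one logged example as 'action:cost:probability | features'."""
    # Hypothetical feature names f0, f1, ...; zero-valued features are omitted.
    feats = " ".join(f"f{i}:{v:g}" for i, v in enumerate(features) if v != 0)
    return f"{action}:{cost:g}:{prob:g} | {feats}"

if __name__ == "__main__":
    rng = random.Random(0)
    action, cost, prob = log_interaction(
        true_label=2, greedy_action=2, num_actions=7, epsilon=0.1, rng=rng
    )
    print(to_vw_cb_line(action, cost, prob, [0.5, 0.0, 3.0]))
```

The logged probability is what lets the second policy be trained off-policy: it records how likely the logging policy was to take the action whose cost was observed.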