@pmineiro
Last active October 27, 2021 19:37
--cb_dro demo for Vowpal Wabbit using covertype. To see the lift, compare the "since last acc" column with and without --cb_dro.
@pmineiro
Author

pmineiro commented Dec 5, 2020

In this gist I:

  • pre-train a logging policy on 10% of covertype, then hold it fixed
  • train a second policy off-policy on data collected by the logging policy, either with or without the --cb_dro flag

Result: --cb_dro improves the trained policy's accuracy from 71.8% to 73.3%.
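
As a rough illustration of the logging step described above (not the gist's own code, which is not shown here), the sketch below turns one classification example into a logged contextual-bandit example in Vowpal Wabbit's multi-line ADF text format. It assumes an epsilon-greedy logging policy and a 0/1 cost (0 when the chosen action matches the true label); both are illustrative choices, as are the helper names `epsilon_greedy` and `to_cb_adf_example`, the feature names, and the namespaces `s` and `a`.

```python
import random


def epsilon_greedy(greedy_action, num_actions, epsilon, rng):
    """Sample an action epsilon-greedily; return (action, its probability).

    Assumption: the fixed logging policy explores this way. The logged
    probability is what off-policy learners use to reweight the data.
    """
    if rng.random() < epsilon:
        action = rng.randrange(num_actions)
    else:
        action = greedy_action
    prob = epsilon / num_actions
    if action == greedy_action:
        prob += 1.0 - epsilon
    return action, prob


def to_cb_adf_example(features, num_actions, chosen, cost, prob):
    """Format one logged interaction as a multi-line cb_adf example.

    The first line carries the shared context, followed by one line per
    action. Only the chosen action's line carries a `0:cost:prob` label.
    """
    shared = "shared |s " + " ".join(
        f"{name}:{value}" for name, value in features.items()
    )
    lines = [shared]
    for a in range(num_actions):
        label = f"0:{cost}:{prob} " if a == chosen else ""
        lines.append(f"{label}|a action_{a}")
    # A blank line terminates a multi-line example in VW's text format.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    rng = random.Random(0)
    true_label = 2            # hypothetical covertype class, 0-indexed
    logged_greedy = 2         # hypothetical prediction of the logging policy
    action, prob = epsilon_greedy(logged_greedy, num_actions=7, epsilon=0.1, rng=rng)
    cost = 0 if action == true_label else 1
    print(to_cb_adf_example({"elevation": 2596, "slope": 3}, 7, action, cost, prob))
```

Examples written this way could then be fed to an off-policy learner twice, once with --cb_dro and once without, to reproduce the comparison; the exact command line is left out because the gist's settings are not shown here.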
