@pmineiro
Last active June 9, 2022 16:38
The latest in OPE-CS (off-policy evaluation confidence sequences). This version can track the running mean of a predictable sequence of policies in a nonstationary environment, and it does not require an explicit upper bound on the importance weights. For a fixed policy in a stationary environment, the running intersection of the intervals can be used to shrink the interval monotonically.
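The running-intersection step can be sketched as follows. This is a minimal illustration, not the author's code. It assumes each round yields an interval `(lower, upper)` that covers the same fixed target, which is the fixed-policy, stationary case described above. The function name `running_intersection` is hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[float, float]


def running_intersection(intervals: Iterable[Interval]) -> List[Interval]:
    """Intersect each interval with every interval seen so far.

    Only meaningful when all intervals target the same fixed quantity
    (fixed policy, stationary environment). The returned intervals
    never widen from one round to the next.
    """
    lo, hi = float("-inf"), float("inf")
    out: List[Interval] = []
    for lower, upper in intervals:
        # Keep the tightest lower and upper bounds observed so far.
        lo, hi = max(lo, lower), min(hi, upper)
        # If lo > hi the intersection is empty, which would signal a
        # coverage failure somewhere in the sequence.
        out.append((lo, hi))
    return out


if __name__ == "__main__":
    rounds = [(0.1, 0.9), (0.2, 0.95), (0.15, 0.7)]
    print(running_intersection(rounds))
```

Running the example prints `[(0.1, 0.9), (0.2, 0.9), (0.2, 0.7)]`: the third interval's looser lower bound of 0.15 is ignored because 0.2 was already established.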
@pmineiro (author):

Convergence is more rapid when evaluating policies nearer to the logging policy.
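One common way to see why this happens is sketched below. The explanation is an assumption offered here and is not stated by the author. Off-policy estimates reweight logged rewards by the importance weight w = π(a)/μ(a), where π is the evaluated policy and μ is the logging policy. For rewards in [0, 1], the second moment of w·r is at most E_μ[w²]. That quantity equals 1 when π = μ and grows as π moves away from μ. The function name is hypothetical, and the sketch covers only a discrete action set.

```python
from typing import Sequence


def importance_weight_second_moment(target: Sequence[float],
                                    logging: Sequence[float]) -> float:
    """Return E_logging[w^2] for w = target(a) / logging(a), discrete actions.

    This equals 1 + chi-squared divergence of target from logging: it is
    exactly 1 when the policies coincide and grows as they diverge.
    """
    if any(q <= 0 for q in logging):
        raise ValueError("logging policy must give every action positive probability")
    return sum(p * p / q for p, q in zip(target, logging))


if __name__ == "__main__":
    mu = [0.25, 0.25, 0.25, 0.25]                                   # uniform logging policy
    print(importance_weight_second_moment(mu, mu))                   # same policy
    print(importance_weight_second_moment([0.4, 0.2, 0.2, 0.2], mu)) # nearby policy
    print(importance_weight_second_moment([1.0, 0.0, 0.0, 0.0], mu)) # deterministic policy
```

The three cases give roughly 1.0, 1.12, and 4.0. A policy close to the uniform logger keeps the weights near 1, so its confidence sequence has less variance to overcome and tightens faster.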
