@praveen-palanisamy
Created December 12, 2015 22:49
Weight update step for reward/loss based learning under bandit settings
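% Multiplicative (exponential) weight update from the loss of the single action that was played.
lossScalar = 1 - reward; % Loss of the chosen action (assumes reward lies in [0, 1])
lossVector = zeros(1, self.numActions); % Bandit feedback: only the chosen action's loss is observed
lossVector(lastAction) = lossScalar;
self.timeStep = self.timeStep + 1;
% The weight update step below depends on the learning policy. This will probably be handled by the NN/RL-net.
% The learning rate sqrt(log(numActions)/timeStep) decays over time; the transpose matches column-vector weights.
learningRate = sqrt(log(self.numActions) / self.timeStep);
self.weights = self.weights .* exp(-learningRate * lossVector)';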
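
The update above can be tried outside the surrounding class with a small stand-alone sketch. The version below is a Python port, not the author's code: the class name `BanditWeights`, the uniform initial weights, the `probabilities` helper, and the assumption that rewards lie in [0, 1] are all illustrative choices that the gist does not specify. Note that Python indexes actions from 0, whereas the MATLAB snippet indexes from 1.

```python
import math


class BanditWeights:
    """Hypothetical container mirroring the fields the gist reads from `self`."""

    def __init__(self, num_actions):
        self.num_actions = num_actions
        self.time_step = 0
        # Assumption: weights start out uniform; the gist does not show initialization.
        self.weights = [1.0] * num_actions

    def update(self, last_action, reward):
        """Apply one multiplicative-weights step for the action just played."""
        loss_scalar = 1.0 - reward  # assumes reward is in [0, 1]
        loss_vector = [0.0] * self.num_actions  # unplayed actions get zero loss
        loss_vector[last_action] = loss_scalar
        self.time_step += 1
        # Same decaying learning rate as the snippet: sqrt(log(K) / t).
        learning_rate = math.sqrt(math.log(self.num_actions) / self.time_step)
        self.weights = [
            w * math.exp(-learning_rate * loss)
            for w, loss in zip(self.weights, loss_vector)
        ]
        return self.weights

    def probabilities(self):
        """Normalize the weights into a distribution (an assumed way to use them)."""
        total = sum(self.weights)
        return [w / total for w in self.weights]


learner = BanditWeights(num_actions=3)
learner.update(last_action=1, reward=0.0)  # action 1 incurs the maximum loss
print(learner.probabilities())
```

Only the weight of the played action changes in each step, and a reward of 1 leaves every weight untouched, because the loss vector is then all zeros.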