@Danaze
Created August 25, 2020 16:27
def QLtrain(self):
    """Train the agent with tabular Q-learning and return the per-episode cumulative rewards."""
    cum_reward = np.zeros(self.num_episodes)
    for ep in range(self.num_episodes):
        current_state = self.discretize_state(self.env.reset())
        done = False
        while not done:
            # choose an action according to the exploration-exploitation policy
            action = self.choose_action(current_state)
            obs, reward, done, _ = self.env.step(action)
            cum_reward[ep] += reward
            new_state = self.discretize_state(obs)
            # apply the Q-learning update for the observed transition
            self.QLupdate(current_state, action, reward, new_state)
            current_state = new_state
        # decay the exploration rate and learning rate after each episode
        self.getEpsilon()
        self.getLR()
    print('QL based training is finished!')
    return cum_reward
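
The loop above relies on helpers defined elsewhere on the class (discretize_state, choose_action, QLupdate, getEpsilon, getLR), on numpy imported as np, and on the pre-0.26 Gym step API that returns a four-tuple. As a point of reference, here is a minimal sketch of what the QLupdate helper might look like; the attribute names self.Q (the Q-table), self.lr (learning rate), and self.gamma (discount factor) are assumptions and do not appear in the original snippet.

def QLupdate(self, state, action, reward, new_state):
    # Sketch of a one-step tabular Q-learning update (attribute names assumed):
    # Q(s, a) <- Q(s, a) + lr * (reward + gamma * max_a' Q(s', a') - Q(s, a))
    best_next = np.max(self.Q[new_state])
    td_target = reward + self.gamma * best_next
    td_error = td_target - self.Q[state][action]
    self.Q[state][action] += self.lr * td_error

With those pieces in place, training would be launched with something like returns = agent.QLtrain(), where agent is an instance of the class this method belongs to.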