Q-Table learning in OpenAI grid world.
@OscarBalcells

commented Sep 29, 2018

Hello Juliani, thanks for the nice post on Medium. I know this code is quite old by now, but I still wanted to ask you a question. When you update the Q-value of the state you took the action in, Q[s,a] = Q[s,a] + lr*(r + y*np.max(Q[s1,:]) - Q[s,a]), you are in theory multiplying gamma by the expected future reward after taking action a. However, the code multiplies gamma by the largest of the next state's Q-values, np.max(Q[s1,:]). Am I misunderstanding "plus the maximum discounted (γ) future reward expected according to our own table for the next state (s’) we would end up in", or is there a mistake in the code? (I'm probably wrong haha)
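For anyone with the same question: in tabular Q-learning each entry Q[s1, a'] is itself an estimate of the discounted future reward of taking a' in s1 and acting greedily afterwards, so np.max(Q[s1,:]) is the table's estimate of the best future reward available from s1. The sketch below applies one update in that form. The helper name q_update, the hyperparameters lr=0.8 and y=0.95, and the toy 3-state table are assumptions for illustration, not taken from the gist.

```python
import numpy as np

def q_update(Q, s, a, r, s1, lr=0.8, y=0.95):
    """Apply one tabular Q-learning update in place and return Q.

    Q[s1, a'] already estimates the discounted return of taking a' in s1
    and acting greedily afterwards, so the max over a' is the best return
    the table expects from s1. Adding r and discounting by y gives the
    target for Q[s, a]. (lr and y defaults are assumed values.)
    """
    target = r + y * np.max(Q[s1, :])        # bootstrapped one-step target
    Q[s, a] = Q[s, a] + lr * (target - Q[s, a])
    return Q

# Hypothetical 3-state, 2-action table used only for illustration.
Q = np.zeros((3, 2))
Q[1] = [0.2, 0.5]      # pretend state 1's action values were learned already
q_update(Q, s=0, a=1, r=0.0, s1=1)
print(Q[0, 1])         # approximately 0.8 * (0 + 0.95 * 0.5) = 0.38
```

Taking the max over the next state's actions, rather than the value of the action the agent actually takes next, is what makes Q-learning off-policy; an on-policy method such as SARSA would use Q[s1, a1] for the next chosen action a1 instead.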

@alexandervandekleut

commented Jan 4, 2019

Hey! I was trying to figure out why my implementation of this wasn't working, and I found that this code only works if you add noise. Even epsilon-greedy approaches fail to get any reward. Removing the noise term + np.random.randn(1,env.action_space.n)*(1./(i+1)) from the action selection results in 0 reward. I understand the importance of visiting as many (s, a) pairs as possible, but it seems strange to me that whether this process works at all depends so heavily on noise.
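The two exploration schemes mentioned here can be compared on an untrained, all-zero table. This is a sketch assuming a FrozenLake-sized 16 x 4 table; the function names noisy_greedy and epsilon_greedy, the seed, and eps=0.1 are hypothetical choices. noisy_greedy mirrors the gist's argmax over Q plus Gaussian noise scaled by 1/(i+1).

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the counts are repeatable
n_states, n_actions = 16, 4      # FrozenLake 4x4 sizes (assumed)

def noisy_greedy(Q, s, i):
    """Gist-style selection: argmax of Q plus Gaussian noise scaled by 1/(i+1)."""
    noise = rng.standard_normal(n_actions) * (1.0 / (i + 1))
    return int(np.argmax(Q[s, :] + noise))

def epsilon_greedy(Q, s, eps=0.1):
    """Random action with probability eps, otherwise the (first) argmax."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s, :]))

Q = np.zeros((n_states, n_actions))   # untrained table, as at the start of training
noisy_counts = np.bincount([noisy_greedy(Q, 0, i=0) for _ in range(1000)], minlength=n_actions)
eps_counts = np.bincount([epsilon_greedy(Q, 0) for _ in range(1000)], minlength=n_actions)
print(noisy_counts)   # spread roughly evenly over the 4 actions
print(eps_counts)     # action 0 dominates (about 92.5% expected)
```

With eps=0.1 on an all-zero table, action 0 is expected in about 0.9 + 0.1/4 = 92.5% of calls, while the noise at i=0 spreads choices roughly evenly. As i grows, the noise shrinks and noisy_greedy approaches plain greedy selection.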

@tykurtz

commented Jan 14, 2019

@alexandervandekleut

It makes sense that the randomness is necessary. Without it, a = np.argmax(Q[s,:]) always returns 0 (move left), because Q is initialized to all zeros in this setup and np.argmax returns the first index when values are tied. Since a reward is only given when the goal is reached, with no reward for intermediate progress, Q never receives any feedback to update unless the agent reaches the goal at some point. That isn't possible if it never tries to move right.
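Both points in this reply can be checked directly with NumPy: argmax on an all-zero row picks index 0, and a zero-reward update on an all-zero table changes nothing. The sketch assumes FrozenLake's 16 x 4 table with action 0 meaning left; lr, y, the chosen transition, and the argmax_random_tie helper are illustrative assumptions.

```python
import numpy as np

n_states, n_actions = 16, 4    # FrozenLake 4x4 sizes (assumed)
LEFT = 0                       # action 0 moves left in FrozenLake
Q = np.zeros((n_states, n_actions))

# np.argmax returns the first index among tied maxima, so every all-zero row
# yields action 0 (left) when no noise is added.
greedy_actions = {int(np.argmax(Q[s, :])) for s in range(n_states)}
print(greedy_actions)          # {0}

# A zero-reward transition between all-zero rows leaves the table unchanged,
# so there is no learning signal until the goal is reached.
lr, y = 0.8, 0.95              # assumed hyperparameters
s, a, r, s1 = 0, LEFT, 0.0, 0  # one possible outcome: left from the top-left corner stays put
Q[s, a] += lr * (r + y * np.max(Q[s1, :]) - Q[s, a])
print(Q.any())                 # False

# Hypothetical alternative: break ties uniformly at random instead.
rng = np.random.default_rng(0)

def argmax_random_tie(row):
    """Return a random index among the maximal entries of row."""
    return int(rng.choice(np.flatnonzero(row == row.max())))

tie_counts = np.bincount([argmax_random_tie(Q[0, :]) for _ in range(1000)], minlength=n_actions)
print(tie_counts)              # every action chosen, roughly 250 times each
```

Random tie-breaking makes an untrained agent act uniformly at random, so it can stumble onto the goal even without a noise term; once the Q-values start to differ, though, it no longer provides any exploration by itself.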
