@simoninithomas
Last active May 9, 2023 06:15
@harshilpatel312

For some runs, the values in the qtable do not change (it outputs all zeros after Step 4). I tried fixing the seed and still get different qtables at the end. Could you tell me why this would be the case?

Btw, awesome work on the reinforcement learning articles!

@anubhavshrimal

> For some runs, the values in the qtable do not change (it outputs all zeros after Step 4). I tried fixing the seed and still get different qtables at the end. Could you tell me why this would be the case?

In Step 4, remove these two lines:

step = 0
done = False

Then add:

action = None

after the line:

exp_exp_tradeoff = random.uniform(0, 1)
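
For context, the `exp_exp_tradeoff` line belongs to an epsilon-greedy action choice. The gist's own code is not shown on this page, so the following is only a minimal sketch of that step: the helper name `choose_action`, the table shape, and the dedicated seeded `random.Random` instance are assumptions made for illustration, not the author's code.

```python
import random

import numpy as np


def choose_action(qtable, state, epsilon, n_actions, rng):
    """Epsilon-greedy choice (hypothetical helper, not from the gist)."""
    # Draw a number in [0, 1) to decide between exploiting and exploring.
    exp_exp_tradeoff = rng.uniform(0, 1)
    if exp_exp_tradeoff > epsilon:
        # Exploit: take the action with the highest Q-value for this state.
        return int(np.argmax(qtable[state, :]))
    # Explore: take a uniformly random action.
    return rng.randrange(n_actions)


# Assumed sizes for illustration: 16 states, 4 actions.
qtable = np.zeros((16, 4))
qtable[0, 2] = 1.0

# One seeded generator makes the exploration draws repeatable.
rng = random.Random(0)
greedy_action = choose_action(qtable, 0, epsilon=0.0, n_actions=4, rng=rng)
random_action = choose_action(qtable, 0, epsilon=1.0, n_actions=4, rng=rng)

# Two generators with the same seed produce the same sequence of draws.
draws_a = [random.Random(42).uniform(0, 1) for _ in range(3)]
draws_b = [random.Random(42).uniform(0, 1) for _ in range(3)]
```

Note that `np.argmax` on an all-zero row always returns 0, so while the table is still all zeros the exploit branch keeps picking action 0. Seeding Python's `random` alone also does not seed NumPy's generator or the environment, which may be one reason results still differ between runs.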

@rmihir96

qtable[state, action] = qtable[state, action] + learning_rate * (reward + gamma * np.max(qtable[new_state, :]) - qtable[state, action])

IndexError: arrays used as indices must be of integer (or boolean) type

Any idea why this is happening?
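
NumPy raises this error when an index is not an integer, for example a float or a tuple. One possible cause, stated here as an assumption rather than a diagnosis, is that `state` or `new_state` is not a plain integer: newer Gym and Gymnasium releases return an `(observation, info)` pair from `reset()`, which would need unpacking first. A minimal sketch of the same update rule with explicit integer indices follows; the helper name `q_update` and the table shape are hypothetical.

```python
import numpy as np


def q_update(qtable, state, action, reward, new_state, learning_rate, gamma):
    """Apply one Q-learning update in place (hypothetical helper)."""
    # Cast indices to plain ints so float or NumPy-scalar values index cleanly.
    state, action, new_state = int(state), int(action), int(new_state)
    # Target: immediate reward plus discounted best value of the next state.
    td_target = reward + gamma * np.max(qtable[new_state, :])
    # Move the current estimate toward the target by the learning rate.
    qtable[state, action] += learning_rate * (td_target - qtable[state, action])
    return qtable


# Assumed sizes for illustration: 16 states, 4 actions.
qtable = np.zeros((16, 4))
q_update(qtable, state=14, action=2, reward=1.0, new_state=15,
         learning_rate=0.8, gamma=0.95)
```

Printing `type(state)` and `type(action)` just before the failing line should show which index is not an integer.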

@anuyash49

My code is exactly the same, but I am getting a total reward of only 143 over 10,000 episodes. That is very low accuracy.

@simoninithomas
Author

Hey there, this code is obsolete. Check this instead: https://huggingface.co/learn/deep-rl-course/unit2/introduction
