For some runs, the value of the qtable does not change (outputs all zeros after Step 4.). I tried fixing the seed and still get different qtables at the end. Could you tell me why this would be the case?
In step 4:
Remove:
step = 0
done = False
And add:
action = None
after
exp_exp_tradeoff = random.uniform(0, 1)
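For anyone comparing against their own Step 4, here is a minimal sketch of a seeded epsilon-greedy Q-learning loop. It is not the gist's code: the 5-state corridor environment, the `train` function, and the hyperparameter values are made up for illustration so the example runs without Gym. It shows that with every RNG seeded, two runs produce the same Q-table, and that the table only leaves zero once the goal reward has been reached at least once.

```python
import random
import numpy as np


def train(seed, episodes=200, max_steps=20):
    # Seed every RNG the loop might draw from. With a real Gym environment
    # the environment itself would need seeding as well (not shown here).
    random.seed(seed)
    np.random.seed(seed)

    # Hypothetical 1-D corridor: states 0..4, action 0 = left, 1 = right.
    # Reaching state 4 gives reward 1 and ends the episode.
    n_states, n_actions = 5, 2
    qtable = np.zeros((n_states, n_actions))
    learning_rate, gamma = 0.8, 0.95
    epsilon, min_epsilon, decay = 1.0, 0.05, 0.99  # illustrative values

    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            exp_exp_tradeoff = random.uniform(0, 1)
            if exp_exp_tradeoff > epsilon:
                action = int(np.argmax(qtable[state, :]))  # exploit
            else:
                action = random.randrange(n_actions)       # explore

            if action == 1:
                new_state = min(state + 1, n_states - 1)
            else:
                new_state = max(state - 1, 0)
            done = new_state == n_states - 1
            reward = 1.0 if done else 0.0

            # Same update rule as in the tutorial's Step 4.
            qtable[state, action] = qtable[state, action] + learning_rate * (
                reward + gamma * np.max(qtable[new_state, :]) - qtable[state, action]
            )
            state = new_state
            if done:
                break
        epsilon = max(min_epsilon, epsilon * decay)
    return qtable


q_first = train(seed=0)
q_second = train(seed=0)
print(np.array_equal(q_first, q_second))
```

If the reward is sparse and the agent never reaches the goal during training, every update adds zero and the table stays all zeros, which may be what the all-zeros runs are showing.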
qtable[state, action] = qtable[state, action] + learning_rate * (reward + gamma * np.max(qtable[new_state, :]) - qtable[state, action])
IndexError: arrays used as indices must be of integer (or boolean) type
Any idea why this is happening?
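A guess, not a confirmed diagnosis: in newer Gym releases (0.26 and later) and in Gymnasium, env.reset() returns an (observation, info) tuple rather than a bare integer, so `state` ends up being a tuple and cannot be used to index the Q-table. The sketch below uses a hypothetical helper, `as_state_index`, and a hand-written stand-in for the reset result so that it runs without Gym installed.

```python
import numpy as np


def as_state_index(reset_result):
    """Return an integer state from either an int or an (observation, info) tuple.

    Hypothetical helper for illustration; not part of Gym or the gist.
    """
    if isinstance(reset_result, tuple):
        reset_result = reset_result[0]  # keep the observation, drop the info dict
    return int(reset_result)


qtable = np.zeros((16, 4))

# Stand-in for what a newer env.reset() may return: (observation, info).
new_style_reset = (0, {"prob": 1})

state = as_state_index(new_style_reset)
action = 1
qtable[state, action] += 0.5  # indexing works once state is a plain int
print(qtable[state, action])
```

If this is the cause, note that env.step(...) in those newer releases also returns five values (observation, reward, terminated, truncated, info) instead of four, so the unpacking in the training loop would need the same kind of adjustment.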
My code is exactly the same, but I am getting a total reward of only 143 over 10,000 episodes. That is very low accuracy.
Hey there, this code is obsolete. Check this instead: https://huggingface.co/learn/deep-rl-course/unit2/introduction
Btw, awesome work on the reinforcement learning articles!