# karpathy/min-char-rnn.py

Last active August 7, 2024 08:42
Minimal character-level language model with a Vanilla Recurrent Neural Network, in Python/numpy

### pursuemoon commented Oct 27, 2023

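I have a question about line 58 of the code: `dhnext = np.dot(Whh.T, dhraw)`. Could anyone tell me what it means?

The forward pass is:

$$
\begin{aligned}
z_t &= W_{hh} \cdot h_{t-1} + W_{xh} \cdot x_t + b_h && (1) \\
h_t &= \tanh(z_t) && (2) \\
y_t &= W_{hy} \cdot h_t + b_y && (3) \\
o_t &= \mathrm{Softmax}(y_t) && (4)
\end{aligned}
$$

Here are the gradients with respect to the weights that I derived:

$$
\begin{aligned}
\frac{\partial L}{\partial W_{hy}} &= \frac{\partial L}{\partial y_t} \otimes h_t^{T} && (a) \\
\frac{\partial L}{\partial W_{hh}} &= \left( W_{hy}^{T} \cdot \frac{\partial L}{\partial y_t} \odot \tanh'(z_t) \right) \otimes h_{t-1}^{T} && (b) \\
\frac{\partial L}{\partial W_{xh}} &= \left( W_{hy}^{T} \cdot \frac{\partial L}{\partial y_t} \odot \tanh'(z_t) \right) \otimes x_t^{T} && (c)
\end{aligned}
$$

It is easy to see that the common factor $W_{hy}^{T} \cdot \frac{\partial L}{\partial y_t} \odot \tanh'(z_t)$ in (b) and (c) is what `dhraw` represents in the code. We can then get `dWxh` and `dWhh` from formulas (b) and (c) without `dhnext`. So what does `dhnext` mean?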

Since `dh` is the gradient of the loss with respect to the hidden state, there are two paths along which gradients flow back into it during backpropagation. From the equations and the RNN structure, the hidden state feeds forward into both the output and the next hidden state. `dhnext` is the gradient that reaches the current hidden state from the next hidden state. Note that `dhnext` is zero on the first iteration of the backward loop because, at the last time step of the unrolled RNN, there is no next hidden state to send a gradient back. Hope this helps.
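To make the two gradient paths concrete, here is a minimal NumPy sketch (not the gist's code; the sizes, the random seed, and the helper names `forward`, `grad_Whh`, and `numeric_grad_Whh` are all assumptions for illustration). It computes `dWhh` with and without the `dhnext` term and compares both against a finite-difference gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
H, V, T = 4, 3, 5                       # hidden size, vocab size, sequence length (arbitrary)
Wxh = rng.normal(0, 0.5, (H, V))
Whh = rng.normal(0, 0.5, (H, H))
Why = rng.normal(0, 0.5, (V, H))
bh, by = np.zeros((H, 1)), np.zeros((V, 1))
inputs = rng.integers(0, V, T)          # random toy sequence
targets = rng.integers(0, V, T)

def forward(W_hh):
    """Run the RNN over the sequence; return total loss and cached states."""
    hs, ps, loss = {-1: np.zeros((H, 1))}, {}, 0.0
    for t in range(T):
        x = np.zeros((V, 1))
        x[inputs[t]] = 1                                    # one-hot input
        hs[t] = np.tanh(Wxh @ x + W_hh @ hs[t - 1] + bh)
        y = Why @ hs[t] + by
        e = np.exp(y - y.max())
        ps[t] = e / e.sum()                                 # softmax
        loss += -np.log(ps[t][targets[t], 0])               # cross-entropy
    return loss, hs, ps

def grad_Whh(use_dhnext):
    """Backpropagate through time; optionally drop the dhnext term."""
    _, hs, ps = forward(Whh)
    dWhh, dhnext = np.zeros_like(Whh), np.zeros((H, 1))     # dhnext starts at zero
    for t in reversed(range(T)):
        dy = ps[t].copy()
        dy[targets[t]] -= 1                  # gradient of softmax + cross-entropy
        dh = Why.T @ dy                      # path 1: h_t -> y_t
        if use_dhnext:
            dh = dh + dhnext                 # path 2: h_t -> h_{t+1}
        dhraw = (1 - hs[t] ** 2) * dh        # back through tanh
        dWhh += dhraw @ hs[t - 1].T
        dhnext = Whh.T @ dhraw               # gradient handed back to h_{t-1}
    return dWhh

def numeric_grad_Whh(eps=1e-5):
    """Central finite differences on every entry of Whh."""
    g = np.zeros_like(Whh)
    for i in range(H):
        for j in range(H):
            Wp, Wm = Whh.copy(), Whh.copy()
            Wp[i, j] += eps
            Wm[i, j] -= eps
            g[i, j] = (forward(Wp)[0] - forward(Wm)[0]) / (2 * eps)
    return g

num = numeric_grad_Whh()
err_with = np.max(np.abs(grad_Whh(True) - num))
err_without = np.max(np.abs(grad_Whh(False) - num))
print("max error with dhnext:   ", err_with)
print("max error without dhnext:", err_without)
```

In this sketch, only the version that adds `dhnext` should agree with the numerical gradient. Dropping it ignores how `h_t` influences the losses at later time steps through `h_{t+1}`.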

Thank you so much for your reply. I missed the partial derivative with respect to the next hidden state.

### kefei-cs19 commented Dec 11, 2023

@logeek404

> However, I find that `mem` and `param` are both local variables in the loop, so I don't know whether the implementation of Adagrad is correct.

`mem` and `param` refer to the NumPy ndarrays yielded by `zip`, and `+=` updates their values in place.
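A small sketch of that behaviour, using toy arrays rather than the model's real parameters (the names `W`, `dW`, `mW`, and `U` and their values are made up for illustration): `+=` on a loop variable mutates the array it refers to, while plain assignment only rebinds the local name.

```python
import numpy as np

W = np.ones((2, 2))            # stand-in for a parameter matrix
dW = np.full((2, 2), 0.5)      # stand-in for its gradient
mW = np.zeros_like(W)          # Adagrad memory (running sum of squared gradients)
learning_rate = 0.1
same_object = W                # second reference to check identity afterwards

for param, dparam, mem in zip([W], [dW], [mW]):
    mem += dparam * dparam                                    # in place: mutates mW
    param += -learning_rate * dparam / np.sqrt(mem + 1e-8)    # in place: mutates W

print(W)                       # every entry has moved from 1.0 to about 0.9
print(mW)                      # every entry is now 0.25
print(W is same_object)        # True: still the same array object

# Contrast: plain assignment rebinds the loop variable and leaves U untouched.
U = np.ones((2, 2))
for param in [U]:
    param = param - 0.1        # creates a new array; U is unchanged
print(U)
```

The in-place behaviour depends on the objects being mutable ndarrays. With Python floats or other immutable values, `+=` would rebind the local name and the originals would not change.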