Vitaly Bushaev (@bushaev)

  • ITMO University
  • Saint Petersburg
@bushaev
bushaev / Adamax.py
Last active October 22, 2018 12:49
# AdaMax: Adam variant that replaces the L2-norm second moment with an
# infinity-norm term, so no bias correction is needed for v
for t in range(1, num_iterations + 1):  # start t at 1 so 1 - beta_1**t is nonzero
    g = compute_gradient(x, y)
    m = beta_1 * m + (1 - beta_1) * g      # first-moment (momentum) estimate
    m_hat = m / (1 - np.power(beta_1, t))  # bias-corrected first moment
    v = np.maximum(beta_2 * v, np.abs(g))  # infinity-norm second-moment term
    w = w - step_size * m_hat / v          # update with m_hat, not the raw m
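To see the AdaMax update converge, here is a self-contained sketch that wraps the loop above in a function and runs it on a one-dimensional quadratic. The function name, the toy objective, and the small epsilon added to `v` (to avoid dividing by zero on the first step) are illustrative additions, not part of the gist.

```python
import numpy as np

def adamax(grad_fn, w0, step_size=0.01, beta_1=0.9, beta_2=0.999,
           num_iterations=2000):
    """Minimize a function via AdaMax, given a callable returning its gradient."""
    w = np.asarray(w0, dtype=float)
    m = np.zeros_like(w)  # first-moment estimate
    v = np.zeros_like(w)  # infinity-norm second-moment term
    for t in range(1, num_iterations + 1):
        g = grad_fn(w)
        m = beta_1 * m + (1 - beta_1) * g
        m_hat = m / (1 - beta_1 ** t)           # bias-corrected momentum
        v = np.maximum(beta_2 * v, np.abs(g))   # decayed max instead of decay + add
        w = w - step_size * m_hat / (v + 1e-8)  # epsilon guards the t=1 division
    return w

# Toy example: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_star = adamax(lambda w: 2 * (w - 3.0), w0=[0.0])
```

With a constant step size the iterate ends up in a small neighborhood of the minimizer at 3 rather than converging exactly, which is the usual behavior for this family of optimizers without a learning-rate decay.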
@bushaev
bushaev / Nadam.py
Last active October 22, 2018 12:49
# Nadam: Adam with Nesterov momentum folded into the bias-corrected update
for t in range(1, num_iterations + 1):  # start t at 1 so the bias corrections are defined
    g = compute_gradient(x, y)
    m = beta_1 * m + (1 - beta_1) * g
    v = beta_2 * v + (1 - beta_2) * np.power(g, 2)
    # Nesterov look-ahead: beta_1 times the bias-corrected momentum,
    # plus the bias-corrected current gradient
    m_hat = beta_1 * m / (1 - np.power(beta_1, t)) + (1 - beta_1) * g / (1 - np.power(beta_1, t))
    v_hat = v / (1 - np.power(beta_2, t))
    w = w - step_size * m_hat / (np.sqrt(v_hat) + epsilon)
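The Nadam loop can likewise be packaged as a runnable function. This is a sketch on a toy quadratic; the function name and objective are illustrative, and the update mirrors the Nesterov-style bias-corrected momentum above.

```python
import numpy as np

def nadam(grad_fn, w0, step_size=0.01, beta_1=0.9, beta_2=0.999,
          epsilon=1e-8, num_iterations=2000):
    """Minimize a function via Nadam, given a callable returning its gradient."""
    w = np.asarray(w0, dtype=float)
    m = np.zeros_like(w)  # first moment
    v = np.zeros_like(w)  # second moment
    for t in range(1, num_iterations + 1):
        g = grad_fn(w)
        m = beta_1 * m + (1 - beta_1) * g
        v = beta_2 * v + (1 - beta_2) * g ** 2
        # Nesterov look-ahead combined with Adam's bias corrections
        m_hat = (beta_1 * m / (1 - beta_1 ** t)
                 + (1 - beta_1) * g / (1 - beta_1 ** t))
        v_hat = v / (1 - beta_2 ** t)
        w = w - step_size * m_hat / (np.sqrt(v_hat) + epsilon)
    return w

# Toy example: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_star = nadam(lambda w: 2 * (w - 3.0), w0=[0.0])
```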
# AMSGrad: like Adam, but v_hat is the running element-wise maximum of v,
# which keeps the effective step size non-increasing over time
for t in range(num_iterations):
    g = compute_gradient(x, y)
    m = beta_1 * m + (1 - beta_1) * g
    v = beta_2 * v + (1 - beta_2) * np.power(g, 2)
    v_hat = np.maximum(v, v_hat)  # monotone max, never reset
    w = w - step_size * m / (np.sqrt(v_hat) + epsilon)
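Finally, a self-contained sketch of the AMSGrad loop on the same toy quadratic; as in the snippet, this version keeps the monotone maximum `v_hat` and omits bias correction, matching the original AMSGrad formulation. The function name and objective are illustrative.

```python
import numpy as np

def amsgrad(grad_fn, w0, step_size=0.01, beta_1=0.9, beta_2=0.999,
            epsilon=1e-8, num_iterations=2000):
    """Minimize a function via AMSGrad, given a callable returning its gradient."""
    w = np.asarray(w0, dtype=float)
    m = np.zeros_like(w)      # first moment
    v = np.zeros_like(w)      # second moment
    v_hat = np.zeros_like(w)  # running max of the second moment
    for t in range(num_iterations):
        g = grad_fn(w)
        m = beta_1 * m + (1 - beta_1) * g
        v = beta_2 * v + (1 - beta_2) * g ** 2
        v_hat = np.maximum(v, v_hat)  # step size can only shrink from here on
        w = w - step_size * m / (np.sqrt(v_hat) + epsilon)
    return w

# Toy example: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_star = amsgrad(lambda w: 2 * (w - 3.0), w0=[0.0])
```

Because `v_hat` never decreases, AMSGrad avoids the situation where Adam's denominator shrinks and the step size grows again late in training.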