Big Data Assignment 2

Links

Question

PDF Version of this solution

Solution

The variables and functions are defined as follows:

import numpy as np

x = np.array([0.9, 0.1, -1])  # input vector
y = np.array([0, 1])          # target outputs
lr = 0.01                     # learning rate
relu = lambda z: max(z, 0)    # ReLU activation

The weights $w_1, w_2$ are initialized as follows:

w_1 = np.array([
    [0.3, 0.9, 1, 0.4],
    [0.6, 0.8, -0.3, -0.6],
    [-1.0, 0.1, -0.4, -0.2]
])
w_2 = np.array([
    [0.3, 0.8, 0.2, 0],
    [-0.1, 0.0, -0.6, 0.1]
])

Forward propagation

First layer

The first layer's input is $ip_2$, i.e. the input $x$ with a bias term of 1 prepended. Multiply $w_1$ element-wise by $ip_2$, sum each row, and apply the ReLU activation:

ip_2 = np.insert(x, 0, [1])  # prepend the bias term to x
a_2 = np.array([relu(sum(row)) for row in np.multiply(w_1, ip_2)])

Therefore, the values of $a^\text{(2)}_1, a^\text{(2)}_2, a^\text{(2)}_3$ are:

print(a_2)
[0.81 1.89 0.  ]
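
Since multiplying element-wise and then summing each row is just the matrix-vector product $w_1 \, ip_2$, the same activations can be obtained in one line. A minimal check, reusing the arrays above (the name a_2_check is only illustrative):

a_2_check = np.maximum(w_1 @ ip_2, 0)  # matrix-vector product followed by ReLU; should equal a_2
print(a_2_check)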

Second layer

The second layer's input is $ip_3$, i.e. $a^\text{(2)}$ with a bias term of 1 prepended. Multiply $w_2$ element-wise by $ip_3$, sum each row, and apply the ReLU activation:

ip_3 = np.insert(a_2, 0, [1])  # prepend the bias term to a_2
a_3 = np.array([relu(sum(row)) for row in np.multiply(w_2, ip_3)])

Therefore, the values of $a^\text{(3)}_1, a^\text{(3)}_2$ are:

print(a_3)
[1.326 0.   ]
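
The same one-line check applies to the second layer:

a_3_check = np.maximum(w_2 @ ip_3, 0)  # should equal a_3
print(a_3_check)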

Loss function

The loss function $J(\theta)$ evaluates to:

j = lambda a3_y: 0.5 * ((a3_y[0] - a3_y[1]) ** 2)  # half the squared error of one (output, target) pair
j_theta = sum(map(j, np.column_stack((a_3, y))))   # sum over the two outputs
print(j_theta)
1.3791380000000004
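
Equivalently, $J(\theta)$ is half the sum of squared output errors, so a one-line vectorized check (the name j_theta_check is only illustrative) gives the same value:

j_theta_check = 0.5 * np.sum((a_3 - y) ** 2)  # should equal j_theta
print(j_theta_check)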

Backpropagation

Calculate the gradients with respect to $w_2$, the second-layer weights. $E_\text{total}$ is the same as $J(\theta)$:

sum_b = lambda i: sum(np.multiply(w_2[i], ip_3))  # pre-activation sum of output unit i

d_et_a3i = lambda i: a_3[i] - y[i]                # dE_total/da3_i
d_a3i_sumb = lambda i: 1 if sum_b(i) >= 0 else 0  # ReLU derivative for output unit i
d_sumb_w2ij = lambda i, j: ip_3[j]                # dsum_b(i)/dw_2[i][j]

d_et_w2ij = lambda i, j: d_et_a3i(i) * d_a3i_sumb(i) * d_sumb_w2ij(i, j)
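
In symbols, these three factors combine through the chain rule into the gradient for one output-layer weight:

$$\frac{\partial E_\text{total}}{\partial w_2[i][j]} = \frac{\partial E_\text{total}}{\partial a_3[i]} \cdot \frac{\partial a_3[i]}{\partial \text{sum}_b(i)} \cdot \frac{\partial \text{sum}_b(i)}{\partial w_2[i][j]} = (a_3[i] - y[i]) \cdot \mathbf{1}[\text{sum}_b(i) \ge 0] \cdot ip_3[j]$$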

For the $w_2[0]$ case, the gradients are:

arr_0 = np.array([d_et_w2ij(0, j) for j in range(ip_3.shape[0])])
print(arr_0)
[1.326   1.07406 2.50614 0.     ]

The new weights for $w_2[0]$ are therefore:

nw_2_0 = np.array([w_2[0][j] - (lr * arr_0[j]) for j in range(arr_0.shape[0])])
print(nw_2_0)
[0.28674   0.7892594 0.1749386 0.       ]

For the $w_2[1]$ case, the gradients are:

arr_1 = np.array([d_et_w2ij(1, j) for j in range(ip_3.shape[0])])
print(arr_1)
[-0. -0. -0. -0.]

The new weights for $w_2[1]$ are therefore:

nw_2_1 = np.array([w_2[1][j] - (lr * arr_1[j]) for j in range(arr_1.shape[0])])
print(nw_2_1)
[-0.1  0.  -0.6  0.1]
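
The same per-element gradients and updates can also be written as a single outer product. A minimal vectorized sketch (the names delta_3, grad_w2 and new_w_2 are only illustrative) that reuses the arrays above and should reproduce arr_0, arr_1, nw_2_0 and nw_2_1:

delta_3 = (a_3 - y) * (w_2 @ ip_3 >= 0)  # (a_3[i] - y[i]) * relu'(sum_b(i)) for both outputs
grad_w2 = np.outer(delta_3, ip_3)        # rows correspond to arr_0 and arr_1
new_w_2 = w_2 - lr * grad_w2             # rows correspond to nw_2_0 and nw_2_1
print(new_w_2)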

Calculate the gradients with respect to $w_1$, the first-layer weights, where $E_\text{total} = E_\text{out1} + E_\text{out2}$:

sum_a = lambda i: sum(np.multiply(w_1[i], ip_2))  # pre-activation sum of hidden unit i

d_eo1_a2j = lambda j: d_et_a3i(0) * d_a3i_sumb(0) * w_2[0][j]  # dE_out1/da2_j
d_eo2_a2j = lambda j: d_et_a3i(1) * d_a3i_sumb(1) * w_2[1][j]  # dE_out2/da2_j
d_et_a2j = lambda j: d_eo1_a2j(j) + d_eo2_a2j(j)               # dE_total/da2_j
d_a2i_suma = lambda i: 1 if sum_a(i) >= 0 else 0               # ReLU derivative for hidden unit i
d_suma_w1ij = lambda i, j: ip_2[j]                             # dsum_a(i)/dw_1[i][j]

d_et_w1ij = lambda i, j: d_et_a2j(j) * d_a2i_suma(i) * d_suma_w1ij(i, j)

For the $w_1[0]$ case, the gradients are:

arr_0 = np.array([d_et_w1ij(0, j) for j in range(ip_2.shape[0])])
print(arr_0)
[ 0.3978   0.95472  0.02652 -0.     ]

The new weights for $w_1[0]$ are therefore:

nw_1_0 = np.array([w_1[0][j] - (lr * arr_0[j]) for j in range(arr_0.shape[0])])
print(nw_1_0)
[0.296022  0.8904528 0.9997348 0.4      ]

For the $w_1[1]$ case, the gradients are:

arr_1 = np.array([d_et_w1ij(1, j) for j in range(ip_2.shape[0])])
print(arr_1)
[ 0.3978   0.95472  0.02652 -0.     ]

The new weights for $w_1[1]$ are therefore:

nw_1_1 = np.array([w_1[1][j] - (lr * arr_1[j]) for j in range(arr_1.shape[0])])
print(nw_1_1)
[ 0.596022   0.7904528 -0.3002652 -0.6      ]

For the $w_1[2]$ case, the gradients are:

arr_2 = np.array([d_et_w1ij(2, j) for j in range(ip_2.shape[0])])
print(arr_2)
[ 0.  0.  0. -0.]

The new weights for $w_1[2]$ are therefore:

nw_1_2 = np.array([w_1[2][j] - (lr * arr_2[j]) for j in range(arr_2.shape[0])])
print(nw_1_2)
[-1.   0.1 -0.4 -0.2]
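
Before assembling the updated matrices, the first-layer computation can be restated in the same vectorized style. A sketch with illustrative names (delta_3, upstream, mask, grad_w1, new_w_1) that mirrors the per-element lambdas above:

delta_3 = (a_3 - y) * (w_2 @ ip_3 >= 0)    # output-layer error terms, as before
upstream = delta_3 @ w_2                   # d_et_a2j evaluated for every column j
mask = (w_1 @ ip_2 >= 0)                   # d_a2i_suma for each hidden row i
grad_w1 = np.outer(mask, upstream * ip_2)  # rows correspond to arr_0, arr_1, arr_2
new_w_1 = w_1 - lr * grad_w1               # should match the matrix assembled below
print(new_w_1)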

Therefore, $w_1$ ($\theta^\text{(1)}$) after the first iteration is:

print(np.array([nw_1_0, nw_1_1, nw_1_2]))
[[ 0.296022   0.8904528  0.9997348  0.4      ]
 [ 0.596022   0.7904528 -0.3002652 -0.6      ]
 [-1.         0.1       -0.4       -0.2      ]]

And $w_2$ ($\theta^\text{(2)}$) after the first iteration is:

print(np.array([nw_2_0, nw_2_1]))
[[ 0.28674    0.7892594  0.1749386  0.       ]
 [-0.1        0.        -0.6        0.1      ]]