{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Neural Networks and Deep Learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The basis for a majority of neural networks is the sigmoid neuron."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"tikz9.png\"/>"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{eqnarray} \n",
" \\frac{1}{1+\\exp(-\\sum_j w_j x_j-b)}.\n",
"\\tag{4}\\end{eqnarray}"
],
"text/plain": [
"<IPython.core.display.Latex object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%latex\n",
"\\begin{eqnarray} \n",
" \\frac{1}{1+\\exp(-\\sum_j w_j x_j-b)}.\n",
"\\tag{4}\\end{eqnarray}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* x is your input vector\n",
"* w is your weight vector\n",
"* b is your bias vector"
]
},
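{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal numpy sketch of equation (4); the values of `x`, `w`, and `b` below are made up purely for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def sigmoid(z):\n",
"    # 1 / (1 + exp(-z)), applied elementwise\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"x = np.array([0.5, 0.2, 0.9])   # input vector\n",
"w = np.array([0.4, -0.6, 0.1])  # weight vector\n",
"b = -0.3                        # bias\n",
"\n",
"print(sigmoid(np.dot(w, x) + b))  # the neuron's output, equation (4)"
]
},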
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The most important property is that it is smooth (continuously differentiable). This is a key improvement over the perceptron which is a step function which either fires or doesn't. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"sigmoid.png\"/>"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{eqnarray} \n",
" \\Delta \\mbox{output} \\approx \\sum_j \\frac{\\partial \\, \\mbox{output}}{\\partial w_j}\n",
" \\Delta w_j + \\frac{\\partial \\, \\mbox{output}}{\\partial b} \\Delta b,\n",
"\\tag{5}\\end{eqnarray}"
],
"text/plain": [
"<IPython.core.display.Latex object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%latex\n",
"\\begin{eqnarray} \n",
" \\Delta \\mbox{output} \\approx \\sum_j \\frac{\\partial \\, \\mbox{output}}{\\partial w_j}\n",
" \\Delta w_j + \\frac{\\partial \\, \\mbox{output}}{\\partial b} \\Delta b,\n",
"\\tag{5}\\end{eqnarray}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A little change in the weights/biases is going to cause a little change in the output. \n",
"\n",
"It's impossible to properly learn when using a perceptron (instead of a sigmoid neuron) because at a given state you don't know how close or far you are to the step, which changes your output completely, and small changes in w/b don't make any change in the output if you're away from the step."
]
},
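{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick numerical check of equation (5), reusing the illustrative `x`, `w`, and `b` from above: nudging one weight moves the sigmoid output by a proportionally tiny amount, while a perceptron's step output is flat almost everywhere."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def sigmoid(z):\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"def perceptron(z):\n",
"    return 1 if z > 0 else 0\n",
"\n",
"x = np.array([0.5, 0.2, 0.9])   # same illustrative neuron as above\n",
"w = np.array([0.4, -0.6, 0.1])\n",
"b = -0.3\n",
"\n",
"z = np.dot(w, x) + b\n",
"dw = 1e-4  # tiny change to the first weight, w_0, shifts z by dw * x_0\n",
"\n",
"print(sigmoid(z + dw * x[0]) - sigmoid(z))        # small, nonzero change\n",
"print(perceptron(z + dw * x[0]) - perceptron(z))  # 0: no signal to learn from"
]
},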
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The architecture of neural networks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An example of a 4 layer feedforward network."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"tikz11.png\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One of the most difficult aspects of deep learning is properly designing the hidden layers. How many neurons should be used per layer, how many layers?\n",
"\n",
"There are some heuristics for these decisions, but the theory is way behind the emperical data in the field. Try stuff until it works."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### MNIST data set"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"digits_separate.png\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is a standard dataset in ML and the one used throughout the book. Contains 60,000 handwritten digits. Each image is 28x28.\n",
"\n",
"Therefore, features for this dataset are 784 grayscale pixel values."
]
},
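{
"cell_type": "markdown",
"metadata": {},
"source": [
"Concretely, each 28x28 image unrolls into a 784-dimensional feature vector (random pixel values stand in for a real MNIST image here):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"image = np.random.rand(28, 28)   # stand-in for one grayscale MNIST image\n",
"features = image.reshape(784)    # flatten to a 784-dimensional vector\n",
"print(features.shape)"
]
},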
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's an example of a shallow example that could realistically be used for classification of these digits."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"tikz12.png\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One important thing to note is that there are 10 output neurons instead of 4. (For 2<sup>4</sup>=16)\n",
"\n",
"While it's techincally possible to reduce the output neurons to 4, it's generally more difficult. This allows each output neuron to learn features specific to its class, as opposed to having to weight and compare features between classes."
]
},
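{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the 10-versus-4 distinction concrete, here's how a label like 6 looks under each output encoding (an illustrative sketch, not the book's loader code):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"label = 6\n",
"\n",
"# 10 output neurons: a one-hot vector, one neuron per digit class\n",
"one_hot = np.zeros(10)\n",
"one_hot[label] = 1.0\n",
"print(one_hot)\n",
"\n",
"# 4 output neurons: binary encoding (2**4 = 16 >= 10 possible codes)\n",
"print([(label >> i) & 1 for i in range(4)])  # bits of 6, least significant first"
]
},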
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The cost function and gradient decent"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Also known as a loss or objective function."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{eqnarray} C(w,b) \\equiv\n",
" \\frac{1}{2n} \\sum_x \\| y(x) - a\\|^2.\n",
"\\tag{6}\\end{eqnarray}"
],
"text/plain": [
"<IPython.core.display.Latex object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%latex\n",
"\\begin{eqnarray} C(w,b) \\equiv\n",
" \\frac{1}{2n} \\sum_x \\| y(x) - a\\|^2.\n",
"\\tag{6}\\end{eqnarray}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Quadradic loss where:\n",
"\n",
"* C is the cost as a function of w and b\n",
"* n is the number of training samples\n",
"* y(x) is the vector of labels\n",
"* a is the activation or output from the neural net (prediction)"
]
},
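{
"cell_type": "markdown",
"metadata": {},
"source": [
"Equation (6) in numpy, with two made-up training samples just to show the shapes:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def quadratic_cost(predictions, labels):\n",
"    # C(w, b) = (1 / 2n) * sum over samples x of ||y(x) - a||^2\n",
"    n = len(labels)\n",
"    return sum(np.linalg.norm(y - a) ** 2\n",
"               for y, a in zip(labels, predictions)) / (2.0 * n)\n",
"\n",
"labels = [np.eye(10)[3], np.eye(10)[7]]             # one-hot labels for 3 and 7\n",
"predictions = [np.full(10, 0.1), np.full(10, 0.1)]  # fake network outputs\n",
"print(quadratic_cost(predictions, labels))"
]
},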
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cost C(w,b) becomes small, i.e., C(w,b)≈0, precisely when y(x) is approximately equal to the output, a, for all training inputs, x.\n",
"\n",
"Why do we need a cost function and how can we minimize it?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, we need a smooth function to minimize, something that our classification error rate doesn't provide. The cost function (it doesn't necessarily have to be the quadradic cost) acts as a proxy for the classification error rate.\n",
"\n",
"We can minimize the cost using gradient decent. Analytical solutions with millions or billions of paraments doesn't work."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A nice analogy for gradient decent is a ball rolling down a hill. Which way should it go to get to the bottom?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"valley_with_ball.png\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Skimming over the math:* We are going to iteratively take steps in the direction which takes us to the minimum. An important parameter in this portion is η, our step size, or in the context of neural nets, our learning rate. \n",
"\n",
"If our learning rate is too small, we are going to take very small steps and the computation will take a long time to reach the minimum. If the rate is too large, steps are going to jump around all over the place and not smoothly go to the bottom."
]
},
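{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect of the learning rate is easy to see on a one-dimensional toy cost, C(w) = w<sup>2</sup> (an illustration of step size, not code from the book):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def descend(eta, steps=10, w=1.0):\n",
"    # Gradient descent on C(w) = w**2, whose gradient is dC/dw = 2w\n",
"    for _ in range(steps):\n",
"        w = w - eta * 2 * w\n",
"    return w\n",
"\n",
"print(descend(0.01))  # too small: barely moves toward the minimum at 0\n",
"print(descend(0.1))   # reasonable: converges smoothly\n",
"print(descend(1.1))   # too large: overshoots and diverges"
]
},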
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{eqnarray}\n",
" w_k & \\rightarrow & w_k' = w_k-\\eta \\frac{\\partial C}{\\partial w_k} \\tag{16}\\\\\n",
" b_l & \\rightarrow & b_l' = b_l-\\eta \\frac{\\partial C}{\\partial b_l}.\n",
"\\tag{17}\\end{eqnarray}"
],
"text/plain": [
"<IPython.core.display.Latex object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%latex\n",
"\\begin{eqnarray}\n",
" w_k & \\rightarrow & w_k' = w_k-\\eta \\frac{\\partial C}{\\partial w_k} \\tag{16}\\\\\n",
" b_l & \\rightarrow & b_l' = b_l-\\eta \\frac{\\partial C}{\\partial b_l}.\n",
"\\tag{17}\\end{eqnarray}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To take a step from our current position, we are going to subtract the partial of C with respect to w/b times our learning rate from the current position/weight."
]
},
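{
"cell_type": "markdown",
"metadata": {},
"source": [
"Equations (16) and (17) as numpy array updates; the gradients below are random placeholders standing in for what backpropagation would compute:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"eta = 3.0  # learning rate\n",
"\n",
"w = np.random.randn(10, 784)  # weights for one layer\n",
"b = np.random.randn(10)       # biases for one layer\n",
"\n",
"grad_w = np.random.randn(10, 784)  # placeholder for dC/dw\n",
"grad_b = np.random.randn(10)       # placeholder for dC/db\n",
"\n",
"w = w - eta * grad_w  # equation (16)\n",
"b = b - eta * grad_b  # equation (17)"
]
},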
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Key problem: To do this requires you to take the partials for each of your samples, which is quite expensive. We'll only look at a few of the samples (randomly selected) as an approximation. This is *stochastic gradient descent* and the sub-samples of the instances are known as mini-batchs. The size of each mini-batch is another hyper-parameter. "
]
},
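{
"cell_type": "markdown",
"metadata": {},
"source": [
"One common way to draw an epoch's mini-batches is to shuffle the training data and then slice it (a sketch, with a stand-in list in place of real training samples):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import random\n",
"\n",
"training_data = list(range(50000))  # stand-in for (image, label) pairs\n",
"mini_batch_size = 10\n",
"\n",
"random.shuffle(training_data)  # shuffle once per epoch\n",
"mini_batches = [training_data[k:k + mini_batch_size]\n",
"                for k in range(0, len(training_data), mini_batch_size)]\n",
"print(len(mini_batches))  # 5000 mini-batches of 10 samples each"
]
},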
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A third hyper-parameter to choose is the number of epochs, which is simply the number of times you want to go through all your training samples. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hyper parameters so far:\n",
"\n",
"1. Epochs = Number of times to iterate through training samples\n",
"2. Mini batch size = The number of samples to look at for each step of SGD\n",
"3. learning rate = The step size multiplier for the gradient update."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example Run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python\n",
">>> import mnist_loader\n",
">>> training_data, validation_data, test_data = mnist_loader.load_data_wrapper()\n",
">>>\n",
">>> import network\n",
">>> net = network.Network([784, 30, 10])\n",
">>> net.SGD(training_data, 30, 10, 3.0, test_data=test_data)\n",
"...\n",
"Epoch 27: 9528 / 10000\n",
"Epoch 28: 9542 / 10000\n",
"Epoch 29: 9534 / 10000\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wrapping all the concepts into code, what's going on behind the scenes in the SGD?"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#for each epoch:\n",
" #create mini-batches from training data\n",
" \n",
" #for each mini-batch:\n",
" #Update the network's w/b by applying gradient descent by backpropagration\n",
" \n",
" #Compare the network output on test data to check classification rate"
]
},
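{
"cell_type": "markdown",
"metadata": {},
"source": [
"A self-contained toy version of that loop: training a single sigmoid neuron on OR with the quadratic cost. For one neuron the gradient can be written by hand, so no backpropagation machinery is needed. This is an illustration, not the book's network.py."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import random\n",
"import numpy as np\n",
"\n",
"def sigmoid(z):\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"# Toy problem: learn OR on 2-bit inputs with one sigmoid neuron\n",
"training_data = [(np.array([0., 0.]), 0.), (np.array([0., 1.]), 1.),\n",
"                 (np.array([1., 0.]), 1.), (np.array([1., 1.]), 1.)]\n",
"\n",
"w, b = np.random.randn(2), np.random.randn()\n",
"eta, epochs, mini_batch_size = 3.0, 1000, 2\n",
"\n",
"for epoch in range(epochs):\n",
"    # create mini-batches from the training data\n",
"    random.shuffle(training_data)\n",
"    batches = [training_data[k:k + mini_batch_size]\n",
"               for k in range(0, len(training_data), mini_batch_size)]\n",
"    for batch in batches:\n",
"        # update w/b by gradient descent over the mini-batch\n",
"        gw, gb = np.zeros(2), 0.0\n",
"        for x, y in batch:\n",
"            a = sigmoid(np.dot(w, x) + b)\n",
"            delta = (a - y) * a * (1 - a)  # dC/dz for the quadratic cost\n",
"            gw, gb = gw + delta * x, gb + delta\n",
"        w = w - (eta / len(batch)) * gw\n",
"        b = b - (eta / len(batch)) * gb\n",
"\n",
"# compare the neuron's outputs to the labels\n",
"for x, y in training_data:\n",
"    print(x, y, round(float(sigmoid(np.dot(w, x) + b)), 2))"
]
},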
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Backpropagation, the Workhorse of Learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}