
@rain-1
Last active March 25, 2023 03:41
MNIST digit classification

MNIST digit recognition

The PyTorch (neural network library) examples include a script that trains a network on the MNIST digit recognition data set: https://github.com/pytorch/examples/tree/main/mnist

This builds a convolutional neural network that takes a 28x28 grayscale picture of a handwritten digit and processes it down to 10 output neurons, one per digit. The training process uses two sets of labelled data (pictures of digits, each tagged with which of the 10 possible digits it is): a training set and a test set. The training set is used to adjust all of the "weights" inside the neural network by repeatedly stepping in the (very high-dimensional) direction of steepest descent of the error, so that the output neurons produce the intended label for each input picture. The test set, which the network never trains on, is used as a metric of how well the neural network is doing.
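A toy sketch of the two roles, under loud assumptions: a one-weight logistic regression in plain Python stands in for the CNN, and the tiny one-feature data set is made up. The real example lets PyTorch compute gradients for about 1.2 million weights and may use a more elaborate optimizer, but the idea is the same: compute the gradient of the loss on the training set, step against it, then measure accuracy on data the model never trained on.

```python
import math

# Made-up toy data (an assumption, for illustration only): one input
# feature x, label 1 if x > 0 else 0. Separate training and test sets.
train = [(-2.0, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.0, 1)]
test = [(-1.5, 0), (-0.2, 0), (0.3, 1), (1.7, 1)]


def predict(w, b, x):
    """Probability that x belongs to class 1 (logistic regression)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def train_step(w, b, data, lr):
    """One gradient-descent step on the mean cross-entropy loss."""
    grad_w = grad_b = 0.0
    for x, y in data:
        err = predict(w, b, x) - y  # d(loss)/d(logit) for cross-entropy
        grad_w += err * x
        grad_b += err
    n = len(data)
    # Step against the gradient, i.e. in the direction of steepest descent
    return w - lr * grad_w / n, b - lr * grad_b / n


def accuracy(w, b, data):
    """Fraction of examples whose predicted class matches the label."""
    correct = sum((predict(w, b, x) > 0.5) == (y == 1) for x, y in data)
    return correct / len(data)


w, b = 0.0, 0.0
for _ in range(100):
    w, b = train_step(w, b, train, lr=0.5)  # only the training set moves the weights
print(f"w={w:.2f} b={b:.2f} test accuracy={accuracy(w, b, test):.0%}")
```

The test set only ever appears inside `accuracy`, which is what makes it an honest measure of how well the model generalizes.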

I ran this (with the --save-model flag, which saves the trained weights as mnist_cnn.pt) and got a model with 99% accuracy on the test data set.

Then I wanted to see whether it worked on my own handwriting, so I drew images of all 10 digits. The example has no way to classify a single image, so I wrote the attached script (try.py). I also wrote a script that displays the results as dots below each image I drew. Here are the results that I got:

set1

These results are terrible: it can't correctly classify the digits I drew. I thought that maybe I had drawn them with too thick a paintbrush or something, so I looked at the actual MNIST digits and tried to draw ones that looked similar.

set2

Again, awful results. Then I realized that my images were inverted relative to the data set: MNIST digits are light strokes on a dark background, while my drawings were dark on light. So I tried inverting my images. This gave good results:

set3
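Why inversion matters, in pixel terms: try.py scales each grayscale pixel to [0, 1] and then normalizes it with the MNIST mean and standard deviation. The plain-Python sketch below uses a hypothetical `preprocess` helper that mimics `ImageOps.invert`, `transforms.ToTensor()` and `transforms.Normalize((0.1307,), (0.3081,))` on single pixel values, with no PIL or torch involved. Without inversion, the white background of a drawing becomes the largest input value, which is where the strokes are in MNIST.

```python
# MNIST normalization constants, as used in try.py's transforms.Normalize
MNIST_MEAN = 0.1307
MNIST_STD = 0.3081


def preprocess(pixels, invert=True):
    """Map 8-bit grayscale pixels (0 = black, 255 = white) to normalized
    network inputs, optionally inverting first so that a dark-on-light
    drawing looks like an MNIST light-on-dark digit."""
    out = []
    for p in pixels:
        if invert:
            p = 255 - p  # same effect as PIL's ImageOps.invert on an "L" image
        x = p / 255.0  # what transforms.ToTensor() does to 8-bit pixels
        out.append((x - MNIST_MEAN) / MNIST_STD)  # transforms.Normalize
    return out


# A white background pixel and a black stroke pixel from a drawing
background, stroke = preprocess([255, 0])
print(background, stroke)
```

After inversion the background lands near -0.42 and the stroke near 2.82, matching the range the network saw during training.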

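Here are the actual outputs of the NN for the inverted digits 0 to 9. Each value is the predicted probability of that digit, as a percentage: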

tensor([[   91.81,     0.00,     8.13,     0.05,     0.00,     0.00,     0.00,
             0.00,     0.01,     0.00]], grad_fn=<MulBackward0>)
tensor([[ 0.13, 59.88,  1.23,  0.15,  7.22,  0.94, 20.30,  0.08,  9.98,  0.08]],
       grad_fn=<MulBackward0>)
tensor([[    0.00,     0.02,    99.98,     0.00,     0.00,     0.00,     0.00,
             0.00,     0.00,     0.00]], grad_fn=<MulBackward0>)
tensor([[    0.00,     0.00,     0.00,   100.00,     0.00,     0.00,     0.00,
             0.00,     0.00,     0.00]], grad_fn=<MulBackward0>)
tensor([[    0.00,     0.00,     0.00,     0.00,   100.00,     0.00,     0.00,
             0.00,     0.00,     0.00]], grad_fn=<MulBackward0>)
tensor([[    0.00,     0.00,     0.00,     0.50,     0.00,    99.49,     0.00,
             0.02,     0.00,     0.00]], grad_fn=<MulBackward0>)
tensor([[    0.00,     0.00,     0.00,     0.00,     0.00,     0.00,   100.00,
             0.00,     0.00,     0.00]], grad_fn=<MulBackward0>)
tensor([[    0.00,     3.67,     0.02,     0.00,     0.00,     0.00,     0.00,
            96.31,     0.00,     0.00]], grad_fn=<MulBackward0>)
tensor([[    1.52,     0.00,     0.02,     0.04,     0.00,     0.00,     0.00,
             0.00,    98.42,     0.00]], grad_fn=<MulBackward0>)
tensor([[    0.00,     0.00,     0.01,     0.36,     0.29,     0.44,     0.00,
             0.10,     0.10,    98.70]], grad_fn=<MulBackward0>)
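Each row is a probability distribution over the digits 0 to 9 in percent, so a row adds up to about 100 and the predicted digit is the position of the largest value. A plain-Python sketch of what try.py computes (`torch.exp` of the network's `log_softmax` output, times 100); the logits are made up, while `row_for_1` is copied from the second output above:

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax, like F.log_softmax(x, dim=1) for one row."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def as_percentages(log_probs):
    """What try.py prints: torch.exp(log_probs) * 100."""
    return [math.exp(lp) * 100 for lp in log_probs]


def predicted_digit(scores):
    """The predicted digit is the index of the largest output."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical raw network outputs (logits) for one image
logits = [1.0, 0.5, 4.0, -2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
percent = as_percentages(log_softmax(logits))

# Second row of the outputs above (the drawn "1")
row_for_1 = [0.13, 59.88, 1.23, 0.15, 7.22, 0.94, 20.30, 0.08, 9.98, 0.08]
print(predicted_digit(percent), predicted_digit(row_for_1))
```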

Next I wanted to experiment with putting in some garbage data and seeing what I get out:

garbage

tensor([[ 3.99, 15.13,  1.83, 35.38,  1.20, 40.62,  0.12,  0.43,  0.17,  1.14]],
       grad_fn=<MulBackward0>) #prince
tensor([[    0.00,     0.00,     0.00,    92.05,     0.00,     3.18,     0.00,
             4.76,     0.00,     0.01]], grad_fn=<MulBackward0>) #35
tensor([[ 1.74,  0.87,  5.31,  2.10,  0.68,  0.24,  1.08,  0.52, 86.35,  1.11]],
       grad_fn=<MulBackward0>) #noise1
tensor([[ 0.30,  0.29,  2.11, 12.38,  2.04,  1.31,  1.65,  0.39, 59.00, 20.53]],
       grad_fn=<MulBackward0>) #noise2
tensor([[10.68, 10.36,  8.21,  7.92,  9.92, 11.62, 10.18,  9.50, 10.65, 10.97]],
       grad_fn=<MulBackward0>) #noise3

I wanted to double-check my data, so I reran the NN on noise1.png a few times, and I got different values each time. I don't understand why that is, although every run leans towards the same classification (8).

(env-pytorch) [river@river mnist]$ python try.py --input my-ex-3/noise1.png --no-cuda --no-mps
tensor([[    1.70,     0.06,     1.87,     5.79,     2.25,     6.93,     0.38,
             1.10,    67.72,    12.21]], grad_fn=<MulBackward0>)
(env-pytorch) [river@river mnist]$ python try.py --input my-ex-3/noise1.png --no-cuda --no-mps
tensor([[    0.58,     0.02,     0.79,     5.73,     0.64,     0.17,     1.34,
             0.01,    90.01,     0.73]], grad_fn=<MulBackward0>)
(env-pytorch) [river@river mnist]$ python try.py --input my-ex-3/noise1.png --no-cuda --no-mps
tensor([[ 0.58,  0.99,  8.43,  4.11,  8.19,  3.20,  0.56,  4.44, 64.37,  5.12]],
       grad_fn=<MulBackward0>)
(env-pytorch) [river@river mnist]$ python try.py --input my-ex-3/noise1.png --no-cuda --no-mps
tensor([[    0.16,     0.00,     0.05,     0.33,     1.79,     0.45,     0.06,
             0.04,    70.44,    26.67]], grad_fn=<MulBackward0>)
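A likely cause: when these runs were made, try.py did not call `model.eval()`, so the model stayed in training mode and its two `nn.Dropout` layers (p=0.25 and p=0.5) kept zeroing random activations on every forward pass. That would give exactly this pattern: different numbers on each run, but a consistent overall lean. A plain-Python sketch of (inverted) dropout, with a hypothetical `dropout` function standing in for `nn.Dropout`:

```python
import random


def dropout(values, p, training, rng):
    """Sketch of nn.Dropout: in training mode each value is zeroed with
    probability p and the survivors are scaled by 1 / (1 - p); in eval
    mode the input passes through unchanged."""
    if not training:
        return list(values)
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]


activations = [0.2, 1.5, 0.7, 3.0, 0.9, 2.2, 0.4, 1.1]

# Two "runs" in training mode: what happens without model.eval()
run1 = dropout(activations, 0.5, training=True, rng=random.Random(1))
run2 = dropout(activations, 0.5, training=True, rng=random.Random(2))

# Two runs in eval mode: what happens after model.eval()
eval1 = dropout(activations, 0.5, training=False, rng=random.Random(1))
eval2 = dropout(activations, 0.5, training=False, rng=random.Random(2))
print(run1, run2, eval1 == eval2, sep="\n")
```

In try.py itself the corresponding fix is to call `model.eval()` after loading the weights, ideally running the forward pass inside `torch.no_grad()` as well.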

Conclusions

I think my skepticism clouded my judgment a bit. I didn't believe that this neural network would work on my own data, so when it didn't, it took me a while before it hit me that the failure was my own mistake. I'm pretty impressed by the ability of this neural network to correctly classify digits, given that it takes only about 3 minutes to train to 99% accuracy.

On the other hand, there is a problem with using neural networks like this. It just has 10 outputs, and there isn't really a way for it to express uncertainty or say "I don't know", which I feel could be important in some contexts. I was hoping it would give a probability of 0.1 (10%) for every output neuron on random noise, but it confidently misclassifies most of the noise images (noise1 comes out as an 8 with 86% confidence). You could probably add an extra "I don't know" neuron and train it to recognize noise, but it probably wouldn't activate for garbage inputs that aren't noise. The NN has 6 layers, and any input is pushed all the way through to the output; there is fundamentally no way for it to reject nonsense inputs.
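One common heuristic for an "I don't know" answer, which is not something try.py does, is to measure how spread out the output distribution is, for example with its Shannon entropy, and abstain above a threshold. Entropy is 0 for a one-hot output and ln 10 (about 2.30) for the uniform 10%-each output hoped for above. The threshold below is an arbitrary assumption, and the sketch only helps when the network actually produces a spread-out output: confidently misclassified noise such as noise1 still gets a digit. The rows are copied from the outputs above.

```python
import math


def entropy(percentages):
    """Shannon entropy (in nats) of an output row given in percent."""
    probs = [p / 100.0 for p in percentages if p > 0]
    total = sum(probs)
    probs = [p / total for p in probs]  # renormalize away rounding error
    return -sum(p * math.log(p) for p in probs)


MAX_ENTROPY = math.log(10)  # uniform over 10 digits, about 2.303
THRESHOLD = 0.5 * MAX_ENTROPY  # arbitrary cut-off (an assumption)


def classify_or_abstain(percentages):
    """Return the predicted digit, or "I don't know" for a flat output."""
    if entropy(percentages) > THRESHOLD:
        return "I don't know"
    return max(range(10), key=lambda i: percentages[i])


digit_3 = [0.00, 0.00, 0.00, 100.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]
noise1 = [1.74, 0.87, 5.31, 2.10, 0.68, 0.24, 1.08, 0.52, 86.35, 1.11]
noise3 = [10.68, 10.36, 8.21, 7.92, 9.92, 11.62, 10.18, 9.50, 10.65, 10.97]

for name, row in [("digit_3", digit_3), ("noise1", noise1), ("noise3", noise3)]:
    print(name, round(entropy(row), 2), classify_or_abstain(row))
```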

I've heard about an interesting experiment that was done with this data set from here: give every training image a completely random label (so some pictures of 3s are labelled as 7, and other pictures of 3s are labelled as 4), then train on it. This forces the neural network to 'memorize' the input data set instead of learning to 'actually recognize numbers', which requires a large enough neural network; as per the universal approximation theorem, a large enough network can approximate essentially any function. The loss (a measure of how wrong the model's predictions are) decreases much more slowly over time than it does with correctly labelled data. This suggests that the NN is somehow working more efficiently on correctly labelled data.
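Setting up that experiment mostly means replacing the labels before training. A plain-Python sketch with a hypothetical `randomize_labels` helper; with torchvision's MNIST dataset one would presumably apply the same idea to the dataset's `targets` tensor, though how to wire it into the training script is left as an assumption here.

```python
import random


def randomize_labels(labels, num_classes=10, seed=0):
    """Replace every label with one drawn uniformly at random. The images
    then carry no information about their labels, so the only way to fit
    the training set is to memorize it."""
    rng = random.Random(seed)
    return [rng.randrange(num_classes) for _ in labels]


true_labels = [3, 3, 7, 1, 0, 3, 9, 4]
random_labels = randomize_labels(true_labels)

# Fraction of labels that happen to stay correct (about 1 in 10 on large data)
agreement = sum(a == b for a, b in zip(true_labels, random_labels)) / len(true_labels)
print(random_labels, agreement)
```

Fixing the seed keeps the random labels identical across epochs, which matters: relabelling every epoch would leave nothing to memorize.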

Some things to explore in the future:

  • Include letters as well. How would you recognize characters that are part of a larger image, instead of the image being just a single digit?
  • Can we produce pictures of numbers out of nothing by running the neural network backwards?
  • Can we produce heatmaps of the parts of an input image that are more or less relevant to, e.g., it being classified as an 8?
  • Can we 'interpret' what particular groups of neurons do?
  • If we train the NN from a different random seed, will its outputs on the random-noise images change?
  • Can we train a NN to classify whether an output classification vector was produced from the training data set or the test data set?
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms
from PIL import Image, ImageOps


class Net(nn.Module):
    # Same architecture as the PyTorch MNIST example, so that the saved
    # state_dict in mnist_cnn.pt loads without key or shape mismatches.
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output


def main():
    # Command-line settings
    parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
    parser.add_argument('--no-cuda', action='store_true', default=False,
                        help='disables CUDA')
    parser.add_argument('--no-mps', action='store_true', default=False,
                        help='disables macOS GPU')
    parser.add_argument('--load-model', default='mnist_cnn.pt',
                        help='path to the trained model weights')
    parser.add_argument('--input', required=True,
                        help='input image (resized to 28x28 grayscale)')
    args = parser.parse_args()
    use_cuda = not args.no_cuda and torch.cuda.is_available()
    use_mps = not args.no_mps and torch.backends.mps.is_available()

    if use_cuda:
        device = torch.device("cuda")
    elif use_mps:
        device = torch.device("mps")
    else:
        device = torch.device("cpu")

    # Same normalization as in training (MNIST mean and standard deviation)
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])

    model = Net().to(device)
    model.load_state_dict(torch.load(args.load_model, map_location=device))
    # Switch off dropout; without this, repeated runs on the same image
    # give different outputs.
    model.eval()

    # MNIST digits are light strokes on a dark background, so dark-on-light
    # drawings must be inverted first (e.g. in an image editor or with
    # ImageOps.invert).
    img = Image.open(args.input)
    img = ImageOps.fit(img, (28, 28))
    img = ImageOps.grayscale(img)
    img = transform(img)   # shape [1, 28, 28]
    img = img.to(device)   # .to() returns a new tensor, so reassign it

    torch.set_printoptions(precision=2, sci_mode=False)
    with torch.no_grad():
        # unsqueeze: [1, 28, 28] -> [1, 1, 28, 28]; exp() turns the
        # log-probabilities into probabilities, printed as percentages
        print(torch.exp(model(img.unsqueeze(1))) * 100)


if __name__ == '__main__':
    main()
from PIL import Image, ImageDraw


def create_image_with_dots(input_image_path, brightness_array, output_image_path):
    # Load the input image
    img = Image.open(input_image_path)
    width, height = img.size

    # Create a new image with extra space below for two rows of dots
    new_height = height + 16
    new_img = Image.new('RGB', (width, new_height), color=(0, 0, 0))
    new_img.paste(img, (0, 0))

    # Draw one dot per digit, 5 per row: digits 0-4 on the first row,
    # 5-9 on the second. brightness_array holds percentages (0-100);
    # each dot fades from white (0%) to magenta (100%).
    draw = ImageDraw.Draw(new_img)
    for i, brightness in enumerate(brightness_array):
        x = int(width / 6 * (i % 5 + 1))
        y = height + i // 5 * 10
        radius = 2
        color = int(255 * brightness / 100.0)
        draw.ellipse((x - radius, y - radius, x + radius, y + radius),
                     fill=(255, 255 - color, 255))

    # Save the new image
    new_img.save(output_image_path)


if __name__ == '__main__':
    # Example usage
    input_image_path = 'my-ex-3/noise3.png'
    brightness_array = [10.68, 10.36, 8.21, 7.92, 9.92, 11.62, 10.18, 9.50, 10.65, 10.97]
    output_image_path = input_image_path + ".out.png"
    create_image_with_dots(input_image_path, brightness_array, output_image_path)
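Where the 9216 in the Net class's fc1 layer comes from: each 3x3 convolution with stride 1 and no padding shrinks the image by 2 pixels in each dimension, and `max_pool2d(x, 2)` halves it. A short worked calculation, using a hypothetical `conv_out` helper with the standard output-size formula:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28                                  # MNIST images are 28x28
size = conv_out(size, kernel=3)            # conv1: 28 -> 26, 32 channels
size = conv_out(size, kernel=3)            # conv2: 26 -> 24, 64 channels
size = conv_out(size, kernel=2, stride=2)  # max_pool2d(x, 2): 24 -> 12
flattened = 64 * size * size               # torch.flatten(x, 1)
print(size, flattened)  # 12 9216
```

This is also why try.py resizes every input to 28x28: a 32x32 image, for instance, would reach fc1 as 64 * 14 * 14 = 12544 values and fail with a shape mismatch.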

I received the following help from GPT-4 when creating this post:

In PyTorch, how do I turn a tensor of shape [1, 28, 28] into a tensor of shape [1, 1, 28, 28]?

In PyTorch, you can use the unsqueeze() function to add an extra dimension to a tensor. To change a tensor from shape [1, 28, 28] to [1, 1, 28, 28], you can unsqueeze at dimension 1:

import torch

tensor = torch.randn(1, 28, 28)  # Example tensor with shape [1, 28, 28]
new_tensor = tensor.unsqueeze(1)  # Add an extra dimension at position 1

print(new_tensor.shape)  # Output: torch.Size([1, 1, 28, 28])

I get outputs like tensor([[ -5.3909, -27.7679, -0.2184, -1.7517, -28.3149, -4.0096, -19.7950, -16.8305, -15.4390, -23.0980]], grad_fn=)

I expected numbers between 0 and 1, how can I transform them?

Your output tensor seems to be the result of a LogSoftmax operation. To convert the log probabilities back to probabilities between 0 and 1, you can use the torch.exp() function:

import torch

log_output = torch.tensor([[-5.3909, -27.7679, -0.2184, -1.7517, -28.3149, -4.0096, -19.7950, -16.8305, -15.4390, -23.0980]])
prob_output = torch.exp(log_output)

print(prob_output)

This will give you a tensor with values between 0 and 1, representing probabilities.

Write a python script to create an image from an image and an array. It adds 10 dots below an image. The brightness of each dot is controlled by an array of floats.

Here's a Python script that uses the Pillow library to create an image with 10 dots below an input image. The brightness of each dot is controlled by an array of floats:

from PIL import Image, ImageDraw

def create_image_with_dots(input_image_path, brightness_array, output_image_path):
    # Load the input image
    img = Image.open(input_image_path)
    width, height = img.size

    # Create a new image with extra space for the dots
    new_height = height + 20
    new_img = Image.new('RGB', (width, new_height), color=(255, 255, 255))
    new_img.paste(img, (0, 0))

    # Draw the dots with brightness controlled by the brightness_array
    draw = ImageDraw.Draw(new_img)
    for i, brightness in enumerate(brightness_array):
        x = int(width / 11 * (i + 1))
        y = height + 10
        radius = 5
        color = int(255 * brightness)
        draw.ellipse((x - radius, y - radius, x + radius, y + radius), fill=(color, color, color))

    # Save the new image
    new_img.save(output_image_path)

# Example usage
input_image_path = 'path/to/your/input/image.jpg'
brightness_array = [0.1, 0.3, 0.5, 0.7, 0.9, 0.8, 0.6, 0.4, 0.2, 1.0]
output_image_path = 'path/to/your/output/image.jpg'

create_image_with_dots(input_image_path, brightness_array, output_image_path)

This script first loads the input image and creates a new image with extra space for the dots. It then draws the dots below the input image with their brightness controlled by the
