    """
    Loosely inspired by http://abel.ee.ucla.edu/cvxopt/_downloads/mnist.py
    which is GPL licensed.
    """
    import os
    import struct

    import numpy as np


    def read(dataset="training", path="."):
        """
        Python function for importing the MNIST data set. It returns an iterator
        of 2-tuples with the first element being the label and the second element
        being a numpy.uint8 2D array of pixel data for the given image.
        """
        if dataset == "training":
            fname_img = os.path.join(path, 'train-images-idx3-ubyte')
            fname_lbl = os.path.join(path, 'train-labels-idx1-ubyte')
        elif dataset == "testing":
            fname_img = os.path.join(path, 't10k-images-idx3-ubyte')
            fname_lbl = os.path.join(path, 't10k-labels-idx1-ubyte')
        else:
            raise ValueError("dataset must be 'testing' or 'training'")

        # Load everything in some numpy arrays.
        # Label file header: magic number and label count, both big-endian uint32.
        with open(fname_lbl, 'rb') as flbl:
            magic, num = struct.unpack(">II", flbl.read(8))
            lbl = np.fromfile(flbl, dtype=np.int8)

        # Image file header: magic number, image count, rows and columns.
        with open(fname_img, 'rb') as fimg:
            magic, num, rows, cols = struct.unpack(">IIII", fimg.read(16))
            img = np.fromfile(fimg, dtype=np.uint8).reshape(len(lbl), rows, cols)

        get_img = lambda idx: (lbl[idx], img[idx])

        # Create an iterator which returns each image in turn
        for i in range(len(lbl)):
            yield get_img(i)


    def show(image):
        """
        Render a given numpy.uint8 2D array of pixel data.
        """
        from matplotlib import pyplot
        fig = pyplot.figure()
        ax = fig.add_subplot(1, 1, 1)
        ax.imshow(image, cmap='Greys')
        pyplot.show()
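Both headers parsed in `read` follow the IDX layout used by the MNIST files: a 4-byte magic number whose first two bytes are zero, whose third byte encodes the element type (0x08 for unsigned byte) and whose fourth byte gives the number of dimensions, followed by one big-endian 32-bit size per dimension. A minimal sketch of a reader that takes the header length and shape from the magic number instead of hard-coding them, assuming only a few element types are needed; `read_idx` and `write_idx` are hypothetical helpers, not part of the code above, and the round trip uses tiny synthetic arrays rather than the real files:

```python
import os
import struct
import tempfile

import numpy as np

# IDX element-type codes (third byte of the magic number) mapped to
# big-endian numpy dtypes. Only a subset of the codes is listed here.
IDX_DTYPES = {
    0x08: np.dtype(np.uint8),
    0x09: np.dtype(np.int8),
    0x0D: np.dtype(">f4"),
}


def read_idx(fname):
    """Hypothetical helper: load a whole IDX file into a numpy array."""
    with open(fname, "rb") as f:
        # Magic number: two zero bytes, a type code byte, a dimension-count byte.
        zero, dtype_code, ndim = struct.unpack(">HBB", f.read(4))
        if zero != 0 or dtype_code not in IDX_DTYPES:
            raise ValueError("not a supported IDX file: %r" % fname)
        # One big-endian uint32 per dimension follows the magic number.
        shape = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        data = np.fromfile(f, dtype=IDX_DTYPES[dtype_code])
    return data.reshape(shape)


def write_idx(fname, array):
    """Hypothetical helper: write a uint8 array in IDX format (test data only)."""
    array = np.asarray(array, dtype=np.uint8)
    with open(fname, "wb") as f:
        f.write(struct.pack(">HBB", 0, 0x08, array.ndim))
        f.write(struct.pack(">" + "I" * array.ndim, *array.shape))
        f.write(array.tobytes())


# Round trip with two tiny synthetic 3x4 "images" and their labels.
images = np.arange(2 * 3 * 4, dtype=np.uint8).reshape(2, 3, 4)
labels = np.array([7, 1], dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmpdir:
    img_path = os.path.join(tmpdir, "images-idx3-ubyte")
    lbl_path = os.path.join(tmpdir, "labels-idx1-ubyte")
    write_idx(img_path, images)
    write_idx(lbl_path, labels)
    loaded_images = read_idx(img_path)
    loaded_labels = read_idx(lbl_path)

print(loaded_images.shape, loaded_labels.tolist())  # (2, 3, 4) [7, 1]
```

Because the type and dimension count come from the magic number, the same helper can load both a label file (magic 2049, one dimension) and an image file (magic 2051, three dimensions), and it takes a file name directly rather than assuming the MNIST names.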
@BigHopes, after putting the unzipped files into a ./mnist folder in my notebook's directory, this worked for me in Jupyter:
Also, to get it to work with Python 3, three changes were necessary: add parentheses to the `raise` statement on line 24 (`raise ValueError("...")`), change `xrange` to `range`, and maybe one more thing that I now can't remember.
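Separately, the `read` function as originally posted tests `dataset is "training"`, which compares object identity rather than string contents. It tends to work with literals in CPython because identical literals are often the same object, but a string built at runtime need not be, and Python 3.8+ emits a SyntaxWarning for `is` with a literal; `==` is the reliable test. A small demonstration (the `check_*` functions and `TRAINING` constant are hypothetical):

```python
TRAINING = "training"


def check_is(dataset):
    # Identity test: True only if both names refer to the very same object.
    return dataset is TRAINING


def check_eq(dataset):
    # Equality test: compares the characters, which is what read() intends.
    return dataset == TRAINING


# An equal string built at runtime, e.g. as if parsed from user input.
built = "".join(["train", "ing"])

print(check_eq(built))  # True
print(check_is(built))  # False in CPython: same contents, different object
```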
To get the `show` function to work, you need to pass it the second element of the tuple (the pixel array), not the whole tuple.
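For example, since each item produced by `read` is a `(label, pixels)` 2-tuple, unpack it and hand only the array to `show`. The sketch below uses a hypothetical stand-in generator, `fake_read`, so that it runs without the MNIST files; with the real data the items come from `read` instead.

```python
import numpy as np


def fake_read(n=3):
    """Hypothetical stand-in for read(): yields (label, 28x28 uint8 array) tuples."""
    for label in range(n):
        pixels = np.zeros((28, 28), dtype=np.uint8)
        pixels[4:24, 10 + label] = 255  # a vertical stroke, shifted per label
        yield label, pixels


item = next(fake_read())
label, pixels = item  # unpack the 2-tuple
print(label, pixels.shape, pixels.dtype)  # 0 (28, 28) uint8
# show(pixels) receives the 2D array it expects; show(item) would pass the tuple.
```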
If you are curious (like me) about how the numbers in the matrix make up the image, the function below will show that.
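A minimal sketch of such a function (hypothetical `pixel_text`, not necessarily what the commenter wrote) turns each row of the 2D uint8 array into a line of text, either as raw 0-255 intensities or, with a threshold, as `#` for ink and `.` for background. A tiny hand-made 5x5 array stands in for a 28x28 MNIST image:

```python
import numpy as np


def pixel_text(image, threshold=None):
    """Hypothetical helper: render a 2D uint8 array as text.

    With threshold=None every pixel is printed as a 3-digit number (0-255).
    With a threshold, pixels >= threshold become '#' and the rest '.'.
    """
    rows = []
    for row in np.asarray(image):
        if threshold is None:
            rows.append(" ".join("%3d" % v for v in row))
        else:
            rows.append("".join("#" if v >= threshold else "." for v in row))
    return "\n".join(rows)


# A tiny 5x5 "digit 1" for illustration; an MNIST image would be 28x28.
one = np.zeros((5, 5), dtype=np.uint8)
one[:, 2] = 255
one[1, 1] = 128

print(pixel_text(one))
print(pixel_text(one, threshold=100))
# ..#..
# .##..
# ..#..
# ..#..
# ..#..
```

Printing the raw numbers shows directly which intensities form the strokes; the thresholded view makes the shape easier to see at 28x28.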
Does anyone know if these functions are compatible with the EMNIST dataset? https://www.nist.gov/itl/iad/image-group/emnist-dataset
I would like to use them to parse the files, but the read function keeps giving me this error:
I'm getting this error while executing the code above:

    IOError                                   Traceback (most recent call last)
    in read(dataset, path)
    IOError: [Errno 13] Permission denied: '.\train-labels-idx1-ubyte'
@Jae1015 Note that you need to extract the image and label files before reading them. After extraction you should have two data files, one of images and one of labels, of about 47.0 MB and 60.0 kB respectively for the training set. It seems you followed the advice to "Simply rename them to remove the .gz extension", but that applies only when the web browser has already decompressed the downloaded files automatically.
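A minimal sketch of doing the extraction in Python with the standard `gzip` and `shutil` modules instead of relying on the browser, followed by a sanity check on the result: an uncompressed label file starts with the magic number 2049 and an image file with 2051. The expected sizes also follow from the headers: 16 + 60000 × 28 × 28 = 47,040,016 bytes for the training images and 8 + 60000 = 60,008 bytes for the training labels. The helper names are hypothetical, and the demonstration builds a tiny synthetic label file rather than using the real download.

```python
import gzip
import os
import shutil
import struct
import tempfile

# Magic numbers of the MNIST IDX files (unsigned byte data, 1 or 3 dimensions).
LABEL_MAGIC = 2049  # 0x00000801
IMAGE_MAGIC = 2051  # 0x00000803


def gunzip(gz_path):
    """Hypothetical helper: decompress foo.gz to foo and return the new path."""
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a .gz file: %r" % gz_path)
    out_path = gz_path[:-3]
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def idx_magic(path):
    """Return the big-endian magic number at the start of an IDX file."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack(">I", f.read(4))
    return magic


def expected_size(count, *dims):
    """Size implied by the header: 4-byte magic, 4 bytes per size field, 1 byte per value."""
    n_values = count
    for d in dims:
        n_values *= d
    return 4 + 4 * (1 + len(dims)) + n_values


# Demonstration with a tiny synthetic, gzip-compressed label file of three labels.
with tempfile.TemporaryDirectory() as tmpdir:
    gz_path = os.path.join(tmpdir, "train-labels-idx1-ubyte.gz")
    with gzip.open(gz_path, "wb") as f:
        f.write(struct.pack(">II", LABEL_MAGIC, 3) + bytes([5, 0, 4]))
    raw_path = gunzip(gz_path)
    magic = idx_magic(raw_path)
    size = os.path.getsize(raw_path)

print(magic == LABEL_MAGIC, size == expected_size(3))  # True True
print(expected_size(60000, 28, 28), expected_size(60000))  # 47040016 60008
```

A file that was only renamed, not decompressed, still starts with the gzip signature, so its first four bytes will not match either magic number.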