@tylerneylon
Last active April 30, 2022 06:48
A function to load numpy arrays from the MNIST data files.
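""" A function that can read MNIST's idx file format into numpy arrays.
    The MNIST data files can be downloaded from here:
    http://yann.lecun.com/exdb/mnist/
    This relies on the fact that the MNIST dataset consistently uses
    unsigned char types in its data segments.
"""

import struct

import numpy as np


def read_idx(filename):
    with open(filename, 'rb') as f:
        # Header: two zero bytes, a data-type code, and the number of dimensions.
        zero, data_type, dims = struct.unpack('>HBB', f.read(4))
        # Each dimension size is a big-endian unsigned 32-bit integer.
        shape = tuple(struct.unpack('>I', f.read(4))[0] for d in range(dims))
        # np.frombuffer replaces the deprecated binary mode of np.fromstring.
        # The result is read-only; call .copy() on it if you need to modify it.
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(shape)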
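The docstring notes that this reader assumes unsigned-byte data. As a rough sketch of how that assumption could be relaxed, the variant below uses the type code in the third header byte to choose a dtype. The names `read_idx_typed` and `IDX_DTYPES` are illustrative and not part of the original gist; the type codes are the ones given in the idx format description on the MNIST page.

```python
import struct

import numpy as np

# idx type codes mapped to big-endian NumPy dtypes (illustrative table,
# not part of the original gist).
IDX_DTYPES = {
    0x08: np.dtype('u1'),   # unsigned byte
    0x09: np.dtype('i1'),   # signed byte
    0x0B: np.dtype('>i2'),  # short (2 bytes)
    0x0C: np.dtype('>i4'),  # int (4 bytes)
    0x0D: np.dtype('>f4'),  # float (4 bytes)
    0x0E: np.dtype('>f8'),  # double (8 bytes)
}


def read_idx_typed(filename):
    """Hypothetical variant of read_idx that honors the header's type code."""
    with open(filename, 'rb') as f:
        zero, data_type, dims = struct.unpack('>HBB', f.read(4))
        if zero != 0 or data_type not in IDX_DTYPES:
            raise ValueError('not a recognized idx file: %r' % filename)
        # Read all dimension sizes in one call; each is a big-endian uint32.
        shape = struct.unpack('>' + 'I' * dims, f.read(4 * dims))
        return np.frombuffer(f.read(), dtype=IDX_DTYPES[data_type]).reshape(shape)
```

For the MNIST files themselves the type code is always 0x08, so this should return the same arrays as `read_idx`.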
@JonnoFTW

Python can read gzip files directly, so there is no need to decompress the downloads first. Just add:

import gzip

Then change

with open(filename, 'rb') as f:

to:

with gzip.open(filename) as f:
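Put together, the change might look like the sketch below. The name `read_idx_gz` and the choice to switch on the `.gz` suffix are assumptions for illustration, not part of the gist.

```python
import gzip
import struct

import numpy as np


def read_idx_gz(filename):
    """Hypothetical helper: read an idx file, gzip-compressed or not."""
    # gzip.open defaults to binary read mode, so 'rb' works for both openers.
    opener = gzip.open if filename.endswith('.gz') else open
    with opener(filename, 'rb') as f:
        zero, data_type, dims = struct.unpack('>HBB', f.read(4))
        shape = tuple(struct.unpack('>I', f.read(4))[0] for d in range(dims))
        # As in the gist, this assumes unsigned-byte data.
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(shape)
```

With this, something like `read_idx_gz('train-images-idx3-ubyte.gz')` should work on the files as downloaded.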

@alanhyue

Thanks a lot, dude! It's interesting that the files store their integers in big-endian ("high-endian", MSB first) format, the one used by most non-Intel processors, since in my mind most people ARE using Intel processors. Anyway, thanks for the gist!
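To make the byte-order point concrete, here is a small sketch using only the standard library. The value 60000 is just a convenient example (the number of images in the MNIST training set).

```python
import struct
import sys

value = 60000  # example value: number of images in the MNIST training set

big = struct.pack('>I', value)     # most significant byte first, as in idx files
little = struct.pack('<I', value)  # least significant byte first, as on x86

print(big.hex())      # 0000ea60
print(little.hex())   # 60ea0000
print(sys.byteorder)  # byte order of the machine running this script

# Decoding big-endian bytes as little-endian silently gives a wrong number,
# which is why the gist spells out '>' in its struct format strings.
wrong = struct.unpack('<I', big)[0]
right = struct.unpack('>I', big)[0]
print(wrong == value, right == value)  # False True
```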
