Train 2-3x faster on MNIST, with much less CPU usage, by making a few simple changes to the MNIST dataset that PyTorch provides.

The PyTorch MNIST dataset is SLOW by default, because it conforms to the usual torchvision interface of returning a PIL image and applying transforms sample by sample. This is unnecessary if you just want normalized MNIST tensors and are not interested in image transforms (such as rotation or cropping). By folding the normalization into the dataset initialization, you save your CPU and speed up training by 2-3x.
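For comparison, the stock setup this replaces typically looks something like the following (the ToTensor and Normalize transforms run on the CPU for every single sample, which is where the time goes):

from torchvision import datasets, transforms

# The usual pipeline: __getitem__ returns a PIL image, then converts and
# normalizes it on the CPU, one sample at a time.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])
slow_train_dataset = datasets.MNIST('data/MNIST', train=True, download=True, transform=transform)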

The bottleneck when training on MNIST with a GPU and a small-ish model is the CPU. In fact, even with six dataloader workers on a six-core i7, GPU utilization is only ~5-10%. Using FastMNIST increases GPU utilization to ~20-25% and reduces CPU utilization to near zero. On my particular model, steps per second with batch size 64 went from ~150 to ~500.

Instead of the default MNIST dataset, use this:

import torch
from torchvision.datasets import MNIST

device = torch.device('cuda')

class FastMNIST(MNIST):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        
        # Scale data to [0,1]
        self.data = self.data.unsqueeze(1).float().div(255)
        
        # Normalize it with the usual MNIST mean and std
        self.data = self.data.sub_(0.1307).div_(0.3081)
        
        # Put both data and targets on GPU in advance
        self.data, self.targets = self.data.to(device), self.targets.to(device)

    def __getitem__(self, index):
        """
        Args:
            index (int): Index

        Returns:
            tuple: (image, target) where target is index of the target class.
        """
        img, target = self.data[index], self.targets[index]

        return img, target

And create the dataloaders like this:

from torch.utils.data import DataLoader

train_dataset = FastMNIST('data/MNIST', train=True, download=True)
test_dataset = FastMNIST('data/MNIST', train=False, download=True)

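# num_workers=0 is deliberate: the data is already preprocessed and on the GPU,
# so worker processes would only add overhead.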
train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=0)
test_dataloader = DataLoader(test_dataset, batch_size=10000, shuffle=False, num_workers=0)

This results in a 2-3x speedup (~500 it/s on a 1080 Ti with a smallish MLP) and near-zero CPU usage, compared to full CPU usage normally.
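For completeness, a minimal training-loop sketch; the MLP, optimizer, and epoch count here are hypothetical stand-ins rather than the benchmarked model. The point is that batches already arrive as CUDA tensors, so the loop needs no per-batch .to(device) calls:

import torch.nn as nn
import torch.nn.functional as F

# Hypothetical small MLP, just to illustrate the loop; `device` is the one defined above.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
optimizer = torch.optim.Adam(model.parameters())

for epoch in range(10):
    for images, labels in train_dataloader:
        # No images.to(device) / labels.to(device): the whole dataset already lives on the GPU.
        optimizer.zero_grad()
        loss = F.cross_entropy(model(images), labels)
        loss.backward()
        optimizer.step()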
