
@y0ast
Last active May 30, 2021 21:01
@AlexPasqua

AlexPasqua commented May 21, 2021

That's brilliant, thanks! 😃

Anyway, I've noticed that it's possible to bypass the DataLoaders entirely and work directly with the tensors, which improves performance even further.
For example, when the batch size equals the whole training set (i.e. full-batch training), GPU usage is near 100% (NVIDIA GeForce MX150); it decreases as the batch size decreases.
Regarding execution time: with a batch size of 64 and 100 epochs, the total run time went from around 238s to around 181s.
(This is just an indication, I haven't run a complete and rigorous benchmark.)

To use the tensors directly:

import math

batch_size = 64

# load the full datasets as tensors (FastMNIST as defined in the gist); no DataLoader is needed
train_dataset = FastMNIST('data/MNIST', train=True, download=True)
test_dataset = FastMNIST('data/MNIST', train=False, download=True)

n_batches = math.ceil(len(train_dataset.data) / batch_size)
for batch_idx in range(n_batches):
    # select a slice of the training set as a minibatch
    train_batch = train_dataset.data[batch_idx * batch_size: (batch_idx + 1) * batch_size]

    # perform computations
    outputs = model(train_batch)
    ...

Furthermore, I think it's worth noting that the normalization you perform (after the scaling) isn't always beneficial; it depends on the task. For example, I'm working with autoencoders, and if I don't comment out that line I get bad results (in terms of reconstruction error).
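
For readers who haven't opened the gist: a minimal sketch of what a FastMNIST-style dataset could look like, just to show which step I mean. The structure and the mean/std constants here are my own illustration, not code copied from the gist.

import torch
from torchvision.datasets import MNIST

class FastMNISTSketch(MNIST):
    """Illustrative only: preprocess the whole dataset once, up front."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # scaling: bring the uint8 images into [0, 1]
        self.data = self.data.unsqueeze(1).float().div(255)
        # normalization (0 mean, 1 std) -- the line I comment out for autoencoders
        self.data = self.data.sub_(0.1307).div_(0.3081)

    def __getitem__(self, index):
        return self.data[index], self.targets[index]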


Thanks again for your gist, I hope this helps squeeze even more performance out of small datasets like MNIST! 😃

@y0ast
Author

y0ast commented May 22, 2021

Unfortunately that does not give the correct behavior: you're not randomizing your batches at each epoch, which leads to significantly reduced performance.

Yes, this normalization gives 0 mean and 1 std. For a VAE + MNIST you generally model your data as a multivariate Bernoulli, which requires the data to be between 0 and 1.
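
To make that concrete, here is a tiny stand-alone sketch of the Bernoulli reconstruction term (the tensors are random stand-ins, not the gist's data; 784 is just the flattened 28x28 image size): binary cross-entropy is the negative log-likelihood of a multivariate Bernoulli, and it requires its target to lie in [0, 1], which mean/std-normalized data does not.

import torch
import torch.nn.functional as F

x = torch.rand(64, 784)                       # stand-in for a batch of images scaled to [0, 1]
recon = torch.sigmoid(torch.randn(64, 784))   # stand-in decoder output: Bernoulli parameters in (0, 1)

# negative log-likelihood of a multivariate Bernoulli == binary cross-entropy
nll = F.binary_cross_entropy(recon, x, reduction='sum')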

@AlexPasqua

Unfortunately that does not give the correct behavior: you're not randomizing your batches at each epoch, which leads to significantly reduced performance.

That's true, the shuffling should then be done manually. I think this should work:

perm = torch.randperm(train_dataset.data.shape[0])
train_dataset.data, train_dataset.targets = train_dataset.data[perm], train_dataset.targets[perm]  # shuffle data and labels with the same permutation

(assuming the first dimension of train_dataset.data is the sample dimension; the targets must be shuffled with the same permutation so they stay aligned with the data)
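
Putting the two pieces together, a rough sketch of a full training loop (model, optimizer, loss_fn and n_epochs are placeholders here, not names from the gist):

import math
import torch

batch_size = 64
n_samples = train_dataset.data.shape[0]
n_batches = math.ceil(n_samples / batch_size)

for epoch in range(n_epochs):
    # reshuffle data and labels with the same permutation at every epoch
    perm = torch.randperm(n_samples)
    data, targets = train_dataset.data[perm], train_dataset.targets[perm]

    for batch_idx in range(n_batches):
        batch = data[batch_idx * batch_size: (batch_idx + 1) * batch_size]
        labels = targets[batch_idx * batch_size: (batch_idx + 1) * batch_size]

        optimizer.zero_grad()
        loss = loss_fn(model(batch), labels)
        loss.backward()
        optimizer.step()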
