@dasayan05
Created February 27, 2019 10:16
Distributed training of DL model
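import torch.nn.functional as F
import torch.optim as optim

# LeNet, sync_initial_weights, sync_gradients, rank, world_size, epochs and
# train_loader are assumed to be defined elsewhere; this is only the training loop.
model = LeNet()
# first synchronization of initial weights, so every worker starts from the same model
sync_initial_weights(model, rank, world_size)

optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.85)

model.train()
for epoch in range(1, epochs + 1):
    for data, target in train_loader:
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        # The all-reduce on gradients, after backward() and before the optimizer update
        sync_gradients(model, rank, world_size)
        optimizer.step()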
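
The loop above calls sync_initial_weights and sync_gradients but does not define them. What follows is a minimal sketch of what they might look like with torch.distributed, not the author's implementation. It assumes the process group has already been initialized on every worker, that rank 0 holds the reference weights, and that gradients should be averaged rather than only summed (averaging is a choice made here, not something the snippet states).

```python
import torch.distributed as dist


def sync_initial_weights(model, rank, world_size):
    # Assumption: rank 0 is the source of truth for the initial weights.
    # broadcast() overwrites each tensor in place on every other rank.
    # rank and world_size are unused here; they are kept only so the
    # signature matches the call in the training loop.
    for param in model.parameters():
        dist.broadcast(param.data, src=0)


def sync_gradients(model, rank, world_size):
    # Sum each gradient across all workers, then divide by the number of
    # workers so that every rank ends up with the same averaged gradient.
    for param in model.parameters():
        if param.grad is None:
            continue  # parameter did not take part in this backward pass
        dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM)
        param.grad.data /= world_size
```

Because every worker starts from the same weights and applies the same averaged gradient, the model replicas should stay identical after each optimizer.step(), provided the optimizer settings are the same on every rank.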