@pemagrg1
pemagrg1 / convert_envyml_to_reqtxt
Created April 22, 2020 17:51
convert environment.yml to requirement.txt
import ruamel.yaml

yaml = ruamel.yaml.YAML()
data = yaml.load(open('environment.yml'))

requirements = []
for dep in data['dependencies']:
    if isinstance(dep, str):
        # conda pins look like "name=version=build"
        package, package_version, python_version = dep.split('=')
        if python_version == '0':
            continue
        requirements.append(package + '==' + package_version)

with open('requirements.txt', 'w') as fp:
    fp.write('\n'.join(requirements) + '\n')
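As written, the script reads environment.yml from the working directory, keeps only the pinned conda entries of the form name=version=build (skipping any entry whose third field is '0'), and writes them as name==version lines to requirements.txt; anything nested under a pip: key in the YAML is not carried over.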
@TengdaHan
TengdaHan / ddp_notes.md
Last active July 2, 2024 06:39
Multi-node-training on slurm with PyTorch

What's this?

  • A simple note on how to start multi-node training on the slurm scheduler with PyTorch.
  • Especially useful when the scheduler is too busy for you to get multiple GPUs allocated, or when you need more than 4 GPUs for a single job.
  • Requirement: you have to use PyTorch DistributedDataParallel (DDP) for this purpose; a minimal initialization sketch follows this list.
  • Warning: you might need to refactor your own code.
  • Warning: you might be secretly condemned by your colleagues for using too many GPUs.
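
A minimal sketch of the pattern these notes describe, assuming each training process is launched with srun so that slurm exposes SLURM_PROCID, SLURM_NTASKS, and SLURM_LOCALID to it; the helper name, master-address argument, and port below are placeholders, not part of the original notes:

import os
import torch
import torch.distributed as dist

def init_distributed_from_slurm(master_addr, port=29500):
    """Set up one DDP process per GPU from slurm's environment variables."""
    rank = int(os.environ['SLURM_PROCID'])         # global rank across all nodes
    world_size = int(os.environ['SLURM_NTASKS'])   # total number of processes
    local_rank = int(os.environ['SLURM_LOCALID'])  # rank within this node -> GPU index

    os.environ['MASTER_ADDR'] = master_addr        # hostname of the rank-0 node
    os.environ['MASTER_PORT'] = str(port)          # any free port, identical on every node

    dist.init_process_group(backend='nccl', rank=rank, world_size=world_size)
    torch.cuda.set_device(local_rank)
    return rank, local_rank, world_size

Each process then wraps its model in DistributedDataParallel(model, device_ids=[local_rank]) and feeds it with a DistributedSampler so every rank trains on a distinct shard of the data.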
@sgraaf
sgraaf / ddp_example.py
Last active June 7, 2024 16:26
PyTorch Distributed Data Parallel (DDP) example
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from argparse import ArgumentParser
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.distributed import DistributedSampler
from transformers import BertForMaskedLM
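
The listing cuts the gist off at its imports. Below is a minimal sketch of how a script with these imports is typically wired together; the toy dataset, the --local_rank argument (as passed by torch.distributed.launch), and the training-loop details are assumptions for illustration, not the gist's actual body. It reuses the imports shown above.

class ToyMaskedLMDataset(Dataset):
    """Random token ids standing in for real tokenized text (illustrative only)."""
    def __init__(self, n_examples=256, seq_len=32, vocab_size=30522):
        self.data = torch.randint(0, vocab_size, (n_examples, seq_len))

    def __len__(self):
        return self.data.size(0)

    def __getitem__(self, idx):
        ids = self.data[idx]
        return {'input_ids': ids, 'labels': ids}


def main():
    parser = ArgumentParser()
    parser.add_argument('--local_rank', type=int, default=0)
    args = parser.parse_args()

    # one process per GPU; torch.distributed.launch sets RANK/WORLD_SIZE in the environment
    dist.init_process_group(backend='nccl')
    torch.cuda.set_device(args.local_rank)

    model = BertForMaskedLM.from_pretrained('bert-base-uncased').cuda(args.local_rank)
    model = DDP(model, device_ids=[args.local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    dataset = ToyMaskedLMDataset()
    sampler = DistributedSampler(dataset)          # each rank sees a distinct shard
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    for epoch in range(3):
        sampler.set_epoch(epoch)                   # reshuffle differently every epoch
        for batch in loader:
            batch = {k: v.cuda(args.local_rank) for k, v in batch.items()}
            loss = model(**batch).loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()


if __name__ == '__main__':
    main()

Launched with something like python -m torch.distributed.launch --nproc_per_node=NUM_GPUS ddp_example.py, this runs one copy of main() per GPU, and DDP averages gradients across ranks during backward().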