Tianqing Li tqxli

@TengdaHan
TengdaHan / ddp_notes.md
Last active June 7, 2025 22:26
Multi-node-training on slurm with PyTorch

What's this?

  • A short note on how to start multi-node training on the Slurm scheduler with PyTorch.
  • Especially useful when the scheduler is so busy that you cannot get multiple GPUs allocated, or when you need more than 4 GPUs for a single job.
  • Requirement: you must use PyTorch DistributedDataParallel (DDP).
  • Warning: you might need to refactor your own code.
  • Warning: you might be secretly condemned by your colleagues for using too many GPUs.
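
To make the DDP requirement concrete, here is a minimal sketch of how a training script might pick up its rank from Slurm. It is not taken from the note itself. It assumes the job is launched with `srun` using one task per GPU, and that `MASTER_ADDR` and `MASTER_PORT` are exported in the job script. The helper names `slurm_dist_info` and `wrap_model_with_ddp` are hypothetical.

```python
import os


def slurm_dist_info(env=None):
    """Map Slurm's per-task environment variables to DDP rank information.

    Falls back to a single-process layout when the variables are absent,
    for example when the script runs outside of Slurm.
    """
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("SLURM_PROCID", 0)),        # global task index
        "world_size": int(env.get("SLURM_NTASKS", 1)),  # total number of tasks
        "local_rank": int(env.get("SLURM_LOCALID", 0)), # task index on this node
    }


def wrap_model_with_ddp(model, env=None):
    """Hypothetical helper: initialise the process group and wrap the model."""
    # Imported lazily so the rank helper above works without PyTorch installed.
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    info = slurm_dist_info(env)
    # init_method="env://" reads MASTER_ADDR and MASTER_PORT from the
    # environment; this sketch assumes the job script exports both.
    dist.init_process_group(
        backend="nccl",
        init_method="env://",
        world_size=info["world_size"],
        rank=info["rank"],
    )
    # Assumption: one task per GPU, so the node-local task ID selects the GPU.
    torch.cuda.set_device(info["local_rank"])
    model = model.cuda(info["local_rank"])
    return DDP(model, device_ids=[info["local_rank"]])
```

Each task started by `srun` would call `wrap_model_with_ddp(model)` once before training. Reading the rank from the Slurm variables keeps the script independent of any particular launcher wrapper, at the cost of tying it to the one-task-per-GPU layout assumed above.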
@madelinegannon
madelinegannon / setup-azure-kinect-on-jetson-x-nx.md
Last active September 28, 2025 16:24
Notes on Setting up the Microsoft Azure Kinect on Ubuntu 18.04

tmux cheatsheet

As configured in my dotfiles.

start new:

tmux

start new with session name: