Daniil Larionov (Rexhaif)

  • @NL2G Universität Mannheim
  • Germany
@TengdaHan
TengdaHan / ddp_notes.md
Last active July 2, 2024 06:39
Multi-node-training on slurm with PyTorch

What's this?

  • A short note on how to start multi-node training on the slurm scheduler with PyTorch.
  • Especially useful when the scheduler is so busy that you cannot get multiple GPUs allocated on a single node, or when you need more than 4 GPUs for one job.
  • Requirement: you have to use PyTorch DistributedDataParallel (DDP) for this.
  • Warning: you might need to refactor your own code.
  • Warning: you might be secretly condemned by your colleagues for using too many GPUs.
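The pattern the note describes can be sketched as follows. This is not code from the gist, just a minimal illustration assuming env://-style initialization, one slurm task per GPU, and a hypothetical helper name `setup_ddp`:

```python
import os

def slurm_env_to_ddp():
    """Map slurm's per-task environment onto DDP rank variables.

    Assumes one slurm task per GPU (e.g. srun --ntasks-per-node=4);
    this mapping is an illustrative assumption, not code from the gist.
    """
    rank = int(os.environ["SLURM_PROCID"])         # global rank across all nodes
    world_size = int(os.environ["SLURM_NTASKS"])   # total number of tasks (= GPUs)
    local_rank = int(os.environ["SLURM_LOCALID"])  # task index within this node
    return rank, world_size, local_rank

def setup_ddp(master_addr, master_port=29500):
    # torch is imported here so the env-parsing helper above stays
    # importable on machines without PyTorch installed.
    import torch
    import torch.distributed as dist

    rank, world_size, local_rank = slurm_env_to_ddp()
    os.environ["MASTER_ADDR"] = master_addr        # hostname of the rank-0 node
    os.environ["MASTER_PORT"] = str(master_port)
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(local_rank)
    return local_rank
```

In a slurm batch script, the master address is typically the first hostname in `scontrol show hostnames "$SLURM_JOB_NODELIST"`, passed to every task so all ranks rendezvous on the same node.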
@lantiga
lantiga / export_trace.py
Last active September 18, 2022 03:03
🤗 Huggingface Bert on RedisAI
from transformers import BertForQuestionAnswering
import torch

bert_name = "bert-large-uncased-whole-word-masking-finetuned-squad"
model = BertForQuestionAnswering.from_pretrained(bert_name, torchscript=True)
model.eval()

# Dummy (input_ids, attention_mask) pair, used only to trace the graph.
inputs = [torch.ones(1, 2, dtype=torch.int64),
          torch.ones(1, 2, dtype=torch.int64)]

# The snippet is cut off here; tracing and saving is an assumed completion.
traced_model = torch.jit.trace(model, tuple(inputs))
traced_model.save("bert_qa_traced.pt")
@mohanpedala
mohanpedala / bash_strict_mode.md
Last active July 4, 2024 12:40
set -e, -u, -o, -x pipefail explanation