@R11moore
R11moore / README.md
Last active November 11, 2025 18:21
train_template.py v2.19 — Atomic Cache: Fault-Tolerant, Fast-Restore WebDataset Trainer

v2.19 – Atomic Cache: Resilient WebDataset Training for Colab/GCP

Trains any model on Colab A100 with atomic, streaming GCS cache backups.


Features

  • Streaming tar | gsutil cp cache backup every 1000 batches
  • Full resume in < 5 minutes after disconnect
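
The streaming backup in the first bullet can be sketched roughly as below. This is a minimal illustration, not the actual code in train_template.py: the function names (`should_backup`, `backup_commands`, `stream_backup`), the `.tmp` suffix, and the example paths are all hypothetical. The upload-to-temporary-object-then-`gsutil mv` step is one assumed way to get the "atomic" behaviour, so that a backup cut off mid-stream never replaces the last good one.

```python
import subprocess

BACKUP_EVERY = 1000  # batches between backups, per the feature list


def should_backup(batch_idx: int, every: int = BACKUP_EVERY) -> bool:
    """True after batch 1000, 2000, ... (batch_idx counts completed batches)."""
    return batch_idx > 0 and batch_idx % every == 0


def backup_commands(cache_dir: str, gcs_uri: str):
    """Build the three commands for one backup (hypothetical helper)."""
    tmp_uri = gcs_uri + ".tmp"  # assumed naming for the in-flight object
    # Write the archive to stdout instead of a local file.
    tar_cmd = ["tar", "-cf", "-", "-C", cache_dir, "."]
    # "gsutil cp -" performs a streaming upload from stdin.
    upload_cmd = ["gsutil", "cp", "-", tmp_uri]
    # Publish only after a complete upload (assumed atomicity scheme).
    publish_cmd = ["gsutil", "mv", tmp_uri, gcs_uri]
    return tar_cmd, upload_cmd, publish_cmd


def stream_backup(cache_dir: str, gcs_uri: str) -> None:
    """Run tar | gsutil cp, then rename the object if both succeeded."""
    tar_cmd, upload_cmd, publish_cmd = backup_commands(cache_dir, gcs_uri)
    tar = subprocess.Popen(tar_cmd, stdout=subprocess.PIPE)
    upload = subprocess.Popen(upload_cmd, stdin=tar.stdout)
    tar.stdout.close()  # so tar gets SIGPIPE if the upload exits early
    upload_rc = upload.wait()
    tar_rc = tar.wait()
    if tar_rc != 0 or upload_rc != 0:
        raise RuntimeError(f"backup failed: tar={tar_rc}, gsutil={upload_rc}")
    subprocess.run(publish_cmd, check=True)
```

Inside a training loop this would be called as `if should_backup(step): stream_backup("/content/cache", "gs://my-bucket/cache.tar")`, where both paths are placeholders. Streaming avoids writing a second copy of the cache to local disk before upload.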