
Installing PyTorch 2.7.0 on Aurora with uv

```bash
#[🐍 aurora_nre_models_frameworks-2024.2.1_u1]
#[01:06:45 PM][x4001c3s0b0n0][/flare/datascience/foremans/projects/saforem2/tmp/2025-04-24-130620]
; uv venv --python=$(which python3) --system-site-packages # "venvs/$(echo "${CONDA_PREFIX}" | tr "\/" " " | awk '{print $NF}')"
Using CPython 3.10.14 interpreter at: /opt/aurora/24.180.3/frameworks/aurora_nre_models_frameworks-2024.2.1_u1/bin/python3
```
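The `--system-site-packages` flag records `include-system-site-packages = true` in the new venv's `pyvenv.cfg`, which is what lets the environment see the packages pre-installed in the Aurora frameworks module. A quick stdlib-only way to confirm the flag took effect (a minimal sketch that builds a throwaway venv with the stdlib `venv` module for illustration; `uv venv` writes the same config file):

```python
# Check pyvenv.cfg to confirm system site-packages are enabled for a venv.
import tempfile
import venv
from pathlib import Path


def system_site_enabled(venv_dir: Path) -> bool:
    """Return True if the venv was created with system site-packages access."""
    cfg = (venv_dir / "pyvenv.cfg").read_text()
    for line in cfg.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "include-system-site-packages":
            return value.strip().lower() == "true"
    return False


with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp) / "venv"
    venv.create(d, system_site_packages=True, with_pip=False)
    print(system_site_enabled(d))  # True
```

If this prints `False` for a venv you created, `import torch` inside it will not pick up the system-provided PyTorch build.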

Megatron-DeepSpeed

I've included the full series of commands (and their outputs) from a fresh attempt this morning (2025-03-12), in case it's helpful:

```bash
#[08:54:16 AM][x4716c2s4b0n0][/f/d/f/p/a/Megatron-DeepSpeed][🌱 main][✓]
$ source <(curl 'https://raw.githubusercontent.com/saforem2/ezpz/refs/heads/main/src/ezpz/bin/utils.sh') && ezpz_setup_env
```
saforem2 / ezpz.sh (created February 3, 2025)

ezpz
```bash
#!/bin/bash --login
# @file utils.sh
# @brief `ezpz` helper script with functions to make life ez.
# @description
# This file provides multiple helper functions, all prefixed with "ezpz_"
# - `ezpz_setup_job`
# - `ezpz_setup_python`
# - ...
#
```
saforem2 / torchtune-patch-aurora.md (created January 31, 2025)

Torchtune fix on Aurora

Patch to get torchtune working on Aurora

```diff
diff --git a/torchtune/training/_distributed.py b/torchtune/training/_distributed.py
index ff959c5f..c3966290 100644
--- a/torchtune/training/_distributed.py
+++ b/torchtune/training/_distributed.py
@@ -14,7 +14,11 @@ import torch
 import torch.distributed as dist
 from torch import nn
```

Torchtune on Aurora

Sam Foreman
2025-01-26

Patch on Aurora

```diff
diff --git a/torchtune/training/_distributed.py b/torchtune/training/_distributed.py
```

Parallel Training Methods

Sam Foreman
2024-11-05

πŸ“‘ Outline

saforem2 / convert_archive.py (created November 18, 2024, forked from deepfates/convert_archive.py)

Convert your twitter archive into a training dataset and markdown files
```python
import argparse
import json
import logging
import os
import re
import shutil
from concurrent.futures import ProcessPoolExecutor, as_completed
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Dict, List, Literal, Optional, Tuple
```
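The gist's imports suggest the shape of the conversion pipeline: parse tweet records out of the archive JSON, then render each one to markdown. A hedged sketch of the core transform (the `tweet_to_markdown` helper and the exact tweet field names are assumptions for illustration, not the gist's actual code; Twitter archives commonly expose `created_at` and `full_text`):

```python
import json
from datetime import datetime


def tweet_to_markdown(tweet: dict) -> str:
    """Render one tweet record (assumed Twitter-archive-style fields) as markdown."""
    # Archive timestamps look like "Mon Nov 18 14:01:00 +0000 2024".
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    header = f"## {created:%Y-%m-%d %H:%M}"
    return f"{header}\n\n{tweet['full_text']}\n"


record = json.loads(
    '{"created_at": "Mon Nov 18 14:01:00 +0000 2024", "full_text": "hello, world"}'
)
print(tweet_to_markdown(record))
```

The real script adds argument parsing, logging, and a `ProcessPoolExecutor` to fan the per-tweet work out across cores; this shows only the per-record step.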

🐍 Setup @ ALCF

The easiest way to get set up on any of {Polaris, Aurora, Sunspot}[^anywhere] is to use 🍋 ezpz:

```bash
# ezpz
git clone https://github.com/saforem2/ezpz deps/ezpz

# ezpz: setup
```