Yun Dai yundai424

======================================================================= FAILURES ========================================================================
____________________________________ test_correctness_functional[30.0-1.0-0.5--100-0.5-dtype1-1e-05-0.0005-2-2-8-8] _____________________________________
B = 2, T = 2, H = 8, V = 8, scalar = 0.5, dtype = torch.float32, beta = 0.5, ignore_index = -100, temperature = 1.0, softcap = 30.0, atol = 1e-05, rtol = 0.0005
@pytest.mark.parametrize(
"B, T, H, V",
[
(2, 2, 8, 8),
yundai424 / requirements.txt
Last active May 19, 2024 07:19
Minimal example to reproduce a NaN loss issue. Verified on 1 node with 4 A100 GPUs.
torch==2.1
transformers==4.37.2
# the fix is shipped with deepspeed==0.13.5
deepspeed==0.13.4
tokenizers==0.15.1
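
The comment above says the fix ships with deepspeed 0.13.5, while the file pins 0.13.4. A minimal sketch of a check for whether a given version string predates that release could look like the following. The helper names `parse_version` and `predates_fix` are hypothetical and not part of the gist, and the parser only handles plain numeric releases such as `0.13.4`, not pre-release tags.

```python
import re

# Taken from the comment in requirements.txt: the fix ships with deepspeed==0.13.5.
FIXED_IN = (0, 13, 5)


def parse_version(version: str) -> tuple:
    """Turn a plain release string such as '0.13.4' into (0, 13, 4).

    Hypothetical helper: keeps the first three numeric fields and pads
    missing ones with zero. Pre-release tags (rc, dev) are not handled.
    """
    nums = [int(part) for part in re.findall(r"\d+", version)[:3]]
    while len(nums) < 3:
        nums.append(0)
    return tuple(nums)


def predates_fix(version: str) -> bool:
    """Return True if `version` is older than the release carrying the fix."""
    return parse_version(version) < FIXED_IN
```

In an environment built from this file, the installed version string could be read with the standard library call `importlib.metadata.version("deepspeed")` and passed to `predates_fix`.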