(pytorch) [shunting@devgpu002.lla3 ~/ws/pytorch (loaf)]$ python ~/t.py
/home/shunting/ws/miniconda3/envs/pytorch/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Compiled module path: /tmp/torchinductor_shunting/tmpav4o5
# AOT ID: ['1_inference']
from ctypes import c_void_p, c_long
import torch
import math
import random
import os
import tempfile
from math import inf, nan
from torch._inductor.hooks import run_intermediate_hooks
0.016571
0.022813
Profiling result for a compiled module of benchmark pnasnet5large:
Chrome trace for the profile is written to /tmp/compiled_module_profile.json
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ --------------------------------------------------------------------------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls Input Shapes
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ --------------------------------------------------------------------------------
sm90_xmma_gemm_bf16bf
0.012985
0.020379
Profiling result for a compiled module of benchmark pnasnet5large:
Chrome trace for the profile is written to /tmp/compiled_module_profile.json
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ --------------------------------------------------------------------------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls Input Shapes
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ --------------------------------------------------------------------------------
sm90_xmma_gemm_bf16bf
# AOT ID: ['0_backward']
from ctypes import c_void_p, c_long
import torch
import math
import random
import os
import tempfile
from math import inf, nan
from torch._inductor.hooks import run_intermediate_hooks
(pytorch) [shunting@devgpu002.lla3 ~/ws/pytorch (loaf)]$ python benchmarks/dynamo/torchbench.py --ci --accuracy --timing --explain --export-aot-inductor --device cuda --inference --bfloat16 --only sam_fast
loading model: 0it [00:00, ?it/s]INFO:root:running build_ext
INFO:root:running build_ext
INFO:root:running build_ext
INFO:root:running build_ext
INFO:root:running build_ext
INFO:root:running build_ext
INFO:root:running build_ext
INFO:root:running build_ext
INFO:root:running build_ext
Thread 227 "pt_autograd_0" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 1007598]
0x00007ffff7c8cee4 in pthread_mutex_lock@@GLIBC_2.2.5 () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff7c8cee4 in pthread_mutex_lock@@GLIBC_2.2.5 () from /lib64/libc.so.6
#1 0x00007ffe1029189c in torch::autograd::ForwardGrad::clear() () from /home/shunting/ws/vision/torchvision/_C.so
#2 0x00007ffe102a6565 in torch::autograd::CppNode<vision::ops::(anonymous namespace)::ROIAlignFunction>::release_variables() ()
   from /home/shunting/ws/vision/torchvision/_C.so
#3 0x00007fffedf68dd2 in torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffe
(pytorch) [shunting@devgpu005.nha1 ~/ws/pytorch (acc)]$ time python benchmarks/dynamo/torchbench.py --performance --training --amp --backend inductor --disable-cudagraphs --device cuda --only vision_maskrcnn
loading model: 0it [00:05, ?it/s]
cuda train vision_maskrcnn
Traceback (most recent call last):
  File "/home/shunting/ws/pytorch/benchmarks/dynamo/common.py", line 2335, in validate_model
    self.model_iter_fn(model, example_inputs)
  File "/home/shunting/ws/pytorch/benchmarks/dynamo/torchbench.py", line 466, in forward_and_backward_pass
    pred = mod(*cloned_inputs)
  File "/home/shunting/ws/pytorch/torch/nn/modules/module.py", line 1716, in _wrapped_call_impl
2024-07-04T23:20:39.8000729Z loading model: 0it [00:00, ?it/s]WARNING:common:Model pyhpc_turbulent_kinetic_energy does not support bfloat16, running with amp instead
2024-07-04T23:20:39.9484088Z
2024-07-04T23:20:39.9485124Z loading model: 0it [00:01, ?it/s]
2024-07-04T23:20:39.9486440Z WARNING:common:Model pyhpc_turbulent_kinetic_energy does not support bfloat16, running with amp instead
2024-07-04T23:20:39.9487446Z cuda eval pyhpc_turbulent_kinetic_energy
2024-07-04T23:20:39.9743673Z WARNING:common:Model pyhpc_turbulent_kinetic_energy does not support bfloat16, running with amp instead
2024-07-04T23:21:01.2023259Z ERROR:common:
2024-07-04T23:21:01.2024509Z Traceback (most recent call last):
2024-07-04T23:21:01.2025803Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2642, in check_accuracy
2024-07-04T23:21:01.2027561Z     new_result = optimized_model_iter_fn(model_copy, example_inputs)
(pytorch) [shunting@devgpu005.nha1 ~/ws/pytorch (acc-sebotnet33ts_256)]$ TORCHINDUCTOR_MAX_AUTOTUNE=1 time python benchmarks/dynamo/huggingface.py --accuracy --no-translation-validation --training --amp --backend inductor --device cuda --only LayoutLMForMaskedLM
loading model: 0it [00:06, ?it/s]
cuda train LayoutLMForMaskedLM
AUTOTUNE addmm(512x3072, 512x768, 768x3072)
  triton_mm_130 0.0209 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, B_PROLOGUE_CAST_TYPE=None, EVEN_K=True, GROUP_M=8, num_stages=3, num_warps=4
  triton_mm_131 0.0217 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, B_PROLOGUE_CAST_TYPE=None, EVEN_K=True, GROUP_M=8, num_stages=5, num_warps=8
  triton_mm_124 0.0227 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, B_PROLOGUE_CAST_TYPE=None, EVEN_K=True, GROUP_M=8, num_stages=3, num_warps=4
  triton_mm_129 0.0240 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLO