(llama-cuda12) bash-5.1$ torchrun --nnodes 1 --nproc_per_node 4 recipes/finetuning/finetuning.py --use_peft --peft_method lora --model_name meta-llama/Llama-2-7b-chat-hf --enable_fsdp --use_fast_kernels --pure_bf16 --dist_checkpoint_root_folder ./Llama-2-70b-chat-hf/ --dist_checkpoint_folder fine-tuned
W0408 09:24:54.152000 140136569627136 torch/distributed/run.py:757]
W0408 09:24:54.152000 140136569627136 torch/distributed/run.py:757] *****************************************
W0408 09:24:54.152000 140136569627136 torch/distributed/run.py:757] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0408 09:24:54.152000 140136569627136 torch/distributed/run.py:757] *****************************************
Warning: unknown parameter pure_bf16
Warning: unknown parameter pure_bf16
Warning: unknown parameter pure_bf16
Warning: unknown parameter pure_bf16
C
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/TensorRT-LLM/examples/server/server.py", line 5, in <module>
    import tensorrt_llm
  File "/TensorRT-LLM/tensorrt_llm/__init__.py", line 47, in <module>
    from .hlapi.llm import LLM, ModelConfig
  File "/TensorRT-LLM/tensorrt_llm/hlapi/__init__.py", line 1, in <module>
normalizer_spec {
  name: "identity"
  precompiled_charsmap: ""
  add_dummy_prefix: true
  remove_extra_whitespaces: false
  normalization_rule_tsv: ""
}
trainer_spec {
  input: "/large_experiments/theorem/datasets/MERGED/all.test1.merged"
from transformers import AutoModelForCausalLM, AutoTokenizer, StaticCache
import torch
from typing import Optional
import time

device = "cuda"
torch.set_float32_matmul_precision('high')


# Copied from the gpt-fast repo
def multinomial_sample_one_no_sync(probs_sort):
    # Does multinomial sampling without a cuda synchronization
    q = torch.empty_like(probs_sort).exponential_(1)
    return torch.argmax(probs_sort / q, dim=-1, keepdim=True).to(dtype=torch.int)
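
The sampler above rests on a standard identity: if each q_i is drawn independently from Exp(1), then argmax(p_i / q_i) selects index i with probability p_i / sum(p), so one elementwise division and one argmax stand in for a multinomial draw. The following is a minimal standard-library sketch of that identity for checking it on a CPU without PyTorch. The function name `sample_one_exponential_race` and the example probabilities are illustrative assumptions and are not taken from gpt-fast or transformers.

```python
import random
from collections import Counter


def sample_one_exponential_race(probs, rng):
    """Draw one index from `probs` as argmax(p_i / q_i) with q_i ~ Exp(1).

    Hypothetical helper for illustration only; it mirrors the arithmetic of
    multinomial_sample_one_no_sync on a plain Python list.
    """
    best_index, best_score = 0, -1.0
    for i, p in enumerate(probs):
        # Clamp away from zero so the division below can never fail.
        q = max(rng.expovariate(1.0), 1e-300)
        score = p / q
        if score > best_score:
            best_index, best_score = i, score
    return best_index


def empirical_frequencies(probs, draws, seed=0):
    """Repeat the draw `draws` times and return the observed frequency per index."""
    rng = random.Random(seed)
    counts = Counter(sample_one_exponential_race(probs, rng) for _ in range(draws))
    return [counts[i] / draws for i in range(len(probs))]


if __name__ == "__main__":
    target = [0.1, 0.2, 0.7]  # example distribution, chosen arbitrarily
    print(empirical_frequencies(target, draws=20000))
```

With enough draws the printed frequencies should approach the target probabilities. The tensor version gets the same effect in a single batched operation on the GPU, which is presumably why it avoids the synchronization mentioned in the original comment.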