Created March 6, 2023 05:17
How I run 65B using my fork of llama at https://github.com/shawwn/llama
# mp must match the number of checkpoint shards for the chosen model size.
mp=1; size=7B;   # to run 7B  (1 shard)
mp=8; size=65B;  # to run 65B (8 shards)
# randint is presumably a helper in the author's environment that prints a random seed.
for seed in $(randint 1000000)
do
  export TARGET_FOLDER=~/ml/data/llama/LLaMA
  time python3 -m torch.distributed.run --nproc_per_node $mp example.py \
    --ckpt_dir $TARGET_FOLDER/$size \
    --tokenizer_path $TARGET_FOLDER/tokenizer.model \
    --seed $seed --max_seq_len 2048 --max_gen_len 2048 --count 0 \
    | tee -a ${size}_startrek.txt
done
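The mp value has to match the number of consolidated.XX.pth shards the checkpoint ships with (7B: 1, 13B: 2, 30B: 4, 65B: 8), because torch.distributed.run starts one process per shard and each process becomes one model-parallel rank. Roughly, the setup inside example.py looks like this (a sketch reconstructed from the upstream LLaMA repo, so treat exact names as approximate):

import os
import torch
from fairscale.nn.model_parallel.initialize import initialize_model_parallel

def setup_model_parallel():
    # torch.distributed.run sets LOCAL_RANK / WORLD_SIZE for each spawned process.
    local_rank = int(os.environ.get("LOCAL_RANK", -1))
    world_size = int(os.environ.get("WORLD_SIZE", -1))

    torch.distributed.init_process_group("nccl")  # NCCL is Linux-only; Windows needs "gloo"
    initialize_model_parallel(world_size)  # one model-parallel rank per process/shard
    torch.cuda.set_device(local_rank)
    return local_rank, world_size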
Hi Shawwn,
I'm running your inference code (https://github.com/shawwn/llama).
With mp=1 size=7B it works well, but when I change to mp=2 size=13B, the model loads correctly onto the 2 GPUs, and then generating the first sample fails with this error:
Traceback (most recent call last):
  File "example.py", line 165, in <module>
    fire.Fire(main)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fire\core.py", line 480, in _Fire
    target=component.__name__)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fire\core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "example.py", line 160, in main
    [prompt], max_gen_len=max_gen_len, temperature=temperature, top_p=top_p, top_k=top_k, repetition_penalty=repetition_penalty, token_callback=callback,
  File "D:\LLaMA\llama\llama\generation.py", line 46, in generate
    logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\LLaMA\llama\llama\model.py", line 225, in forward
    h = self.tok_embeddings(tokens)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fairscale\nn\model_parallel\layers.py", line 214, in forward
    output = gather_from_model_parallel_region(output_parallel)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fairscale\nn\model_parallel\mappings.py", line 156, in gather_from_model_parallel_region
    return _GatherFromModelParallelRegion.apply(input_)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fairscale\nn\model_parallel\mappings.py", line 131, in forward
    return _gather(input_)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fairscale\nn\model_parallel\mappings.py", line 82, in _gather
    torch.distributed.all_gather(tensor_list, input_, group=group)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\torch\distributed\distributed_c10d.py", line 2282, in all_gather
    work.wait()
RuntimeError: Inplace update to inference tensor outside InferenceMode is not allowed. You can make a clone to get a normal tensor before doing inplace update. See https://github.com/pytorch/rfcs/pull/17 for more details.
I found the same error on Stack Overflow:
https://stackoverflow.com/questions/71223747/pytorch-error-when-modifying-unpacked-tensor
I changed

logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos)

into

temp = tokens[:, prev_pos:cur_pos].clone()
logits = self.model.forward(temp, prev_pos)

but the problem remains. I'm new to Python; could you take a look at it?
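The clone probably doesn't help because the inference tensors are created inside model.forward itself: Transformer.forward in llama/model.py is decorated with @torch.inference_mode(), so every tensor allocated there is an inference tensor, and fairscale's all_gather then updates such tensors in place from a context PyTorch treats as outside inference mode. A workaround that has circulated for this error (an assumption on my part, not a fix confirmed in this thread) is to swap that decorator for @torch.no_grad(), which disables autograd without the inference-tensor restriction. A minimal, runnable sketch of the mechanism:

import torch

# Tensors created under torch.inference_mode() cannot be updated in place
# once that mode is exited -- this is the restriction behind the RuntimeError.
with torch.inference_mode():
    t = torch.zeros(4)
try:
    t.add_(1)  # in-place update outside InferenceMode
except RuntimeError as e:
    print(e)   # "Inplace update to inference tensor outside InferenceMode is not allowed..."

# torch.no_grad() tensors carry no such restriction, which is why replacing
# @torch.inference_mode() on Transformer.forward in llama/model.py with
# @torch.no_grad() is a plausible workaround for the multi-GPU case.
with torch.no_grad():
    u = torch.zeros(4)
u.add_(1)  # fine
print(u)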
Environment:
Intel i7-8700K
32 GB RAM
2× Tesla P40 GPUs (24 GB video memory each)
Windows 11
Python 3.7
torch 1.13.1+cu117
CUDA 11.7