@shawwn
Created March 6, 2023 05:17
How I run 65B using my fork of llama at https://github.com/shawwn/llama
# Model-parallel world size per model size (the shard counts of the released
# weights): 7B=1, 13B=2, 30B=4, 65B=8. The later assignment wins, so comment
# out the configuration you don't want.
mp=1; size=7B; # to run 7B
mp=8; size=65B; # to run 65B

# `randint` is a shell helper (not a standard command) that prints a random
# integer; any integer seed would do here.
for seed in $(randint 1000000)
do
  export TARGET_FOLDER=~/ml/data/llama/LLaMA
  time python3 -m torch.distributed.run --nproc_per_node $mp example.py \
    --ckpt_dir $TARGET_FOLDER/$size \
    --tokenizer_path $TARGET_FOLDER/tokenizer.model \
    --seed $seed --max_seq_len 2048 --max_gen_len 2048 --count 0 \
    | tee -a ${size}_startrek.txt
done
@Pliman commented Mar 11, 2023

Hi, Shawwn

I'm running your inference code (https://github.com/shawwn/llama).
When I run mp=1 size=7B it works well, but when I change to mp=2 size=13B, the model loads correctly onto the 2 GPUs; then, while generating the first sample, there is an error:
Traceback (most recent call last):
  File "example.py", line 165, in <module>
    fire.Fire(main)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fire\core.py", line 480, in _Fire
    target=component.__name__)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fire\core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "example.py", line 160, in main
    [prompt], max_gen_len=max_gen_len, temperature=temperature, top_p=top_p, top_k=top_k, repetition_penalty=repetition_penalty, token_callback=callback,
  File "D:\LLaMA\llama\llama\generation.py", line 46, in generate
    logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\LLaMA\llama\llama\model.py", line 225, in forward
    h = self.tok_embeddings(tokens)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fairscale\nn\model_parallel\layers.py", line 214, in forward
    output = gather_from_model_parallel_region(output_parallel)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fairscale\nn\model_parallel\mappings.py", line 156, in gather_from_model_parallel_region
    return _GatherFromModelParallelRegion.apply(input_)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fairscale\nn\model_parallel\mappings.py", line 131, in forward
    return _gather(input_)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\fairscale\nn\model_parallel\mappings.py", line 82, in _gather
    torch.distributed.all_gather(tensor_list, input_, group=group)
  File "C:\Users\sunbi\Anaconda3\lib\site-packages\torch\distributed\distributed_c10d.py", line 2282, in all_gather
    work.wait()
RuntimeError: Inplace update to inference tensor outside InferenceMode is not allowed. You can make a clone to get a normal tensor before doing inplace update. See https://github.com/pytorch/rfcs/pull/17 for more details.

I found the same error on Stack Overflow:
https://stackoverflow.com/questions/71223747/pytorch-error-when-modifying-unpacked-tensor
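
The error message itself suggests cloning. As I understand it, a tensor created while torch.inference_mode() is active becomes an "inference tensor", and PyTorch rejects any in-place update to it once the mode is no longer active. A minimal toy repro of that class of error (my own example, not the llama code):

import torch

with torch.inference_mode():
    t = torch.zeros(4)  # t is now an "inference tensor"

# Outside inference mode, in-place updates to an inference tensor are rejected:
try:
    t.add_(1)
except RuntimeError as e:
    print(e)  # Inplace update to inference tensor outside InferenceMode is not allowed...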

I changed

logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos)

into

temp = tokens[:, prev_pos:cur_pos].clone()
logits = self.model.forward(temp, prev_pos)

but the problem remains. I'm new to Python; could you have a look at it?
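
From the traceback, the failing in-place update doesn't happen at that call site but deeper, inside fairscale's _gather, where all_gather fills a freshly allocated tensor_list; that would explain why the clone makes no difference. Roughly, as a simplified paraphrase of fairscale/nn/model_parallel/mappings.py (not the exact source):

import torch
import torch.distributed as dist

def _gather(input_):
    # Simplified sketch of fairscale's model-parallel gather.
    world_size = dist.get_world_size()
    tensor_list = [torch.empty_like(input_) for _ in range(world_size)]
    # all_gather fills tensor_list in place. If those buffers are inference
    # tensors and the write lands outside inference mode (e.g. on a backend
    # worker thread), PyTorch raises the RuntimeError above.
    dist.all_gather(tensor_list, input_)
    return torch.cat(tensor_list, dim=-1)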

Environment:
Intel i7-8700K
32 GB of RAM
2 Tesla P40 GPUs (24 GB video memory each)
Windows 11
Python: 3.7
torch version: 1.13.1+cu117
CUDA version: 11.7

@Pliman commented Mar 12, 2023

I fixed it in meta-llama/llama#180
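
For anyone hitting the same thing: if I've understood the fix correctly (see the PR for the authoritative diff), the idea is to stop creating inference tensors in the first place, by using torch.no_grad() instead of torch.inference_mode() around the forward pass. A sketch of that kind of change, assuming the decorator sits on Transformer.forward in llama/model.py as in the upstream repo:

import torch

class Transformer(torch.nn.Module):
    # was: @torch.inference_mode()
    @torch.no_grad()  # no-grad tensors are ordinary tensors, so fairscale's
                      # in-place all_gather writes are allowed again
    def forward(self, tokens, start_pos):
        ...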
