
@HamidShojanazeri
Created October 9, 2023 18:50
(pytorch_from_src_brian_pr) hamidnazeri@a100-st-p4d24xlarge-3:~/PiPPy/examples/inference/tp-llama$ torchrun --nnodes 1 --nproc_per_node 2 chat_example.py --model_args model_args.json --tokenizer_path /data/home/hamidnazeri/PiPPy/examples/inference/model/models--meta-llama--Llama-2-13b-chat/snapshots/8ebc6a0ac2e4a781c31cb4ad395b1c26c5158c76/tokenizer.model --converted_ckpt_dir converted_checkpoints
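For context, a minimal sketch (not from this gist) of what each of the two ranks launched by `--nproc_per_node 2` typically does before loading its tensor-parallel shard; the exact setup inside chat_example.py may differ:

```python
# Hypothetical sketch: per-rank distributed setup under torchrun.
import os
import torch
import torch.distributed as dist

def init_rank():
    # torchrun exports RANK, WORLD_SIZE and LOCAL_RANK for every process it spawns.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")  # NCCL backend for GPU collectives
    return dist.get_rank(), dist.get_world_size()

if __name__ == "__main__":
    rank, world_size = init_rank()
    print(f"rank {rank}/{world_size} ready")
```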
[2023-10-09 18:24:08,755] torch.distributed.run: [WARNING]
[2023-10-09 18:24:08,755] torch.distributed.run: [WARNING] *****************************************
[2023-10-09 18:24:08,755] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2023-10-09 18:24:08,755] torch.distributed.run: [WARNING] *****************************************
/opt/hpcaas/.mounts/fs-5c62ddab/home/hamidnazeri/pytorch/torch/__init__.py:632: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at /opt/hpcaas/.mounts/fs-5c62ddab/home/hamidnazeri/pytorch/torch/csrc/tensor/python_tensor.cpp:450.)
_C._set_default_tensor_type(t)
/opt/hpcaas/.mounts/fs-5c62ddab/home/hamidnazeri/pytorch/torch/__init__.py:632: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at /opt/hpcaas/.mounts/fs-5c62ddab/home/hamidnazeri/pytorch/torch/csrc/tensor/python_tensor.cpp:450.)
_C._set_default_tensor_type(t)
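The UserWarning above is triggered by a call along the lines of `torch.set_default_tensor_type(torch.cuda.HalfTensor)`, which PyTorch 2.1 deprecates. A sketch of the recommended replacement, assuming fp16 weights on GPU:

```python
import torch

# Replaces the single deprecated set_default_tensor_type() call with the two
# APIs the warning suggests (available since PyTorch 2.0/2.1).
torch.set_default_dtype(torch.float16)  # dtype half of the old call
torch.set_default_device("cuda")        # device half of the old call
```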
time to load the model 37.89913112903014s
******************************************
time to load the model 37.901050213957205s
******************************************
time for preprocessing of the chat examples is 0.0005440110107883811 s
time for preprocessing of the chat examples is 0.0005576059920713305 s
==============================================================================================================================================
[[29871, 18585, 29991, 2266, 29915, 29879, 263, 22037, 9522, 412]]
total time profiles is [22.72014075401239]
len batch 1
total avg inference time 22.72014075401239s for total tokens of 10 and tokens per second 0.44013811834479916
Bandwidth achieved: 11.46 GB/s
=======================================================================
User: what is the recipe of mayonnaise?
> Assistant: Sure! Here's a classic recipe
==================================
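A hedged reconstruction of how the throughput numbers above fit together; the exact formula used by chat_example.py is an assumption, but the reported values are consistent with bandwidth = model_size_in_bytes * tokens_per_second (the gpt-fast convention) for a 13B fp16 model:

```python
# Numbers taken from the log block above.
total_time_s = 22.72014075401239   # "total time profiles"
new_tokens = 10                    # tokens generated for this prompt
tokens_per_s = new_tokens / total_time_s  # ~0.440 tokens/s, as reported

model_params = 13e9                # Llama-2-13b
bytes_per_param = 2                # fp16 weights
model_bytes = model_params * bytes_per_param       # ~26 GB of weights read per decoded token
bandwidth_gb_s = model_bytes * tokens_per_s / 1e9  # ~11.4 GB/s, matching the log
print(tokens_per_s, bandwidth_gb_s)
```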
[[29871, 18585, 29991, 2266, 29915, 29879, 263, 6996, 9522, 412]]
total time profiles is [22.720830696984194]
len batch 1
total avg inference time 22.720830696984194s for total tokens of 10 and tokens per second 0.44012475306755977
Bandwidth achieved: 11.46 GB/s
=======================================================================
Memory used: 17.11 GB
User: what is the recipe of mayonnaise?
> Assistant: Sure! Here's a basic recipe
==================================
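For reference, a hedged sketch (not part of the gist) of decoding the raw token ids printed above with the tokenizer.model passed on the command line, assuming the standard sentencepiece API used by the Llama reference tokenizer; the path is illustrative:

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
ids = [29871, 18585, 29991, 2266, 29915, 29879, 263, 6996, 9522, 412]
print(sp.decode(ids))  # -> " Sure! Here's a basic recipe", per the log above
```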