
@HamidShojanazeri
Created October 9, 2023 18:50
(pytorch_from_src_brian_pr) hamidnazeri@a100-st-p4d24xlarge-3:~/PiPPy/examples/inference/tp-llama$ torchrun --nnodes 1 --nproc_per_node 2 chat_example.py --model_args model_args.json --tokenizer_path /data/home/hamidnazeri/PiPPy/examples/inference/model/models--meta-llama--Llama-2-13b-chat/snapshots/8ebc6a0ac2e4a781c31cb4ad395b1c26c5158c76/tokenizer.model --converted_ckpt_dir converted_checkpoints
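For context, a minimal sketch (not from this gist) of what each of the two ranks launched by `--nproc_per_node 2` typically does before loading its tensor-parallel shard; the exact setup inside chat_example.py may differ:

```python
# Hypothetical sketch: per-rank distributed setup under torchrun.
import os
import torch
import torch.distributed as dist

def init_rank():
    # torchrun exports RANK, WORLD_SIZE and LOCAL_RANK for every process it spawns.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")  # NCCL backend for GPU collectives
    return dist.get_rank(), dist.get_world_size()

if __name__ == "__main__":
    rank, world_size = init_rank()
    print(f"rank {rank}/{world_size} ready")
```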
[2023-10-09 18:24:08,755] torch.distributed.run: [WARNING]
[2023-10-09 18:24:08,755] torch.distributed.run: [WARNING] *****************************************
[2023-10-09 18:24:08,755] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2023-10-09 18:24:08,755] torch.distributed.run: [WARNING] *****************************************
/opt/hpcaas/.mounts/fs-5c62ddab/home/hamidnazeri/pytorch/torch/__init__.py:632: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at /opt/hpcaas/.mounts/fs-5c62ddab/home/hamidnazeri/pytorch/torch/csrc/tensor/python_tensor.cpp:450.)
_C._set_default_tensor_type(t)
/opt/hpcaas/.mounts/fs-5c62ddab/home/hamidnazeri/pytorch/torch/__init__.py:632: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at /opt/hpcaas/.mounts/fs-5c62ddab/home/hamidnazeri/pytorch/torch/csrc/tensor/python_tensor.cpp:450.)
_C._set_default_tensor_type(t)
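The UserWarning above is triggered by a call along the lines of `torch.set_default_tensor_type(torch.cuda.HalfTensor)`, which PyTorch 2.1 deprecates. A sketch of the recommended replacement, assuming fp16 weights on GPU:

```python
import torch

# Replaces the single deprecated set_default_tensor_type() call with the two
# APIs the warning suggests (available since PyTorch 2.0/2.1).
torch.set_default_dtype(torch.float16)  # dtype half of the old call
torch.set_default_device("cuda")        # device half of the old call
```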
time to load the model 37.89913112903014s
******************************************
time to load the model 37.901050213957205s
******************************************
time for preprocessing of the chat examples is 0.0005440110107883811 s
time for preprocessing of the chat examples is 0.0005576059920713305 s
==============================================================================================================================================
[[29871, 18585, 29991, 2266, 29915, 29879, 263, 22037, 9522, 412]]
total time profiles is [22.72014075401239]
len batch 1
total avg inference time 22.72014075401239s for total tokens of 10 and tokens per second 0.44013811834479916
Bandwidth achieved: 11.46 GB/s
=======================================================================
User: what is the recipe of mayonnaise?
> Assistant: Sure! Here's a classic recipe
==================================
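A hedged reconstruction of how the throughput numbers above fit together; the exact formula used by chat_example.py is an assumption, but the reported values are consistent with bandwidth = model_size_in_bytes * tokens_per_second (the gpt-fast convention) for a 13B fp16 model:

```python
# Numbers taken from the log block above.
total_time_s = 22.72014075401239   # "total time profiles"
new_tokens = 10                    # tokens generated for this prompt
tokens_per_s = new_tokens / total_time_s  # ~0.440 tokens/s, as reported

model_params = 13e9                # Llama-2-13b
bytes_per_param = 2                # fp16 weights
model_bytes = model_params * bytes_per_param       # ~26 GB of weights read per decoded token
bandwidth_gb_s = model_bytes * tokens_per_s / 1e9  # ~11.4 GB/s, matching the log
print(tokens_per_s, bandwidth_gb_s)
```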
[[29871, 18585, 29991, 2266, 29915, 29879, 263, 6996, 9522, 412]]
total time profiles is [22.720830696984194]
len batch 1
total avg inference time 22.720830696984194s for total tokens of 10 and tokens per second 0.44012475306755977
Bandwidth achieved: 11.46 GB/s
=======================================================================
Memory used: 17.11 GB
User: what is the recipe of mayonnaise?
> Assistant: Sure! Here's a basic recipe
==================================
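For reference, a hedged sketch (not part of the gist) of decoding the raw token ids printed above with the tokenizer.model passed on the command line, assuming the standard sentencepiece API used by the Llama reference tokenizer; the path is illustrative:

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
ids = [29871, 18585, 29991, 2266, 29915, 29879, 263, 6996, 9522, 412]
print(sp.decode(ids))  # -> " Sure! Here's a basic recipe", per the log above
```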