
@HamidShojanazeri
Created October 9, 2023 18:59
(pytorch_from_src_brian_pr) hamidnazeri@a100-st-p4d24xlarge-3:~/PiPPy/examples/inference/tp-llama$ torchrun --nnodes 1 --nproc_per_node 2 chat_example.py --model_args model_args.json --tokenizer_path /data/home/hamidnazeri/PiPPy/examples/inference/model/models--meta-llama--Llama-2-13b-chat/snapshots/8ebc6a0ac2e4a781c31cb4ad395b1c26c5158c76/tokenizer.model --converted_ckpt_dir converted_checkpoints
[2023-10-09 18:54:44,843] torch.distributed.run: [WARNING]
[2023-10-09 18:54:44,843] torch.distributed.run: [WARNING] *****************************************
[2023-10-09 18:54:44,843] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2023-10-09 18:54:44,843] torch.distributed.run: [WARNING] *****************************************
/opt/hpcaas/.mounts/fs-5c62ddab/home/hamidnazeri/pytorch/torch/__init__.py:632: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at /opt/hpcaas/.mounts/fs-5c62ddab/home/hamidnazeri/pytorch/torch/csrc/tensor/python_tensor.cpp:450.)
_C._set_default_tensor_type(t)
/opt/hpcaas/.mounts/fs-5c62ddab/home/hamidnazeri/pytorch/torch/__init__.py:632: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at /opt/hpcaas/.mounts/fs-5c62ddab/home/hamidnazeri/pytorch/torch/csrc/tensor/python_tensor.cpp:450.)
_C._set_default_tensor_type(t)
time to load the model 37.00245766097214s
******************************************
time to load the model 37.002472958061844s
******************************************
time for preprocessing of the chat examples is 0.0005259180907160044 s
=======================================================================
time for preprocessing of the chat examples is 0.00055427395273 s
=======================================================================
[[528, 267, 13242, 29871, 3773, 2576, 832, 22874, 297, 29889]]
total time profiles is [22.59880348399747]
len batch 1
total avg inference time 22.59880348399747s for total tokens of 10 and tokens per second 0.442501303534992
Bandwidth achieved: 11.52 GB/s
=======================================================================
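(Note: the throughput and bandwidth figures above line up with a simple back-of-the-envelope estimate. The sketch below is a hypothetical reconstruction, assuming bandwidth is approximated as model weight bytes read per generated token; the actual helpers in chat_example.py may compute it differently.)

    # Hypothetical reconstruction of the reported metrics (not the script's actual code).
    total_time_s = 22.59880348399747   # reported profile time for this run
    new_tokens = 10                    # tokens generated in this run

    tokens_per_sec = new_tokens / total_time_s
    print(f"tokens per second {tokens_per_sec}")  # ~0.4425, matches the log

    # Memory-bandwidth estimate for decode: each generated token reads all weights once.
    # Llama-2-13B in fp16/bf16 is roughly 13e9 params * 2 bytes ~= 26 GB.
    model_bytes = 13e9 * 2
    bandwidth_gb_s = model_bytes * tokens_per_sec / 1e9
    print(f"Bandwidth achieved: {bandwidth_gb_s:.2f} GB/s")  # ~11.5, close to the logged 11.52 GB/s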
User: what is the recipe of mayonnaise?
> Assistant: shesopol awlim inst compens in.
==================================
[[8224, 267, 29889, 29892, 1450, 520, 304, 1171, 29901, 29883]]
total time profiles is [22.59822693699971]
len batch 1
total avg inference time 22.59822693699971s for total tokens of 10 and tokens per second 0.4425125930400833
Bandwidth achieved: 11.52 GB/s
=======================================================================
Memory used: 17.13 GB
User: what is the recipe of mayonnaise?
> Assistant: Koes.,awost toman:c
==================================
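(Note: a per-rank memory figure like the 17.13 GB above is the kind of number torch.cuda.max_memory_allocated reports. A minimal sketch, assuming that is roughly what the script measures; with tensor parallelism over 2 ranks, each rank holds about half of the ~26 GB of fp16 weights (~13 GB) plus KV cache and activations, which is consistent with ~17 GB.)

    import torch

    # Peak GPU memory allocated on this rank, reported in GB.
    mem_gb = torch.cuda.max_memory_allocated() / 1e9
    print(f"Memory used: {mem_gb:.2f} GB")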