
@HamidShojanazeri
Created October 9, 2023 18:59
(pytorch_from_src_brian_pr) hamidnazeri@a100-st-p4d24xlarge-3:~/PiPPy/examples/inference/tp-llama$ torchrun --nnodes 1 --nproc_per_node 2 chat_example.py --model_args model_args.json --tokenizer_path /data/home/hamidnazeri/PiPPy/examples/inference/model/models--meta-llama--Llama-2-13b-chat/snapshots/8ebc6a0ac2e4a781c31cb4ad395b1c26c5158c76/tokenizer.model --converted_ckpt_dir converted_checkpoints
[2023-10-09 18:54:44,843] torch.distributed.run: [WARNING]
[2023-10-09 18:54:44,843] torch.distributed.run: [WARNING] *****************************************
[2023-10-09 18:54:44,843] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2023-10-09 18:54:44,843] torch.distributed.run: [WARNING] *****************************************
/opt/hpcaas/.mounts/fs-5c62ddab/home/hamidnazeri/pytorch/torch/__init__.py:632: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at /opt/hpcaas/.mounts/fs-5c62ddab/home/hamidnazeri/pytorch/torch/csrc/tensor/python_tensor.cpp:450.)
_C._set_default_tensor_type(t)
/opt/hpcaas/.mounts/fs-5c62ddab/home/hamidnazeri/pytorch/torch/__init__.py:632: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at /opt/hpcaas/.mounts/fs-5c62ddab/home/hamidnazeri/pytorch/torch/csrc/tensor/python_tensor.cpp:450.)
_C._set_default_tensor_type(t)
time to load the model 37.00245766097214s
******************************************
time to load the model 37.002472958061844s
******************************************
time for preprocessing of the chat examples is 0.0005259180907160044 s
=======================================================================
time for preprocessing of the chat examples is 0.00055427395273 s
=======================================================================
[[528, 267, 13242, 29871, 3773, 2576, 832, 22874, 297, 29889]]
total time profiles is [22.59880348399747]
len batch 1
total avg inference time 22.59880348399747s for total tokens of 10 and tokens per second 0.442501303534992
Bandwidth achieved: 11.52 GB/s
=======================================================================
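(Note: the throughput and bandwidth figures above line up with a simple back-of-the-envelope estimate. The sketch below is a hypothetical reconstruction, assuming bandwidth is approximated as model weight bytes read per generated token; the actual helpers in chat_example.py may compute it differently.)

    # Hypothetical reconstruction of the reported metrics (not the script's actual code).
    total_time_s = 22.59880348399747   # reported profile time for this run
    new_tokens = 10                    # tokens generated in this run

    tokens_per_sec = new_tokens / total_time_s
    print(f"tokens per second {tokens_per_sec}")  # ~0.4425, matches the log

    # Memory-bandwidth estimate for decode: each generated token reads all weights once.
    # Llama-2-13B in fp16/bf16 is roughly 13e9 params * 2 bytes ~= 26 GB.
    model_bytes = 13e9 * 2
    bandwidth_gb_s = model_bytes * tokens_per_sec / 1e9
    print(f"Bandwidth achieved: {bandwidth_gb_s:.2f} GB/s")  # ~11.5, close to the logged 11.52 GB/s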
User: what is the recipe of mayonnaise?
> Assistant: shesopol awlim inst compens in.
==================================
[[8224, 267, 29889, 29892, 1450, 520, 304, 1171, 29901, 29883]]
total time profiles is [22.59822693699971]
len batch 1
total avg inference time 22.59822693699971s for total tokens of 10 and tokens per second 0.4425125930400833
Bandwidth achieved: 11.52 GB/s
=======================================================================
Memory used: 17.13 GB
User: what is the recipe of mayonnaise?
> Assistant: Koes.,awost toman:c
==================================
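(Note: a per-rank memory figure like the 17.13 GB above is the kind of number torch.cuda.max_memory_allocated reports. A minimal sketch, assuming that is roughly what the script measures; with tensor parallelism over 2 ranks, each rank holds about half of the ~26 GB of fp16 weights (~13 GB) plus KV cache and activations, which is consistent with ~17 GB.)

    import torch

    # Peak GPU memory allocated on this rank, reported in GB.
    mem_gb = torch.cuda.max_memory_allocated() / 1e9
    print(f"Memory used: {mem_gb:.2f} GB")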