@xeb
Last active June 30, 2023 23:32
Output & Error log for falconer.sh script
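The run.py that produced this log is not included in the gist; the traceback below points at line 20 calling a transformers text-generation pipeline on tiiuae/falcon-40b-instruct. A minimal sketch of what the script likely looks like, reconstructed from the call sites in the traceback (trust_remote_code and device_map="auto" are implied by the cached modelling_RW.py path and the accelerate hooks; the prompt, dtype, and generation parameters are assumptions):

# Hypothetical reconstruction of run.py, inferred from the traceback below.
import torch
import transformers
from transformers import AutoTokenizer

model = "tiiuae/falcon-40b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # assumed; not visible in the log
    trust_remote_code=True,      # required for Falcon's custom modelling_RW.py
    device_map="auto",           # matches the accelerate hooks in the traceback
)
sequences = pipeline(            # run.py line 20 in the traceback
    "Write a poem about falcons.",  # assumed prompt
    max_length=200,
    do_sample=True,                 # sampling path matches `self.sample(` below
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")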
-----
2023-06-30 23:25:38.764871: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX512F AVX512_VNNI AVX512_BF16 AVX_VNNI
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-30 23:25:38.925571: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
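As the oneDNN notice says, these custom ops can be turned off with TF_ENABLE_ONEDNN_OPTS=0. If the variable is not exported by falconer.sh, it can also be set from Python, as long as that happens before TensorFlow is imported (a sketch, not part of the original script):

import os
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"  # must be set before the first tensorflow import
import tensorflow as tf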
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

  Local host:            209-20-159-225
  Device name:           mlx5_0
  Device vendor ID:      0x02c9
  Device vendor part ID: 4122

Default device parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:     209-20-159-225
  Local device:   mlx5_0
  Local port:     1
  CPCs attempted: udcm
--------------------------------------------------------------------------
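Both Open MPI notices are informational and unrelated to the crash further down. The first names its own off switch; Open MPI also reads any MCA parameter from an OMPI_MCA_-prefixed environment variable, so one way to silence it (a sketch, set before MPI initializes):

import os
# Open MPI picks up MCA parameters from OMPI_MCA_<param> environment variables.
os.environ["OMPI_MCA_btl_openib_warn_no_device_params_found"] = "0"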
/home/ubuntu/.local/lib/python3.8/site-packages/pandas/core/computation/expressions.py:20: UserWarning: Pandas requires version '2.7.3' or newer of 'numexpr' (version '2.7.1' currently installed).
from pandas.core.computation.check import NUMEXPR_INSTALLED
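This pandas warning is cosmetic: numexpr 2.7.1 is older than the 2.7.3 pandas wants, so pandas simply skips the numexpr fast path. Upgrading with pip install --upgrade "numexpr>=2.7.3" clears it; it has nothing to do with the CUDA failure below.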
Loading tokenizer...
Loading checkpoint shards: 0%| | 0/9 [00:00<?, ?it/s]
Loading checkpoint shards: 11%|██████████████▌ | 1/9 [00:06<00:49, 6.25s/it]
Loading checkpoint shards: 22%|█████████████████████████████ | 2/9 [00:11<00:40, 5.77s/it]
Loading checkpoint shards: 33%|███████████████████████████████████████████▋ | 3/9 [00:17<00:33, 5.63s/it]
Loading checkpoint shards: 44%|██████████████████████████████████████████████████████████▏ | 4/9 [00:22<00:27, 5.56s/it]
Loading checkpoint shards: 56%|████████████████████████████████████████████████████████████████████████▊ | 5/9 [00:28<00:22, 5.51s/it]
Loading checkpoint shards: 67%|███████████████████████████████████████████████████████████████████████████████████████▎ | 6/9 [00:33<00:16, 5.49s/it]
Loading checkpoint shards: 78%|█████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 7/9 [00:38<00:10, 5.47s/it]
Loading checkpoint shards: 89%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 8/9 [00:44<00:05, 5.47s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:48<00:00, 5.08s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:48<00:00, 5.40s/it]
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.0.1+cu118 with CUDA 1108 (you have 2.0.1)
    Python 3.8.16 (you have 3.8.10)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
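The xFormers wheel here was built against Python 3.8.16 and CUDA 11.8 but is running under Python 3.8.10, so its compiled extensions refuse to load and it falls back to plain PyTorch kernels. That costs speed and memory but is not fatal; reinstalling a wheel matched to the local Python and torch build, per the linked instructions, makes the warning go away.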
The model 'RWForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
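This "not supported" message is expected for Falcon at this point in time: RWForCausalLM ships as custom code inside the model repo (loaded via trust_remote_code) rather than in the transformers registry, and the pipeline still runs it, as the traceback below confirms.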
/home/ubuntu/.local/lib/python3.8/site-packages/transformers/generation/utils.py:1259: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
Traceback (most recent call last):
  File "run.py", line 20, in <module>
    sequences = pipeline(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/pipelines/text_generation.py", line 201, in __call__
    return super().__call__(text_inputs, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1120, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1127, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1026, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/pipelines/text_generation.py", line 263, in _forward
    generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
  File "/usr/lib/python3/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/generation/utils.py", line 1572, in generate
    return self.sample(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/generation/utils.py", line 2619, in sample
    outputs = self(
  File "/usr/lib/python3/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-40b-instruct/1e7fdcc9f45d13704f3826e99937917e007cd975/modelling_RW.py", line 759, in forward
    transformer_outputs = self.transformer(
  File "/usr/lib/python3/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-40b-instruct/1e7fdcc9f45d13704f3826e99937917e007cd975/modelling_RW.py", line 654, in forward
    outputs = block(
  File "/usr/lib/python3/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-40b-instruct/1e7fdcc9f45d13704f3826e99937917e007cd975/modelling_RW.py", line 396, in forward
    attn_outputs = self.self_attention(
  File "/usr/lib/python3/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-40b-instruct/1e7fdcc9f45d13704f3826e99937917e007cd975/modelling_RW.py", line 252, in forward
    fused_qkv = self.query_key_value(hidden_states)  # [batch_size, seq_length, 3 x hidden_size]
  File "/usr/lib/python3/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-40b-instruct/1e7fdcc9f45d13704f3826e99937917e007cd975/modelling_RW.py", line 32, in forward
    ret = input @ self.weight.T
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
-----
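CUBLAS_STATUS_NOT_INITIALIZED from cublasCreate(handle) rarely means cuBLAS itself is broken; it typically surfaces when the process cannot bring up a CUDA context (driver or runtime trouble) or, very commonly, when the device runs out of memory at the moment the handle is created. Worth noting here: the nvidia-smi dump below shows a single 80GB H100, and Falcon-40B in bfloat16 needs on the order of 80GB for the weights alone, so exhausting device memory mid-forward is a plausible trigger even with device_map="auto" offloading. A minimal sketch to separate an environmental failure from a model-level one:

# If this tiny matmul fails the same way, the problem is the CUDA setup or
# memory, not modelling_RW.py. Rerun with CUDA_LAUNCH_BLOCKING=1 to pin down
# the exact failing call if it does.
import torch

assert torch.cuda.is_available()
x = torch.randn(4, 4, device="cuda")
y = x @ x.T                       # routed through cuBLAS, like the failing line
torch.cuda.synchronize()
print(y.sum().item())
print(torch.cuda.mem_get_info())  # (free, total) bytes on device 0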
Output of `nvidia-smi`:
-----
Fri Jun 30 23:31:15 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA H100 PCIe    On   | 00000000:06:00.0 Off |                    0 |
| N/A   34C    P0    49W / 350W |      0MiB / 81559MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
-----
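Two things stand out in the nvidia-smi dump: the GPU is idle and healthy (no wedged process holding memory), and the "CUDA Version: 12.0" field reports the newest runtime the 525 driver supports, not the installed toolkit, which nvcc below identifies as 11.8.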
Output of `nvcc -V`:
-----
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
-----
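Putting the three dumps together: driver 525.105.17 (supports up to CUDA 12.0), toolkit 11.8 from nvcc, and, per the xFormers warning above, torch 2.0.1. A cu118 torch build runs fine on a CUDA 12-capable driver, so the versions are mutually compatible; what torch actually sees can be confirmed directly (a sketch):

import torch
print(torch.__version__)              # expected 2.0.1+cu118 on this box
print(torch.version.cuda)             # CUDA runtime torch was built against (11.8)
print(torch.cuda.get_device_name(0))  # should report the H100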