Traceback (most recent call last):
  File "/home/amoran/optimum-tpu/alvaro/tuning_gemma2b.py", line 50, in <module>
    trainer = Trainer(
  File "/home/amoran/Dev/venv/hf/lib/python3.10/site-packages/transformers/trainer.py", line 659, in __init__
    xs.set_global_mesh(xs.Mesh(np.array(range(num_devices)), (num_devices, 1), axis_names=("fsdp", "tensor")))
AttributeError: module 'torch_xla.distributed.spmd' has no attribute 'set_global_mesh'
Results in torch_xla 2.3.0
0%| | 0/100 [00:00<?, ?it/s]
There seems to be not a single sample in your epoch_iterator, stopping training at step 0! This is expected if you're using an IterableDataset and set num_steps (100) higher than the number of available samples.
{'train_runtime': 0.0044, 'train_samples_per_second': 0.0, 'train_steps_per_second': 22822.418, 'train_loss': 0.0, 'epoch': 0}
0%| | 0/100 [00:00<?, ?it/s]
0%| | 0/100 [00:00<?, ?it/s]
Exception in thread Thread-3 (_loader_worker):
Traceback (most recent call last):
  File "/home/amoran/Dev/venv/hf/lib/python3.10/site-packages/accelerate/data_loader.py", line 464, in __iter__
    next_batch = next(dataloader_iter)
  File "/home/amoran/Dev/venv/hf/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/home/amoran/Dev/venv/hf/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 674, in _next_data
    index = self._next_index() # may raise StopIteration
  File "/home/amoran/Dev/venv/hf/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 621, in _next_index
    return next(self._sampler_iter) # may raise StopIteration
StopIteration
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/amoran/Dev/venv/hf/lib/python3.10/site-packages/torch_xla/distributed/parallel_loader.py", line 152, in _loader_worker
    _, data = next(data_iter)
  File "/home/amoran/Dev/venv/hf/lib/python3.10/site-packages/accelerate/data_loader.py", line 472, in __iter__
    yield current_batch
UnboundLocalError: local variable 'current_batch' referenced before assignment
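The UnboundLocalError in the log above is the kind of failure a "read one batch ahead" generator can produce when its source yields nothing at all, which would be consistent with the "not a single sample" message. The following is only a minimal sketch of that pattern, not the accelerate source: both function names are hypothetical, and the guarded variant is one possible way to handle empty input, not a fix taken from any library.

```python
def batches_one_ahead(iterable):
    """Hypothetical look-ahead generator that mishandles empty input."""
    it = iter(iterable)
    try:
        current_batch = next(it)
    except StopIteration:
        # Empty input: current_batch is never assigned, but execution continues.
        yield None
    while True:
        try:
            next_batch = next(it)
            yield current_batch
            current_batch = next_batch
        except StopIteration:
            # Raises UnboundLocalError here if the input was empty.
            yield current_batch
            break


def batches_one_ahead_guarded(iterable):
    """Same look-ahead idea, but ends cleanly when there is nothing to yield."""
    it = iter(iterable)
    try:
        current_batch = next(it)
    except StopIteration:
        return  # empty input: finish the generator without yielding
    for next_batch in it:
        yield current_batch
        current_batch = next_batch
    yield current_batch  # last batch, known to be assigned


if __name__ == "__main__":
    print(list(batches_one_ahead_guarded([])))
    print(list(batches_one_ahead_guarded([1, 2, 3])))
    try:
        list(batches_one_ahead([]))
    except UnboundLocalError as exc:
        print(type(exc).__name__)
```

Under this reading, the likely thing to check is why the sampler produced no indices in the first place (for example, a dataset smaller than one batch per device); that cause is an assumption and is not shown in the log.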