
@ezyang
Created November 12, 2022 18:48
Sweep logs for symbolic-shapes --accuracy --backend aot_eager --training (TORCHDYNAMO_DYNAMIC_SHAPES=1) - 807a62fc61bea26707c3dc09a12bad204e375a95 Sat Nov 12 16:49:04 UTC 2022
Running torchbench.py BERT_pytorch...
cuda train BERT_pytorch PASS
Running torchbench.py Background_Matting...
[2022-11-12 16:50:12,257] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 16:50:17,079] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train Background_Matting PASS
sampling loop time step: 100%|██████████| 1/1 [00:00<00:00, 55.09it/s]
sampling loop time step: 100%|██████████| 1/1 [00:01<00:00, 1.14s/it]
sampling loop time step: 100%|██████████| 1/1 [00:00<00:00, 16.65it/s]
2it [00:02, 1.47s/it]
WARNING:root:DALLE2_pytorch failed to load
Eager model failed to run
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 983, in validate_model
self.model_iter_fn(model, example_inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 337, in forward_and_backward_pass
self.grad_scaler.scale(loss).backward()
File "/scratch/ezyang/work/pytorch/torch/_tensor.py", line 473, in backward
torch.autograd.backward(
File "/scratch/ezyang/work/pytorch/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1893, in run
device, name, model, example_inputs, batch_size = runner.load_model(
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 282, in load_model
self.validate_model(model, example_inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 985, in validate_model
raise NotImplementedError("Eager model failed to run")
NotImplementedError: Eager model failed to run
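For context: the RuntimeError above ("element 0 of tensors does not require grad and does not have a grad_fn") is the generic autograd error raised when backward() is called on a tensor that is not connected to any graph. A minimal sketch (not the DALLE2_pytorch code) that reproduces the same message:

import torch

# Hypothetical repro: no input requires grad, so the loss has no grad_fn
# and backward() raises the same RuntimeError seen in the eager run above.
x = torch.randn(4, requires_grad=False)
loss = (x * 2).sum()
try:
    loss.backward()
except RuntimeError as e:
    print(e)  # element 0 of tensors does not require grad and does not have a grad_fn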
Running torchbench.py LearningToPaint...
[2022-11-12 16:50:45,757] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 16:50:47,613] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train LearningToPaint PASS
[2022-11-12 16:50:56,537] torch._dynamo.testing: [WARNING] High loss value alert - 12.64. Can result in unstable gradients.
Running torchbench.py Super_SloMo...
[2022-11-12 16:50:58,053] torch._dynamo.testing: [WARNING] High loss value alert - 12.64. Can result in unstable gradients.
[2022-11-12 16:50:58,162] torch._dynamo.testing: [WARNING] High loss value alert - 12.64. Can result in unstable gradients.
[2022-11-12 16:50:58,261] torch._dynamo.testing: [WARNING] High loss value alert - 12.64. Can result in unstable gradients.
[2022-11-12 16:53:38,843] torch._dynamo.testing: [WARNING] High loss value alert - 12.64. Can result in unstable gradients.
cuda train Super_SloMo PASS
Running torchbench.py alexnet...
cuda train alexnet PASS
Running torchbench.py attention_is_all_you_need_pytorch...
cuda train attention_is_all_you_need_pytorch PASS
Running torchbench.py dcgan...
[2022-11-12 16:54:54,404] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 16:54:55,066] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train dcgan PASS
Running torchbench.py densenet121...
[2022-11-12 16:55:16,761] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train densenet121 PASS
WARNING:root:detectron2_fcos_r_50_fpn failed to load
FCOS train is not supported by upstream detectron2. See GH Issue: https://github.com/facebookresearch/detectron2/issues/4369.
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1893, in run
device, name, model, example_inputs, batch_size = runner.load_model(
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 251, in load_model
benchmark = benchmark_cls(
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/util/model.py", line 18, in __call__
obj = type.__call__(cls, *args, **kwargs)
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/detectron2_fcos_r_50_fpn/__init__.py", line 15, in __init__
super().__init__(variant="COCO-Detection/fcos_R_50_FPN_1x.py", test=test, device=device,
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/util/framework/detectron2/model_factory.py", line 100, in __init__
loader = self.setup_train(cfg, args)
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/util/framework/detectron2/model_factory.py", line 110, in setup_train
raise NotImplementedError("FCOS train is not supported by upstream detectron2. " \
NotImplementedError: FCOS train is not supported by upstream detectron2. See GH Issue: https://github.com/facebookresearch/detectron2/issues/4369.
WARNING:root:detectron2_maskrcnn_r_50_c4 failed to load
Eager model failed to run
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 983, in validate_model
self.model_iter_fn(model, example_inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 336, in forward_and_backward_pass
loss = self.compute_loss(pred)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 326, in compute_loss
return reduce_to_scalar_loss(pred)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/testing.py", line 83, in reduce_to_scalar_loss
return sum([reduce_to_scalar_loss(x) for x in out]) / len(out)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/testing.py", line 83, in <listcomp>
return sum([reduce_to_scalar_loss(x) for x in out]) / len(out)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/testing.py", line 93, in reduce_to_scalar_loss
return sum([reduce_to_scalar_loss(value) for value in out.values()]) / len(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/testing.py", line 93, in <listcomp>
return sum([reduce_to_scalar_loss(value) for value in out.values()]) / len(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/testing.py", line 96, in reduce_to_scalar_loss
raise NotImplementedError("Don't know how to reduce", type(out))
NotImplementedError: ("Don't know how to reduce", <class 'detectron2.structures.instances.Instances'>)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1893, in run
device, name, model, example_inputs, batch_size = runner.load_model(
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 282, in load_model
self.validate_model(model, example_inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 985, in validate_model
raise NotImplementedError("Eager model failed to run")
NotImplementedError: Eager model failed to run
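The NotImplementedError above comes from the benchmark's loss reducer, which only knows how to collapse tensors and standard containers into a scalar. A rough sketch of that logic, assumed from the traceback rather than copied from torch/_dynamo/testing.py:

import torch

def reduce_to_scalar_loss(out):
    # Recursively average everything down to one scalar so backward() can run.
    if isinstance(out, torch.Tensor):
        return out.float().mean()
    if isinstance(out, (list, tuple)):
        return sum(reduce_to_scalar_loss(x) for x in out) / len(out)
    if isinstance(out, dict):
        return sum(reduce_to_scalar_loss(v) for v in out.values()) / len(out)
    # detectron2's Instances is none of the above, hence the failure for
    # detectron2_maskrcnn_r_50_c4.
    raise NotImplementedError("Don't know how to reduce", type(out))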
Running torchbench.py dlrm...
incomplete graph:
class joint_forward_backward(torch.nn.Module):
def forward(self, primals, tangents):
primals_1: f32[512, 512], [512, 1], primals_2: f32[512], [1], primals_3: f32[64, 512], [512, 1], primals_4: f32[64], [1], primals_5: f32[1000000, 64], [64, 1], primals_6: f32[1000000, 64], [64, 1], primals_7: f32[1000000, 64], [64, 1], primals_8: f32[1000000, 64], [64, 1], primals_9: f32[1000000, 64], [64, 1], primals_10: f32[1000000, 64], [64, 1], primals_11: f32[1000000, 64], [64, 1], primals_12: f32[1000000, 64], [64, 1], primals_13: f32[1024, 100], [100, 1], primals_14: f32[1024], [1], primals_15: f32[1024, 1024], [1024, 1], primals_16: f32[1024], [1], primals_17: f32[1024, 1024], [1024, 1], primals_18: f32[1024], [1], primals_19: f32[1, 1024], [1024, 1], primals_20: f32[1], [1], primals_21: f32[s0, 512], [512, 1], primals_22: i64[s2, s0], [s0, 1], primals_23: i64[s3], [1], primals_24: i64[s3], [1], primals_25: i64[s3], [1], primals_26: i64[s3], [1], primals_27: i64[s3], [1], primals_28: i64[s3], [1], primals_29: i64[s3], [1], primals_30: i64[s3], [1], tangents_1: f32[s0, 1], [1, 1], = fx_pytree.tree_flatten_spec([primals, tangents], self._in_spec)
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
t: f32[512, 512], [1, 512] = torch.ops.aten.t.default(primals_1); primals_1 = None
addmm: f32[s0, 512], [512, 1] = torch.ops.aten.addmm.default(primals_2, primals_21, t); primals_2 = t = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
relu: f32[s0, 512], [512, 1] = torch.ops.aten.relu.default(addmm); addmm = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
t_1: f32[512, 64], [1, 512] = torch.ops.aten.t.default(primals_3); primals_3 = None
addmm_1: f32[s0, 64], [64, 1] = torch.ops.aten.addmm.default(primals_4, relu, t_1); primals_4 = relu = t_1 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
relu_1: f32[s0, 64], [64, 1] = torch.ops.aten.relu.default(addmm_1); addmm_1 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:288, code: sparse_offset_group_batch = lS_o[k]
select: i64[s0], [1] = torch.ops.aten.select.int(primals_22, 0, 0)
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
_embedding_bag = torch.ops.aten._embedding_bag.default(primals_5, primals_23, select, False, 0, True, None); primals_5 = primals_23 = select = None
getitem: f32[s0, 64], [64, 1] = _embedding_bag[0]
getitem_1: i64[s3], [1] = _embedding_bag[1]
getitem_2: i64[s0], [1] = _embedding_bag[2]
getitem_3: i64[0], [1] = _embedding_bag[3]; _embedding_bag = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:288, code: sparse_offset_group_batch = lS_o[k]
select_1: i64[s0], [1] = torch.ops.aten.select.int(primals_22, 0, 1)
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
_embedding_bag_1 = torch.ops.aten._embedding_bag.default(primals_6, primals_24, select_1, False, 0, True, None); primals_6 = primals_24 = select_1 = None
getitem_4: f32[s0, 64], [64, 1] = _embedding_bag_1[0]
getitem_5: i64[s3], [1] = _embedding_bag_1[1]
getitem_6: i64[s0], [1] = _embedding_bag_1[2]
getitem_7: i64[0], [1] = _embedding_bag_1[3]; _embedding_bag_1 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:288, code: sparse_offset_group_batch = lS_o[k]
select_2: i64[s0], [1] = torch.ops.aten.select.int(primals_22, 0, 2)
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
_embedding_bag_2 = torch.ops.aten._embedding_bag.default(primals_7, primals_25, select_2, False, 0, True, None); primals_7 = primals_25 = select_2 = None
getitem_8: f32[s0, 64], [64, 1] = _embedding_bag_2[0]
getitem_9: i64[s3], [1] = _embedding_bag_2[1]
getitem_10: i64[s0], [1] = _embedding_bag_2[2]
getitem_11: i64[0], [1] = _embedding_bag_2[3]; _embedding_bag_2 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:288, code: sparse_offset_group_batch = lS_o[k]
select_3: i64[s0], [1] = torch.ops.aten.select.int(primals_22, 0, 3)
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
_embedding_bag_3 = torch.ops.aten._embedding_bag.default(primals_8, primals_26, select_3, False, 0, True, None); primals_8 = primals_26 = select_3 = None
getitem_12: f32[s0, 64], [64, 1] = _embedding_bag_3[0]
getitem_13: i64[s3], [1] = _embedding_bag_3[1]
getitem_14: i64[s0], [1] = _embedding_bag_3[2]
getitem_15: i64[0], [1] = _embedding_bag_3[3]; _embedding_bag_3 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:288, code: sparse_offset_group_batch = lS_o[k]
select_4: i64[s0], [1] = torch.ops.aten.select.int(primals_22, 0, 4)
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
_embedding_bag_4 = torch.ops.aten._embedding_bag.default(primals_9, primals_27, select_4, False, 0, True, None); primals_9 = primals_27 = select_4 = None
getitem_16: f32[s0, 64], [64, 1] = _embedding_bag_4[0]
getitem_17: i64[s3], [1] = _embedding_bag_4[1]
getitem_18: i64[s0], [1] = _embedding_bag_4[2]
getitem_19: i64[0], [1] = _embedding_bag_4[3]; _embedding_bag_4 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:288, code: sparse_offset_group_batch = lS_o[k]
select_5: i64[s0], [1] = torch.ops.aten.select.int(primals_22, 0, 5)
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
_embedding_bag_5 = torch.ops.aten._embedding_bag.default(primals_10, primals_28, select_5, False, 0, True, None); primals_10 = primals_28 = select_5 = None
getitem_20: f32[s0, 64], [64, 1] = _embedding_bag_5[0]
getitem_21: i64[s3], [1] = _embedding_bag_5[1]
getitem_22: i64[s0], [1] = _embedding_bag_5[2]
getitem_23: i64[0], [1] = _embedding_bag_5[3]; _embedding_bag_5 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:288, code: sparse_offset_group_batch = lS_o[k]
select_6: i64[s0], [1] = torch.ops.aten.select.int(primals_22, 0, 6)
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
_embedding_bag_6 = torch.ops.aten._embedding_bag.default(primals_11, primals_29, select_6, False, 0, True, None); primals_11 = primals_29 = select_6 = None
getitem_24: f32[s0, 64], [64, 1] = _embedding_bag_6[0]
getitem_25: i64[s3], [1] = _embedding_bag_6[1]
getitem_26: i64[s0], [1] = _embedding_bag_6[2]
getitem_27: i64[0], [1] = _embedding_bag_6[3]; _embedding_bag_6 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:288, code: sparse_offset_group_batch = lS_o[k]
select_7: i64[s0], [1] = torch.ops.aten.select.int(primals_22, 0, 7); primals_22 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
_embedding_bag_7 = torch.ops.aten._embedding_bag.default(primals_12, primals_30, select_7, False, 0, True, None); primals_12 = select_7 = None
getitem_28: f32[s0, 64], [64, 1] = _embedding_bag_7[0]
getitem_29: i64[s3], [1] = _embedding_bag_7[1]
getitem_30: i64[s0], [1] = _embedding_bag_7[2]
getitem_31: i64[0], [1] = _embedding_bag_7[3]; _embedding_bag_7 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:306, code: T = torch.cat([x] + ly, dim=1).view((batch_size, -1, d))
cat: f32[s0, 576], [576, 1] = torch.ops.aten.cat.default([relu_1, getitem, getitem_4, getitem_8, getitem_12, getitem_16, getitem_20, getitem_24, getitem_28], 1); getitem = getitem_4 = getitem_8 = getitem_12 = getitem_16 = getitem_20 = getitem_24 = getitem_28 = None
sym_size: Sym(s0) = torch.ops.aten.sym_size(primals_21, 0); primals_21 = None
view: f32[s0, 9, 64], [576, 64, 1] = torch.ops.aten.view.default(cat, [sym_size, -1, 64]); cat = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:308, code: Z = torch.bmm(T, torch.transpose(T, 1, 2))
transpose: f32[s0, 64, 9], [576, 1, 64] = torch.ops.aten.transpose.int(view, 1, 2)
bmm: f32[s0, 9, 9], [81, 9, 1] = torch.ops.aten.bmm.default(view, transpose)
# No stacktrace found for following nodes
_tensor_constant0 = self._tensor_constant0
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:319, code: li = torch.tensor([i for i in range(ni) for j in range(i + offset)], device=x.device)
lift_fresh_copy: i64[36], [1] = torch.ops.aten.lift_fresh_copy.default(_tensor_constant0); _tensor_constant0 = None
# No stacktrace found for following nodes
_tensor_constant1 = self._tensor_constant1
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:320, code: lj = torch.tensor([j for i in range(nj) for j in range(i + offset)], device=x.device)
lift_fresh_copy_1: i64[36], [1] = torch.ops.aten.lift_fresh_copy.default(_tensor_constant1); _tensor_constant1 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:321, code: Zflat = Z[:, li, lj]
slice_1: f32[s0, 9, 9], [81, 9, 1] = torch.ops.aten.slice.Tensor(bmm, 0, 0, 9223372036854775807); bmm = None
index: f32[s0, 36], [36, 1] = torch.ops.aten.index.Tensor(slice_1, [None, lift_fresh_copy, lift_fresh_copy_1])
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:323, code: R = torch.cat([x] + [Zflat], dim=1)
cat_1: f32[s0, 100], [100, 1] = torch.ops.aten.cat.default([relu_1, index], 1); relu_1 = index = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
t_2: f32[100, 1024], [1, 100] = torch.ops.aten.t.default(primals_13); primals_13 = None
addmm_2: f32[s0, 1024], [1024, 1] = torch.ops.aten.addmm.default(primals_14, cat_1, t_2); primals_14 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
relu_2: f32[s0, 1024], [1024, 1] = torch.ops.aten.relu.default(addmm_2); addmm_2 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
t_3: f32[1024, 1024], [1, 1024] = torch.ops.aten.t.default(primals_15); primals_15 = None
addmm_3: f32[s0, 1024], [1024, 1] = torch.ops.aten.addmm.default(primals_16, relu_2, t_3); primals_16 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
relu_3: f32[s0, 1024], [1024, 1] = torch.ops.aten.relu.default(addmm_3); addmm_3 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
t_4: f32[1024, 1024], [1, 1024] = torch.ops.aten.t.default(primals_17); primals_17 = None
addmm_4: f32[s0, 1024], [1024, 1] = torch.ops.aten.addmm.default(primals_18, relu_3, t_4); primals_18 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
relu_4: f32[s0, 1024], [1024, 1] = torch.ops.aten.relu.default(addmm_4); addmm_4 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
t_5: f32[1024, 1], [1, 1024] = torch.ops.aten.t.default(primals_19); primals_19 = None
addmm_5: f32[s0, 1], [1, 1] = torch.ops.aten.addmm.default(primals_20, relu_4, t_5); primals_20 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
relu_5: f32[s0, 1], [1, 1] = torch.ops.aten.relu.default(addmm_5); addmm_5 = None
# No stacktrace found for following nodes
is_same_size = torch.ops.aten.is_same_size.default(relu_5, tangents_1)
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
threshold_backward: f32[s0, 1], [1, 1] = torch.ops.aten.threshold_backward.default(tangents_1, relu_5, 0); tangents_1 = relu_5 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
t_6: f32[1, 1024], [1024, 1] = torch.ops.aten.t.default(t_5); t_5 = None
mm: f32[s0, 1024], [1024, 1] = torch.ops.aten.mm.default(threshold_backward, t_6); t_6 = None
t_7: f32[1, s0], [1, 1] = torch.ops.aten.t.default(threshold_backward)
mm_1: f32[1, 1024], [1024, 1] = torch.ops.aten.mm.default(t_7, relu_4); t_7 = None
t_8: f32[1024, 1], [1, 1024] = torch.ops.aten.t.default(mm_1); mm_1 = None
sum_1: f32[1, 1], [0, 1] = torch.ops.aten.sum.dim_IntList(threshold_backward, [0], True); threshold_backward = None
view_1: f32[1], [0] = torch.ops.aten.view.default(sum_1, [1]); sum_1 = None
t_9: f32[1, 1024], [1024, 1] = torch.ops.aten.t.default(t_8); t_8 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
threshold_backward_1: f32[s0, 1024], [1024, 1] = torch.ops.aten.threshold_backward.default(mm, relu_4, 0); mm = relu_4 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
t_10: f32[1024, 1024], [1024, 1] = torch.ops.aten.t.default(t_4); t_4 = None
mm_2: f32[s0, 1024], [1024, 1] = torch.ops.aten.mm.default(threshold_backward_1, t_10); t_10 = None
t_11: f32[1024, s0], [1, 1024] = torch.ops.aten.t.default(threshold_backward_1)
mm_3: f32[1024, 1024], [1024, 1] = torch.ops.aten.mm.default(t_11, relu_3); t_11 = None
t_12: f32[1024, 1024], [1, 1024] = torch.ops.aten.t.default(mm_3); mm_3 = None
sum_2: f32[1, 1024], [0, 1] = torch.ops.aten.sum.dim_IntList(threshold_backward_1, [0], True); threshold_backward_1 = None
view_2: f32[1024], [1] = torch.ops.aten.view.default(sum_2, [1024]); sum_2 = None
t_13: f32[1024, 1024], [1024, 1] = torch.ops.aten.t.default(t_12); t_12 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
threshold_backward_2: f32[s0, 1024], [1024, 1] = torch.ops.aten.threshold_backward.default(mm_2, relu_3, 0); mm_2 = relu_3 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
t_14: f32[1024, 1024], [1024, 1] = torch.ops.aten.t.default(t_3); t_3 = None
mm_4: f32[s0, 1024], [1024, 1] = torch.ops.aten.mm.default(threshold_backward_2, t_14); t_14 = None
t_15: f32[1024, s0], [1, 1024] = torch.ops.aten.t.default(threshold_backward_2)
mm_5: f32[1024, 1024], [1024, 1] = torch.ops.aten.mm.default(t_15, relu_2); t_15 = None
t_16: f32[1024, 1024], [1, 1024] = torch.ops.aten.t.default(mm_5); mm_5 = None
sum_3: f32[1, 1024], [0, 1] = torch.ops.aten.sum.dim_IntList(threshold_backward_2, [0], True); threshold_backward_2 = None
view_3: f32[1024], [1] = torch.ops.aten.view.default(sum_3, [1024]); sum_3 = None
t_17: f32[1024, 1024], [1024, 1] = torch.ops.aten.t.default(t_16); t_16 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
threshold_backward_3: f32[s0, 1024], [1024, 1] = torch.ops.aten.threshold_backward.default(mm_4, relu_2, 0); mm_4 = relu_2 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
t_18: f32[1024, 100], [100, 1] = torch.ops.aten.t.default(t_2); t_2 = None
mm_6: f32[s0, 100], [100, 1] = torch.ops.aten.mm.default(threshold_backward_3, t_18); t_18 = None
t_19: f32[1024, s0], [1, 1024] = torch.ops.aten.t.default(threshold_backward_3)
mm_7: f32[1024, 100], [100, 1] = torch.ops.aten.mm.default(t_19, cat_1); t_19 = cat_1 = None
t_20: f32[100, 1024], [1, 100] = torch.ops.aten.t.default(mm_7); mm_7 = None
sum_4: f32[1, 1024], [0, 1] = torch.ops.aten.sum.dim_IntList(threshold_backward_3, [0], True); threshold_backward_3 = None
view_4: f32[1024], [1] = torch.ops.aten.view.default(sum_4, [1024]); sum_4 = None
t_21: f32[1024, 100], [100, 1] = torch.ops.aten.t.default(t_20); t_20 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:323, code: R = torch.cat([x] + [Zflat], dim=1)
slice_2: f32[s0, 64], [100, 1] = torch.ops.aten.slice.Tensor(mm_6, 1, 0, 64)
slice_3: f32[s0, 36], [100, 1] = torch.ops.aten.slice.Tensor(mm_6, 1, 64, 100); mm_6 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:321, code: Zflat = Z[:, li, lj]
sym_size_1: Sym(s0) = torch.ops.aten.sym_size(slice_1, 0); slice_1 = None
sym_size_2: Sym(9) = torch.ops.aten.sym_size(view, 1)
new_zeros: f32[s0, 9, 9], [81, 9, 1] = torch.ops.aten.new_zeros.default(slice_3, [sym_size_1, sym_size_2, sym_size_2], dtype = torch.float32, layout = torch.strided, device = device(type='cuda', index=0)); sym_size_1 = None
index_put: f32[s0, 9, 9], [81, 9, 1] = torch.ops.aten.index_put.default(new_zeros, [None, lift_fresh_copy, lift_fresh_copy_1], slice_3, True); new_zeros = lift_fresh_copy = lift_fresh_copy_1 = slice_3 = None
slice_backward: f32[s0, 9, 9], [81, 9, 1] = torch.ops.aten.slice_backward.default(index_put, [sym_size, sym_size_2, sym_size_2], 0, 0, 9223372036854775807, 1); index_put = sym_size_2 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:308, code: Z = torch.bmm(T, torch.transpose(T, 1, 2))
transpose_1: f32[s0, 64, 9], [576, 1, 64] = torch.ops.aten.transpose.int(view, 1, 2); view = None
bmm_1: f32[s0, 64, 9], [576, 9, 1] = torch.ops.aten.bmm.default(transpose_1, slice_backward); transpose_1 = None
transpose_2: f32[s0, 9, 64], [576, 64, 1] = torch.ops.aten.transpose.int(transpose, 1, 2); transpose = None
bmm_2: f32[s0, 9, 64], [576, 64, 1] = torch.ops.aten.bmm.default(slice_backward, transpose_2); slice_backward = transpose_2 = None
transpose_3: f32[s0, 9, 64], [576, 1, 9] = torch.ops.aten.transpose.int(bmm_1, 1, 2); bmm_1 = None
# Gradient addition node due to multiple use of tensor around:, File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:308, code: Z = torch.bmm(T, torch.transpose(T, 1, 2))
add: f32[s0, 9, 64], [576, 64, 1] = torch.ops.aten.add.Tensor(bmm_2, transpose_3); bmm_2 = transpose_3 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:306, code: T = torch.cat([x] + ly, dim=1).view((batch_size, -1, d))
_reshape_alias: f32[s0, 576], [576, 1] = torch.ops.aten._reshape_alias.default(add, [sym_size, 576], [576, 1]); add = sym_size = None
slice_4: f32[s0, 64], [576, 1] = torch.ops.aten.slice.Tensor(_reshape_alias, 1, 0, 64)
slice_5: f32[s0, 64], [576, 1] = torch.ops.aten.slice.Tensor(_reshape_alias, 1, 64, 128)
slice_6: f32[s0, 64], [576, 1] = torch.ops.aten.slice.Tensor(_reshape_alias, 1, 128, 192)
slice_7: f32[s0, 64], [576, 1] = torch.ops.aten.slice.Tensor(_reshape_alias, 1, 192, 256)
slice_8: f32[s0, 64], [576, 1] = torch.ops.aten.slice.Tensor(_reshape_alias, 1, 256, 320)
slice_9: f32[s0, 64], [576, 1] = torch.ops.aten.slice.Tensor(_reshape_alias, 1, 320, 384)
slice_10: f32[s0, 64], [576, 1] = torch.ops.aten.slice.Tensor(_reshape_alias, 1, 384, 448)
slice_11: f32[s0, 64], [576, 1] = torch.ops.aten.slice.Tensor(_reshape_alias, 1, 448, 512)
slice_12: f32[s0, 64], [576, 1] = torch.ops.aten.slice.Tensor(_reshape_alias, 1, 512, 576); _reshape_alias = None
# Gradient addition node due to multiple use of tensor around:, File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:306, code: T = torch.cat([x] + ly, dim=1).view((batch_size, -1, d))
add_1: f32[s0, 64], [64, 1] = torch.ops.aten.add.Tensor(slice_2, slice_4); slice_2 = slice_4 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
index_select: f32[s3, 64], [64, 1] = torch.ops.aten.index_select.default(slice_12, 0, getitem_29)
sym_size_3: Sym(s3) = torch.ops.aten.sym_size(primals_30, 0)
# No stacktrace found for following nodes
floordiv: Sym(s3) = sym_size_3 // 1; sym_size_3 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
sym_stride: Sym(1) = torch.ops.aten.sym_stride(primals_30, 0)
# No stacktrace found for following nodes
mul: Sym(s3) = floordiv * sym_stride
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
_reshape_alias_1: i64[1, s3], [s3, 1] = torch.ops.aten._reshape_alias.default(primals_30, [1, floordiv], [mul, sym_stride]); primals_30 = floordiv = mul = sym_stride = None
sym_numel: Sym(s3) = torch.ops.aten.sym_numel(getitem_29); getitem_29 = None
sym_size_4: Sym(64) = torch.ops.aten.sym_size(slice_12, 1); slice_12 = None
# No stacktrace found for following nodes
mul_1: Sym(64*s3) = sym_numel * sym_size_4; sym_numel = None
floordiv_1: Sym(s3) = mul_1 // sym_size_4; mul_1 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
_reshape_alias_2: f32[s3, 64], [64, 1] = torch.ops.aten._reshape_alias.default(index_select, [floordiv_1, sym_size_4], [sym_size_4, 1]); index_select = floordiv_1 = None
_sparse_coo_tensor_with_dims_and_tensors: f32[1000000, 64], [0, 0] = torch.ops.aten._sparse_coo_tensor_with_dims_and_tensors.default(1, 1, [1000000, sym_size_4], _reshape_alias_1, _reshape_alias_2, dtype = torch.float32, layout = torch.sparse_coo, device = device(type='cuda', index=0), pin_memory = None); sym_size_4 = _reshape_alias_1 = _reshape_alias_2 = None
ERROR:common:Cannot access storage of SparseTensorImpl
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
new_result = optimized_model_iter_fn(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
self.model_iter_fn(mod, inputs, collect_outputs=False)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass
cloned_inputs = clone_inputs(inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass>
pred = mod(*cloned_inputs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py", line 336, in forward
def forward(self, dense_x, lS_o, lS_i):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 972, in forward
return compiled_f(
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 958, in new_func
compiled_fn = create_aot_dispatcher_function(
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 678, in create_aot_dispatcher_function
aot_dispatch_autograd(flat_fn, fake_flat_tensor_args, aot_config)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 495, in aot_dispatch_autograd
fx_g = make_fx(
File "/scratch/ezyang/work/pytorch/torch/fx/experimental/proxy_tensor.py", line 665, in wrapped
t = dispatch_trace(wrap_key(func, args, fx_tracer), tracer=fx_tracer, concrete_args=tuple(phs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/fx/experimental/proxy_tensor.py", line 422, in dispatch_trace
graph = tracer.trace(root, concrete_args)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/fx/_symbolic_trace.py", line 739, in trace
(self.create_arg(fn(*args)),),
File "/scratch/ezyang/work/pytorch/torch/fx/_symbolic_trace.py", line 614, in flatten_fn
tree_out = root_fn(*tree_args)
File "/scratch/ezyang/work/pytorch/torch/fx/experimental/proxy_tensor.py", line 439, in wrapped
out = f(*tensors)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 189, in inner
outs = f(*f_args, **f_kwargs)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 257, in joint_forward_backward
backward_out = torch.autograd.grad(
File "/scratch/ezyang/work/pytorch/torch/autograd/__init__.py", line 300, in grad
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
NotImplementedError: Cannot access storage of SparseTensorImpl
TorchDynamo optimized model failed to run because of following error
cuda train dlrm FAIL
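The dlrm failure is triggered in the backward of the embedding bags: the joint graph above ends by constructing a sparse COO gradient (_sparse_coo_tensor_with_dims_and_tensors), and the proxy/fake tracer cannot access the storage of a sparse tensor. A minimal sketch, assuming sparse embedding gradients as in dlrm, of where such a gradient comes from:

import torch

# EmbeddingBag with sparse=True produces a sparse COO gradient for its
# weight; tracing a backward that materializes this tensor hits
# "Cannot access storage of SparseTensorImpl".
bag = torch.nn.EmbeddingBag(1000, 64, mode="sum", sparse=True)
idx = torch.randint(0, 1000, (128,))
offsets = torch.arange(0, 128, 8)
bag(idx, offsets).sum().backward()
print(bag.weight.grad.layout)  # torch.sparse_coo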
/scratch/ezyang/work/pytorch/torch/utils/tensorboard/__init__.py:4: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if not hasattr(tensorboard, "__version__") or LooseVersion(
/scratch/ezyang/work/env/lib/python3.9/site-packages/gym/core.py:317: DeprecationWarning: WARN: Initializing wrapper in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.
deprecation(
Running torchbench.py drq...
cuda train drq PASS
Running torchbench.py fastNLP_Bert...
cuda train fastNLP_Bert PASS
Running torchbench.py functorch_dp_cifar10...
cuda train functorch_dp_cifar10 PASS
Running torchbench.py functorch_maml_omniglot...
[2022-11-12 16:57:17,407] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 16:57:18,455] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train functorch_maml_omniglot PASS
Running torchbench.py hf_Albert...
cuda train hf_Albert PASS
Running torchbench.py hf_Bart...
cuda train hf_Bart PASS
Running torchbench.py hf_Bert...
cuda train hf_Bert PASS
Running torchbench.py hf_BigBird...
ERROR:common:output 0: torch.Size([2, 1, 12, 64, 192]) != torch.Size([2, 12, 64, 192])
While executing %mul : [#users=2] = call_function[target=torch.ops.aten.mul.Tensor](args = (%permute, %permute_1), kwargs = {})
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
new_result = optimized_model_iter_fn(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
self.model_iter_fn(mod, inputs, collect_outputs=False)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass
cloned_inputs = clone_inputs(inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass>
pred = mod(*cloned_inputs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 2462, in forward
outputs = self.bert(
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 2104, in forward
blocked_encoder_mask, band_mask, from_mask, to_mask = self.create_masks_for_block_sparse_attn(
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 2185, in create_masks_for_block_sparse_attn
@staticmethod
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 972, in forward
return compiled_f(
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 963, in new_func
return compiled_fn(args)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 390, in new_fn
fw_outs = call_func_with_args(compiled_fw, args, disable_amp=disable_amp)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 346, in call_func_with_args
out = normalize_as_list(f(args))
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 321, in g
return f(*args)
File "/scratch/ezyang/work/pytorch/torch/fx/interpreter.py", line 130, in run
self.env[node] = self.run_node(node)
File "/scratch/ezyang/work/pytorch/functorch/_src/compilers.py", line 162, in run_node
assert nv.size() == rv.size(), f"output {i}: {nv.size()} != {rv.size()}"
AssertionError: output 0: torch.Size([2, 1, 12, 64, 192]) != torch.Size([2, 12, 64, 192])
While executing %mul : [#users=2] = call_function[target=torch.ops.aten.mul.Tensor](args = (%permute, %permute_1), kwargs = {})
TorchDynamo optimized model failed to run because of following error
cuda train hf_BigBird FAIL
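The hf_BigBird failure comes from the accuracy-checking interpreter used by this backend: it re-runs each FX node and asserts that the size it computes matches the reference, and a broadcast dimension disagrees ([2, 1, 12, 64, 192] vs [2, 12, 64, 192]) at the aten.mul node. A rough sketch of that kind of checker (not the compilers.py source; the use of node.meta["val"] as the reference is an assumption):

import torch
import torch.fx

class SizeCheckingInterpreter(torch.fx.Interpreter):
    def run_node(self, n):
        result = super().run_node(n)
        # Compare against a reference value stored on the node, if present.
        ref = n.meta.get("val")
        if isinstance(result, torch.Tensor) and isinstance(ref, torch.Tensor):
            assert result.size() == ref.size(), f"{n}: {result.size()} != {ref.size()}"
        return result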
Running torchbench.py hf_DistilBert...
cuda train hf_DistilBert PASS
Running torchbench.py hf_GPT2...
cuda train hf_GPT2 PASS
Running torchbench.py hf_GPT2_large...
cuda train hf_GPT2_large PASS
Running torchbench.py hf_Longformer...
[2022-11-12 17:01:53,287] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 17:01:54,658] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 17:03:12,259] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (64)
function: '_chunk' (/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/longformer/modeling_longformer.py:770)
reasons: ['(Eq(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0], 512*hidden_states.size()[0]*hidden_states.stride()[0]*hidden_states.size()[1]//512) and\nEq(Mod(hidden_states.size()[1], hidden_states.size()[1]//512), 0) and\nhidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 >= 0 and\nNe(hidden_states.size()[1]//512, 0) and\n4*hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0] <= 4*hidden_states.size()[1]*2*768 and\nNe(hidden_states.size()[1]/hidden_states.size()[1]//512, 1) and\nNe(hidden_states.size()[1]//512, 1) and\nNe(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512, 1) and\nhidden_states.size()[1]//512 >= 2 and\nhidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 >= hidden_states.stride()[0] and\nhidden_states.size()[1]/hidden_states.size()[1]//512 >= 2 and\nhidden_states.size()[0]*hidden_states.stride()[0] < hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 and\nhidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 >= 0 and\nhidden_states.size()[1]//512 > 1 and\nEq(hidden_states.size()[1]/hidden_states.size()[1]//512, 512) and\n(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 >= 0 and\nNe(4*hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 + 8*hidden_states.size()[1]//512*(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 - 8*(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2, 0) and\n4*hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 + 8*hidden_states.size()[1]//512*(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 - 8*(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 <= 4*hidden_states.size()[1]*2*768 and\nNe(2*hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0] - hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512, 0) and\nNe(2*hidden_states.size()[1]//512 - 1, 1) and\nNe((hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2, 1) and\n2*hidden_states.size()[1]//512 - 1 >= 2 and\n(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 >= hidden_states.stride()[0] and\nhidden_states.size()[0]*hidden_states.stride()[0] < (hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 and\nNe((hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2, hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512) and\nNe((hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2, 0) and\n(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 >= 0 and\n2*hidden_states.size()[1]//512 - 1 > 1 and\n1 < 
2*hidden_states.size()[1]//512*(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 - (hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 and \nhidden_states.stride()[0] == hidden_states.size()[2])']
to diagnose recompilation issues, see https://github.com/pytorch/torchdynamo/blob/main/TROUBLESHOOTING.md.
cuda train hf_Longformer PASS
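On the cache_size_limit warning above: dynamo keeps at most config.cache_size_limit compiled variants per code object (64 in this run), and the dynamic-shape guards printed for _chunk can exhaust that budget, after which the frame falls back to eager. The knob, if one wanted to experiment rather than fix the guards, looks like this (a sketch, not a recommendation):

import torch._dynamo

torch._dynamo.config.cache_size_limit = 128  # default was 64 in this run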
Running torchbench.py hf_Reformer...
cuda train hf_Reformer PASS
Running torchbench.py hf_T5...
WARNING:common:fp64 golden ref were not generated for hf_T5
cuda train hf_T5 PASS
Running torchbench.py hf_T5_base...
WARNING:common:fp64 golden ref were not generated for hf_T5_base
cuda train hf_T5_base PASS
Running torchbench.py hf_T5_large...
cuda train hf_T5_large PASS
Running torchbench.py lennard_jones...
cuda train lennard_jones PASS
Running torchbench.py maml_omniglot...
[2022-11-12 17:07:20,155] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 17:07:21,153] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train maml_omniglot PASS
Running torchbench.py mnasnet1_0...
[2022-11-12 17:07:32,983] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
Exception ignored in: <generator object preorder_traversal._preorder_traversal at 0x7f0821976350>
Traceback (most recent call last):
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/sympy/core/traversal.py", line 121, in _preorder_traversal
yield node
SystemError: error return without exception set
cuda train mnasnet1_0 PASS
Running torchbench.py mobilenet_v2...
[2022-11-12 17:07:46,227] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train mobilenet_v2 PASS
Running torchbench.py mobilenet_v2_quantized_qat...
WARNING:common:fp64 golden ref were not generated for mobilenet_v2_quantized_qat
ERROR:common:'Graph' object has no attribute 'print_readable'
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
new_result = optimized_model_iter_fn(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
self.model_iter_fn(mod, inputs, collect_outputs=False)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass
cloned_inputs = clone_inputs(inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass>
pred = mod(*cloned_inputs)
File "/scratch/ezyang/work/pytorch/torch/fx/graph_module.py", line 660, in call_wrapped
return self._wrapped_call(self, *args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/fx/graph_module.py", line 279, in __call__
raise e
File "/scratch/ezyang/work/pytorch/torch/fx/graph_module.py", line 269, in __call__
return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "<eval_with_key>.8", line 4, in forward
def forward(self, x : torch.Tensor) -> torch.Tensor:
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 972, in forward
return compiled_f(
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 958, in new_func
compiled_fn = create_aot_dispatcher_function(
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 678, in create_aot_dispatcher_function
aot_dispatch_autograd(flat_fn, fake_flat_tensor_args, aot_config)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 499, in aot_dispatch_autograd
assert_functional_graph(fx_g.graph)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 401, in assert_functional_graph
fx_g.print_readable()
AttributeError: 'Graph' object has no attribute 'print_readable'
====== Buggy post-functionalization graph ======
TorchDynamo optimized model failed to run because of following error
cuda train mobilenet_v2_quantized_qat FAIL
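The AttributeError here looks like a bug in the error-reporting path itself: assert_functional_graph is handed a bare torch.fx.Graph, and on this build print_readable() only exists on torch.fx.GraphModule, so the attempt to dump the "buggy post-functionalization graph" dies before printing it. A small sketch of the distinction, assuming the behavior of this PyTorch build:

import torch
import torch.fx

def f(x):
    return x.relu() + 1

gm = torch.fx.symbolic_trace(f)
gm.print_readable()                         # GraphModule: prints annotated code
print(hasattr(gm.graph, "print_readable"))  # bare Graph: False on this build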
Running torchbench.py mobilenet_v3_large...
[2022-11-12 17:08:36,739] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train mobilenet_v3_large PASS
Running torchbench.py moco...
[2022-11-12 17:08:49,627] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
ERROR:common:
from user code:
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/moco/moco/builder.py", line 172, in concat_all_gather
torch.distributed.all_gather(tensors_gather, tensor, async_op=False)
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1076, in run_node
return node.target(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/distributed/distributed_c10d.py", line 1350, in wrapper
return func(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/distributed/distributed_c10d.py", line 2345, in all_gather
work = default_pg.allgather([tensor_list], [tensor])
File "/scratch/ezyang/work/pytorch/torch/_subclasses/fake_tensor.py", line 874, in __torch_dispatch__
raise RuntimeError(
RuntimeError: c10d.allgather_.default - couldn't find symbolic meta function/decomposition
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1042, in get_fake_value
return wrap_fake_exception(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 721, in wrap_fake_exception
return fn()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1043, in <lambda>
lambda: run_node(tx.output, node, args, kwargs, nnmodule)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1085, in run_node
raise RuntimeError(
RuntimeError: Failed running call_function <function all_gather at 0x7f09d22310d0>(*([FakeTensor(FakeTensor(..., device='meta', size=(s0, s1, s2, s2)), cuda:0)], FakeTensor(FakeTensor(..., device='meta', size=(s0, s1, s2, s2)), cuda:0)), **{'async_op': False}):
c10d.allgather_.default - couldn't find symbolic meta function/decomposition
(scroll up for backtrace)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
new_result = optimized_model_iter_fn(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
self.model_iter_fn(mod, inputs, collect_outputs=False)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass
cloned_inputs = clone_inputs(inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass>
pred = mod(*cloned_inputs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/nn/parallel/distributed.py", line 1096, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/nn/parallel/distributed.py", line 1049, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0]) # type: ignore[index]
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/moco/moco/builder.py", line 130, in forward
self._momentum_update_key_encoder() # update the key encoder
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/moco/moco/builder.py", line 133, in <graph break in forward>
im_k, idx_unshuffle = self._batch_shuffle_ddp(im_k)
File "/scratch/ezyang/work/pytorch/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/moco/moco/builder.py", line 76, in _batch_shuffle_ddp
x_gather = concat_all_gather(x)
File "/scratch/ezyang/work/pytorch/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 247, in catch_errors
return callback(frame, cache_size)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 474, in _convert_frame
result = inner_convert(frame, cache_size)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 118, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 89, in time_wrapper
r = func(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 347, in _convert_frame_assert
return _compile(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 402, in _compile
out_code = transform_code_object(code, transform)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object
transformations(instructions, code_options)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 390, in transform
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1493, in run
super().run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
getattr(self, inst.opname)(inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper
return inner_fn(self, inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 835, in CALL_FUNCTION_KW
self.call_function(fn, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/torch.py", line 403, in call_function
tensor_variable = wrap_fx_proxy(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 631, in wrap_fx_proxy
return wrap_fx_proxy_cls(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 671, in wrap_fx_proxy_cls
example_value = get_fake_value(proxy.node, tx)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1055, in get_fake_value
raise TorchRuntimeError() from e
torch._dynamo.exc.TorchRuntimeError:
from user code:
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/moco/moco/builder.py", line 172, in concat_all_gather
torch.distributed.all_gather(tensors_gather, tensor, async_op=False)
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
TorchDynamo optimized model failed to run because of following error
cuda train moco FAIL
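The moco failure is a missing fake-tensor meta kernel for the collective (c10d.allgather_.default), not a model bug; as the error text itself notes, the frame can be allowed to fall back to eager (a workaround, not a fix):

import torch._dynamo

torch._dynamo.config.suppress_errors = True  # skip frames dynamo cannot compile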
Running torchbench.py nvidia_deeprecommender...
cuda train nvidia_deeprecommender PASS
Running torchbench.py pytorch_CycleGAN_and_pix2pix...
--dataroot /scratch/ezyang/work/torchbenchmark/torchbenchmark/data/.data/pytorch_CycleGAN_and_pix2pix_inputs/datasets/horse2zebra --name horse2zebra --model cycle_gan --display_id 0 --n_epochs 3 --n_epochs_decay 3 --gpu_ids 0 --checkpoints_dir /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/pytorch_CycleGAN_and_pix2pix/.data/checkpoints
cuda train pytorch_CycleGAN_and_pix2pix PASS
Running torchbench.py pytorch_stargan...
ERROR:common:'Graph' object has no attribute 'print_readable'
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
new_result = optimized_model_iter_fn(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
self.model_iter_fn(mod, inputs, collect_outputs=False)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass
cloned_inputs = clone_inputs(inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass>
pred = mod(*cloned_inputs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/pytorch_stargan/model.py", line 55, in forward
def forward(self, x, c):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 972, in forward
return compiled_f(
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 958, in new_func
compiled_fn = create_aot_dispatcher_function(
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 678, in create_aot_dispatcher_function
aot_dispatch_autograd(flat_fn, fake_flat_tensor_args, aot_config)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 499, in aot_dispatch_autograd
assert_functional_graph(fx_g.graph)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 401, in assert_functional_graph
fx_g.print_readable()
AttributeError: 'Graph' object has no attribute 'print_readable'
====== Buggy post-functionalization graph ======
TorchDynamo optimized model failed to run because of following error
cuda train pytorch_stargan FAIL
Running torchbench.py pytorch_struct...
cuda train pytorch_struct PASS
Running torchbench.py pytorch_unet...
[2022-11-12 17:10:07,552] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 17:10:13,801] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train pytorch_unet PASS
Running torchbench.py resnet18...
[2022-11-12 17:10:23,982] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train resnet18 PASS
Running torchbench.py resnet50...
[2022-11-12 17:10:35,709] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train resnet50 PASS
Running torchbench.py resnet50_quantized_qat...
WARNING:common:fp64 golden ref were not generated for resnet50_quantized_qat
ERROR:common:'Graph' object has no attribute 'print_readable'
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
new_result = optimized_model_iter_fn(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
self.model_iter_fn(mod, inputs, collect_outputs=False)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass
cloned_inputs = clone_inputs(inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass>
pred = mod(*cloned_inputs)
File "/scratch/ezyang/work/pytorch/torch/fx/graph_module.py", line 660, in call_wrapped
return self._wrapped_call(self, *args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/fx/graph_module.py", line 279, in __call__
raise e
File "/scratch/ezyang/work/pytorch/torch/fx/graph_module.py", line 269, in __call__
return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "<eval_with_key>.8", line 4, in forward
def forward(self, x : torch.Tensor) -> torch.Tensor:
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 972, in forward
return compiled_f(
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 958, in new_func
compiled_fn = create_aot_dispatcher_function(
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 678, in create_aot_dispatcher_function
aot_dispatch_autograd(flat_fn, fake_flat_tensor_args, aot_config)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 499, in aot_dispatch_autograd
assert_functional_graph(fx_g.graph)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 401, in assert_functional_graph
fx_g.print_readable()
AttributeError: 'Graph' object has no attribute 'print_readable'
====== Buggy post-functionalization graph ======
TorchDynamo optimized model failed to run because of following error
cuda train resnet50_quantized_qat FAIL
Running torchbench.py resnext50_32x4d...
[2022-11-12 17:11:20,351] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train resnext50_32x4d PASS
Running torchbench.py shufflenet_v2_x1_0...
[2022-11-12 17:11:33,004] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train shufflenet_v2_x1_0 PASS
/scratch/ezyang/work/env/lib/python3.9/site-packages/gym/core.py:317: DeprecationWarning: WARN: Initializing wrapper in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.
deprecation(
/scratch/ezyang/work/env/lib/python3.9/site-packages/gym/wrappers/step_api_compatibility.py:39: DeprecationWarning: WARN: Initializing environment in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.
deprecation(
Running torchbench.py soft_actor_critic...
cuda train soft_actor_critic PASS
[2022-11-12 17:11:46,278] torch._dynamo.testing: [WARNING] High loss value alert - 125.05. Can result in unstable gradients.
Running torchbench.py speech_transformer...
[2022-11-12 17:11:46,409] torch._dynamo.testing: [WARNING] High loss value alert - 125.05. Can result in unstable gradients.
[2022-11-12 17:11:46,547] torch._dynamo.testing: [WARNING] High loss value alert - 125.05. Can result in unstable gradients.
[2022-11-12 17:11:46,651] torch._dynamo.testing: [WARNING] High loss value alert - 125.05. Can result in unstable gradients.
[2022-11-12 17:11:55,245] torch._dynamo.variables.builtin: [WARNING] incorrect arg count <bound method BuiltinVariable._call_min_max of BuiltinVariable(max)> missing a required argument: 'b' and no constant handler
[2022-11-12 17:12:25,935] torch._dynamo.testing: [WARNING] High loss value alert - 125.05. Can result in unstable gradients.
cuda train speech_transformer PASS
[2022-11-12 17:12:33,559] torch._dynamo.testing: [WARNING] High loss value alert - 3.89. Can result in unstable gradients.
Running torchbench.py squeezenet1_1...
[2022-11-12 17:12:33,597] torch._dynamo.testing: [WARNING] High loss value alert - 3.89. Can result in unstable gradients.
[2022-11-12 17:12:33,625] torch._dynamo.testing: [WARNING] High loss value alert - 3.89. Can result in unstable gradients.
[2022-11-12 17:12:33,648] torch._dynamo.testing: [WARNING] High loss value alert - 3.89. Can result in unstable gradients.
[2022-11-12 17:12:41,718] torch._dynamo.testing: [WARNING] High loss value alert - 3.89. Can result in unstable gradients.
cuda train squeezenet1_1 PASS
[2022-11-12 17:12:52,452] torch._dynamo.testing: [WARNING] High loss value alert - 86.50. Can result in unstable gradients.
Running torchbench.py tacotron2...
[2022-11-12 17:12:55,845] torch._dynamo.testing: [WARNING] High loss value alert - 86.50. Can result in unstable gradients.
[2022-11-12 17:12:59,278] torch._dynamo.testing: [WARNING] High loss value alert - 86.50. Can result in unstable gradients.
[2022-11-12 17:13:02,749] torch._dynamo.testing: [WARNING] High loss value alert - 86.50. Can result in unstable gradients.
[2022-11-12 17:13:03,565] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 17:13:03,786] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 17:13:04,676] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 17:13:04,718] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 17:13:04,759] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 17:13:04,800] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 17:13:04,839] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 17:13:04,877] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 17:13:04,963] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
ERROR:common:aten._thnn_fused_lstm_cell_backward_impl.default - couldn't find symbolic meta function/decomposition
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
new_result = optimized_model_iter_fn(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
self.model_iter_fn(mod, inputs, collect_outputs=False)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass
cloned_inputs = clone_inputs(inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass>
pred = mod(*cloned_inputs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py", line 505, in forward
encoder_outputs = self.encoder(embedded_inputs, text_lengths)
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py", line 507, in <graph break in forward>
mel_outputs, gate_outputs, alignments = self.decoder(
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py", line 396, in forward
decoder_input = self.get_go_frame(memory).unsqueeze(0)
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py", line 402, in <graph break in forward>
memory, mask=~get_mask_from_lengths(memory_lengths))
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py", line 401, in <graph break in forward>
self.initialize_decoder_states(
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py", line 407, in <graph break in forward>
mel_output, gate_output, attention_weights = self.decode(
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py", line 340, in decode
def decode(self, decoder_input):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 972, in forward
return compiled_f(
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 958, in new_func
compiled_fn = create_aot_dispatcher_function(
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 678, in create_aot_dispatcher_function
aot_dispatch_autograd(flat_fn, fake_flat_tensor_args, aot_config)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 495, in aot_dispatch_autograd
fx_g = make_fx(
File "/scratch/ezyang/work/pytorch/torch/fx/experimental/proxy_tensor.py", line 665, in wrapped
t = dispatch_trace(wrap_key(func, args, fx_tracer), tracer=fx_tracer, concrete_args=tuple(phs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/fx/experimental/proxy_tensor.py", line 422, in dispatch_trace
graph = tracer.trace(root, concrete_args)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/fx/_symbolic_trace.py", line 739, in trace
(self.create_arg(fn(*args)),),
File "/scratch/ezyang/work/pytorch/torch/fx/_symbolic_trace.py", line 614, in flatten_fn
tree_out = root_fn(*tree_args)
File "/scratch/ezyang/work/pytorch/torch/fx/experimental/proxy_tensor.py", line 439, in wrapped
out = f(*tensors)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 189, in inner
outs = f(*f_args, **f_kwargs)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 257, in joint_forward_backward
backward_out = torch.autograd.grad(
File "/scratch/ezyang/work/pytorch/torch/autograd/__init__.py", line 300, in grad
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/scratch/ezyang/work/pytorch/torch/fx/experimental/proxy_tensor.py", line 465, in __torch_dispatch__
return self.inner_torch_dispatch(func, types, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/fx/experimental/proxy_tensor.py", line 490, in inner_torch_dispatch
out = proxy_call(self, func, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/fx/experimental/proxy_tensor.py", line 323, in proxy_call
out = func(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_ops.py", line 284, in __call__
return self._op(*args, **kwargs or {})
File "/scratch/ezyang/work/pytorch/torch/_subclasses/fake_tensor.py", line 874, in __torch_dispatch__
raise RuntimeError(
RuntimeError: aten._thnn_fused_lstm_cell_backward_impl.default - couldn't find symbolic meta function/decomposition
incomplete graph:
class joint_forward_backward(torch.nn.Module):
def forward(self, primals, tangents):
primals_1: f32[4096, 768], [768, 1], primals_2: f32[4096, 1024], [1024, 1], primals_3: f32[4096], [1], primals_4: f32[4096], [1], primals_5: f32[s1, s2], [s2, 1], primals_6: f32[s1, 768 - s2], [768 - s2, 1], primals_7: f32[s1, 1024], [1024, 1], primals_8: f32[s1, 1024], [1024, 1], primals_9: f32[s1, s5], [s5, 1], primals_10: f32[s1, s5], [s5, 1], tangents_1: f32[s1, 1024], [1024, 1], tangents_2: f32[s1, 2, s5], [2*s5, s5, 1], tangents_3: f32[s1, 1024], [1024, 1], = fx_pytree.tree_flatten_spec([primals, tangents], self._in_spec)
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py:352, code: cell_input = torch.cat((decoder_input, self.attention_context), -1)
cat: f32[s1, 768], [768, 1] = torch.ops.aten.cat.default([primals_5, primals_6], -1); primals_5 = primals_6 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py:353, code: self.attention_hidden, self.attention_cell = self.attention_rnn(
t: f32[768, 4096], [1, 768] = torch.ops.aten.t.default(primals_1); primals_1 = None
mm: f32[s1, 4096], [4096, 1] = torch.ops.aten.mm.default(cat, t); cat = t = None
t_1: f32[1024, 4096], [1, 1024] = torch.ops.aten.t.default(primals_2); primals_2 = None
mm_1: f32[s1, 4096], [4096, 1] = torch.ops.aten.mm.default(primals_7, t_1); primals_7 = t_1 = None
_thnn_fused_lstm_cell = torch.ops.aten._thnn_fused_lstm_cell.default(mm, mm_1, primals_8, primals_3, primals_4); mm = mm_1 = primals_3 = primals_4 = None
getitem: f32[s1, 1024], [1024, 1] = _thnn_fused_lstm_cell[0]
getitem_1: f32[s1, 1024], [1024, 1] = _thnn_fused_lstm_cell[1]
getitem_2: f32[s1, 4096], [4096, 1] = _thnn_fused_lstm_cell[2]; _thnn_fused_lstm_cell = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py:355, code: self.attention_hidden = F.dropout(
native_dropout = torch.ops.aten.native_dropout.default(getitem, 0.1, True); getitem = None
getitem_3: f32[s1, 1024], [1024, 1] = native_dropout[0]
getitem_4: b8[s1, 1024], [1024, 1] = native_dropout[1]; native_dropout = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py:359, code: (self.attention_weights.unsqueeze(1),
unsqueeze: f32[s1, 1, s5], [s5, 0, 1] = torch.ops.aten.unsqueeze.default(primals_9, 1); primals_9 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py:360, code: self.attention_weights_cum.unsqueeze(1)), dim=1)
unsqueeze_1: f32[s1, 1, s5], [s5, 0, 1] = torch.ops.aten.unsqueeze.default(primals_10, 1); primals_10 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py:358, code: attention_weights_cat = torch.cat(
cat_1: f32[s1, 2, s5], [2*s5, s5, 1] = torch.ops.aten.cat.default([unsqueeze, unsqueeze_1], 1); unsqueeze = unsqueeze_1 = None
# Gradient addition node due to multiple use of tensor around:, File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py:503, code: embedded_inputs = self.embedding(text_inputs).transpose(1, 2)
is_same_size = torch.ops.aten.is_same_size.default(getitem_3, tangents_1); getitem_3 = None
is_same_size_1 = torch.ops.aten.is_same_size.default(getitem_1, tangents_3)
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py:355, code: self.attention_hidden = F.dropout(
native_dropout_backward: f32[s1, 1024], [1024, 1] = torch.ops.aten.native_dropout_backward.default(tangents_1, getitem_4, 1.1111111111111112); tangents_1 = getitem_4 = None
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py:353, code: self.attention_hidden, self.attention_cell = self.attention_rnn(
_thnn_fused_lstm_cell_backward_impl = torch.ops.aten._thnn_fused_lstm_cell_backward_impl.default(native_dropout_backward, tangents_3, primals_8, getitem_1, getitem_2, True); native_dropout_backward = tangents_3 = primals_8 = getitem_1 = getitem_2 = None
TorchDynamo optimized model failed to run because of following error
cuda train tacotron2 FAIL
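Note on the tacotron2 failure above: fake tensors cannot propagate shapes through aten._thnn_fused_lstm_cell_backward_impl because it has no Meta/symbolic kernel, so the joint forward-backward trace aborts. The actual fix is a meta registration for that op in core; the snippet below only sketches the general pattern on a made-up custom op (namespace and schema are hypothetical), since the exact signature of the fused LSTM-cell backward is not reproduced here.

    import torch
    from torch.library import Library

    # General pattern only, on a hypothetical op: give the dispatcher a "Meta"
    # kernel so fake/symbolic tensors can infer output shapes without real data.
    lib = Library("sweep_demo", "DEF")
    lib.define("scale(Tensor x, float alpha) -> Tensor")

    def scale_cpu(x, alpha):
        return x * alpha

    def scale_meta(x, alpha):
        # Shape/dtype propagation only; never touches data.
        return x.new_empty(x.shape)

    lib.impl("scale", scale_cpu, "CPU")
    lib.impl("scale", scale_meta, "Meta")

    y = torch.ops.sweep_demo.scale(torch.randn(3, 4), 2.0)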
[2022-11-12 17:13:15,045] torch._dynamo.testing: [WARNING] High loss value alert - 55.32. Can result in unstable gradients.
Running torchbench.py timm_efficientdet...
[2022-11-12 17:13:17,673] torch._dynamo.testing: [WARNING] High loss value alert - 55.43. Can result in unstable gradients.
[2022-11-12 17:13:19,613] torch._dynamo.testing: [WARNING] High loss value alert - 55.32. Can result in unstable gradients.
[2022-11-12 17:13:19,928] torch._dynamo.testing: [WARNING] High loss value alert - 55.32. Can result in unstable gradients.
ERROR:common:
from user code:
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/effdet/efficientdet.py", line 211, in forward
input_node = resample(input_node)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/effdet/efficientdet.py", line 134, in forward
return F.interpolate(
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1076, in run_node
return node.target(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/nn/functional.py", line 3928, in interpolate
return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
RuntimeError: Cannot call sizes() on tensor with symbolic sizes/strides
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1042, in get_fake_value
return wrap_fake_exception(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 721, in wrap_fake_exception
return fn()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1043, in <lambda>
lambda: run_node(tx.output, node, args, kwargs, nnmodule)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1085, in run_node
raise RuntimeError(
RuntimeError: Failed running call_function <function interpolate at 0x7fbd78ae9040>(*(FakeTensor(FakeTensor(..., device='meta',
size=(s0, 88, ceiling(ceiling(ceiling(ceiling(ceiling(ceiling(ceiling(s2/2)/2)/2)/2)/2)/2)/2), ceiling(ceiling(ceiling(ceiling(ceiling(ceiling(ceiling(s2/2)/2)/2)/2)/2)/2)/2)),
grad_fn=<MaxPool2DWithIndicesBackward0>), cuda:0), (10, 10), None, 'nearest', None), **{'recompute_scale_factor': False}):
Cannot call sizes() on tensor with symbolic sizes/strides
(scroll up for backtrace)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
new_result = optimized_model_iter_fn(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
self.model_iter_fn(mod, inputs, collect_outputs=False)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass
cloned_inputs = clone_inputs(inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass>
pred = mod(*cloned_inputs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 247, in catch_errors
return callback(frame, cache_size)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 474, in _convert_frame
result = inner_convert(frame, cache_size)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 118, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 89, in time_wrapper
r = func(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 347, in _convert_frame_assert
return _compile(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 402, in _compile
out_code = transform_code_object(code, transform)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object
transformations(instructions, code_options)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 390, in transform
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1493, in run
super().run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
getattr(self, inst.opname)(inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper
return inner_fn(self, inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 222, in call_function
return tx.inline_user_function_return(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
getattr(self, inst.opname)(inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper
return inner_fn(self, inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 222, in call_function
return tx.inline_user_function_return(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
getattr(self, inst.opname)(inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper
return inner_fn(self, inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 222, in call_function
return tx.inline_user_function_return(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
getattr(self, inst.opname)(inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper
return inner_fn(self, inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 222, in call_function
return tx.inline_user_function_return(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
getattr(self, inst.opname)(inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper
return inner_fn(self, inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 222, in call_function
return tx.inline_user_function_return(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
getattr(self, inst.opname)(inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper
return inner_fn(self, inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 222, in call_function
return tx.inline_user_function_return(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
getattr(self, inst.opname)(inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper
return inner_fn(self, inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 183, in call_function
tx.call_function(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 222, in call_function
return tx.inline_user_function_return(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
getattr(self, inst.opname)(inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper
return inner_fn(self, inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 835, in CALL_FUNCTION_KW
self.call_function(fn, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/torch.py", line 403, in call_function
tensor_variable = wrap_fx_proxy(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 631, in wrap_fx_proxy
return wrap_fx_proxy_cls(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 671, in wrap_fx_proxy_cls
example_value = get_fake_value(proxy.node, tx)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1055, in get_fake_value
raise TorchRuntimeError() from e
torch._dynamo.exc.TorchRuntimeError:
from user code:
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/effdet/efficientdet.py", line 211, in forward
input_node = resample(input_node)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/effdet/efficientdet.py", line 134, in forward
return F.interpolate(
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
TorchDynamo optimized model failed to run because of following error
cuda train timm_efficientdet FAIL
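Note on the timm_efficientdet failure above: it reduces to F.interpolate with mode="nearest" on an input whose sizes are symbolic; upsample_nearest2d still calls sizes() on the fake tensor. A minimal repro sketch, assuming the same dynamic-shapes configuration as this sweep (the input shape below is arbitrary, 88 channels and the (10, 10) target come from the log):

    import torch
    import torch._dynamo as dynamo
    import torch.nn.functional as F

    dynamo.config.dynamic_shapes = True  # what TORCHDYNAMO_DYNAMIC_SHAPES=1 toggles

    def resample(x):
        # Mirrors effdet's ResampleFeatureMap call: fixed target size, nearest mode.
        return F.interpolate(x, size=(10, 10), mode="nearest")

    opt = dynamo.optimize("aot_eager")(resample)
    opt(torch.randn(2, 88, 16, 16))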
Running torchbench.py timm_efficientnet...
cuda train timm_efficientnet PASS
Running torchbench.py timm_regnet...
cuda train timm_regnet PASS
Running torchbench.py timm_resnest...
[2022-11-12 17:15:15,796] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train timm_resnest PASS
Running torchbench.py timm_vision_transformer...
cuda train timm_vision_transformer PASS
Running torchbench.py timm_vision_transformer_large...
cuda train timm_vision_transformer_large PASS
Running torchbench.py timm_vovnet...
cuda train timm_vovnet PASS
Running torchbench.py tts_angular...
ERROR:common:
from user code:
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tts_angular/model.py", line 18, in <graph break in forward>
o, (_, _) = self.lstm(x)
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1081, in run_node
return nnmodule(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/rnn.py", line 776, in forward
result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: !has_symbolic_sizes_strides_ INTERNAL ASSERT FAILED at "/scratch/ezyang/work/pytorch/aten/src/ATen/TensorGeometry.h":63, please report a bug to PyTorch.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1042, in get_fake_value
return wrap_fake_exception(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 721, in wrap_fake_exception
return fn()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1043, in <lambda>
lambda: run_node(tx.output, node, args, kwargs, nnmodule)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1085, in run_node
raise RuntimeError(
RuntimeError: Failed running call_module self_lstm(*(FakeTensor(FakeTensor(..., device='meta', size=(s0, s1, 40)), cuda:0),), **{}):
!has_symbolic_sizes_strides_ INTERNAL ASSERT FAILED at "/scratch/ezyang/work/pytorch/aten/src/ATen/TensorGeometry.h":63, please report a bug to PyTorch.
(scroll up for backtrace)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
new_result = optimized_model_iter_fn(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
self.model_iter_fn(mod, inputs, collect_outputs=False)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass
cloned_inputs = clone_inputs(inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass>
pred = mod(*cloned_inputs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tts_angular/model.py", line 59, in forward
d = self.layers(x)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/container.py", line 204, in forward
input = module(input)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tts_angular/model.py", line 17, in forward
self.lstm.flatten_parameters()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 247, in catch_errors
return callback(frame, cache_size)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 474, in _convert_frame
result = inner_convert(frame, cache_size)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 118, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 89, in time_wrapper
r = func(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 347, in _convert_frame_assert
return _compile(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 402, in _compile
out_code = transform_code_object(code, transform)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object
transformations(instructions, code_options)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 390, in transform
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1493, in run
super().run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
getattr(self, inst.opname)(inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper
return inner_fn(self, inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 202, in call_function
return wrap_fx_proxy(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 631, in wrap_fx_proxy
return wrap_fx_proxy_cls(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 671, in wrap_fx_proxy_cls
example_value = get_fake_value(proxy.node, tx)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1055, in get_fake_value
raise TorchRuntimeError() from e
torch._dynamo.exc.TorchRuntimeError:
from user code:
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tts_angular/model.py", line 18, in <graph break in forward>
o, (_, _) = self.lstm(x)
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
TorchDynamo optimized model failed to run because of following error
cuda train tts_angular FAIL
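Note on the tts_angular failure above: the fused LSTM path asserts !has_symbolic_sizes_strides_ when the module is evaluated on a fake tensor with symbolic sizes. A minimal sketch of the repro shape, with an assumed hidden size (only the input feature size of 40 appears in the log):

    import torch
    import torch._dynamo as dynamo

    dynamo.config.dynamic_shapes = True

    lstm = torch.nn.LSTM(input_size=40, hidden_size=256, batch_first=True)

    def f(x):
        out, _ = lstm(x)
        return out

    opt = dynamo.optimize("aot_eager")(f)
    opt(torch.randn(4, 100, 40))  # (batch, time, features); batch/time are arbitrary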
Running torchbench.py vgg16...
cuda train vgg16 PASS
[2022-11-12 17:17:14,994] torch._dynamo.testing: [WARNING] High loss value alert - 66.78. Can result in unstable gradients.
Running torchbench.py vision_maskrcnn...
[2022-11-12 17:17:18,626] torch._dynamo.testing: [WARNING] High loss value alert - 66.78. Can result in unstable gradients.
[2022-11-12 17:17:18,941] torch._dynamo.testing: [WARNING] High loss value alert - 66.78. Can result in unstable gradients.
[2022-11-12 17:17:19,234] torch._dynamo.testing: [WARNING] High loss value alert - 66.78. Can result in unstable gradients.
ERROR:common:compile_fn raised AssertionError: While executing %mul : [#users=1] = call_method[target=__mul__](args = (%ceil, 32.0), kwargs = {})
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/output_graph.py", line 449, in call_user_compiler
compiled_fn = self.compiler_fn(gm, self.example_inputs())
File "/scratch/ezyang/work/pytorch/torch/_dynamo/debug_utils.py", line 803, in debug_wrapper
compiled_gm = compiler_fn(gm, example_inputs, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/training.py", line 94, in compile_fn
return cls(gm, example_inputs).verified_candidate()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/training.py", line 116, in __init__
if not is_aot_autograd_safe_to_run(gm, example_inputs):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/training.py", line 64, in is_aot_autograd_safe_to_run
mutated = has_mutation(gm, example_inputs, inputs_only=True)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/analysis.py", line 160, in has_mutation
ShapeAliasingAndMutationProp(new_gm).run(*example_inputs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/analysis.py", line 124, in run
super().run(*args)
File "/scratch/ezyang/work/pytorch/torch/fx/interpreter.py", line 130, in run
self.env[node] = self.run_node(node)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/analysis.py", line 59, in run_node
assert result is not NotImplemented
AssertionError: While executing %mul : [#users=1] = call_method[target=__mul__](args = (%ceil, 32.0), kwargs = {})
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
new_result = optimized_model_iter_fn(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
self.model_iter_fn(mod, inputs, collect_outputs=False)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass
cloned_inputs = clone_inputs(inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass>
pred = mod(*cloned_inputs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/torchvision/torchvision/models/detection/generalized_rcnn.py", line 83, in forward
images, targets = self.transform(images, targets)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/torchvision/torchvision/models/detection/transform.py", line 130, in forward
image, target_index = self.resize(image, target_index)
File "/scratch/ezyang/work/torchvision/torchvision/models/detection/transform.py", line 130, in <graph break in forward>
image, target_index = self.resize(image, target_index)
File "/scratch/ezyang/work/torchvision/torchvision/models/detection/transform.py", line 136, in <graph break in forward>
images = self.batch_images(images, size_divisible=self.size_divisible)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 247, in catch_errors
return callback(frame, cache_size)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 474, in _convert_frame
result = inner_convert(frame, cache_size)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 118, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 89, in time_wrapper
r = func(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 347, in _convert_frame_assert
return _compile(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 402, in _compile
out_code = transform_code_object(code, transform)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object
transformations(instructions, code_options)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 390, in transform
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1493, in run
super().run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 348, in step
self.output.compile_subgraph(self, partial_convert=True)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/output_graph.py", line 347, in compile_subgraph
self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/output_graph.py", line 417, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/output_graph.py", line 454, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: compile_fn raised AssertionError: While executing %mul : [#users=1] = call_method[target=__mul__](args = (%ceil, 32.0), kwargs = {})
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
TorchDynamo optimized model failed to run because of following error
cuda train vision_maskrcnn FAIL
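Note on the vision_maskrcnn failure above: it is Dynamo's pre-AOT mutation analysis that fails, not the model. The analysis interpreter reaches %mul = __mul__(%ceil, 32.0) and gets NotImplemented back, presumably because the ceil() result is a symbolic value that cannot be multiplied by a plain float inside that interpreter. The arithmetic being traced is torchvision's pad-to-multiple-of-32 sizing in batch_images; a stripped-down sketch of that pattern:

    import math

    # Sketch of GeneralizedRCNNTransform.batch_images' size rounding: with
    # symbolic image sizes, this ceil()/__mul__ chain is what shows up as the
    # %ceil and %mul nodes in the captured graph.
    def round_up(size, size_divisible=32):
        stride = float(size_divisible)
        return int(math.ceil(float(size) / stride) * stride)

    assert round_up(801) == 832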
Running torchbench.py yolov3...
[2022-11-12 17:17:44,829] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
ERROR:common:
from user code:
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/yolov3/yolo_models.py", line 292, in <graph break in forward_once>
x = module(x)
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1081, in run_node
return nnmodule(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/upsampling.py", line 156, in forward
return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners,
File "/scratch/ezyang/work/pytorch/torch/nn/functional.py", line 3928, in interpolate
return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
RuntimeError: Cannot call sizes() on tensor with symbolic sizes/strides
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1042, in get_fake_value
return wrap_fake_exception(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 721, in wrap_fake_exception
return fn()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1043, in <lambda>
lambda: run_node(tx.output, node, args, kwargs, nnmodule)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1085, in run_node
raise RuntimeError(
RuntimeError: Failed running call_module sub___tuple_iterator_getitem____stack0__2__1(*(FakeTensor(FakeTensor(..., device='meta', size=(s0, 256, s2, s3), grad_fn=<LeakyReluBackward1>), cuda:0),), **{}):
Cannot call sizes() on tensor with symbolic sizes/strides
(scroll up for backtrace)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
new_result = optimized_model_iter_fn(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
self.model_iter_fn(mod, inputs, collect_outputs=False)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass
cloned_inputs = clone_inputs(inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass>
pred = mod(*cloned_inputs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/yolov3/yolo_models.py", line 238, in forward
return self.forward_once(x)
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/yolov3/yolo_models.py", line 290, in forward_once
yolo_out.append(module(x, out))
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/yolov3/yolo_models.py", line 290, in <graph break in forward_once>
yolo_out.append(module(x, out))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 247, in catch_errors
return callback(frame, cache_size)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 474, in _convert_frame
result = inner_convert(frame, cache_size)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 118, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 89, in time_wrapper
r = func(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 347, in _convert_frame_assert
return _compile(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 402, in _compile
out_code = transform_code_object(code, transform)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object
transformations(instructions, code_options)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 390, in transform
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1493, in run
super().run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
getattr(self, inst.opname)(inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper
return inner_fn(self, inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 202, in call_function
return wrap_fx_proxy(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 631, in wrap_fx_proxy
return wrap_fx_proxy_cls(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 671, in wrap_fx_proxy_cls
example_value = get_fake_value(proxy.node, tx)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1055, in get_fake_value
raise TorchRuntimeError() from e
torch._dynamo.exc.TorchRuntimeError:
from user code:
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/yolov3/yolo_models.py", line 292, in <graph break in forward_once>
x = module(x)
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
TorchDynamo optimized model failed to run because of following error
cuda train yolov3 FAIL
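Note on the yolov3 failure above: it is the same upsample_nearest2d / sizes() limitation as timm_efficientdet earlier, reached through an nn.Upsample module in the route/upsample blocks. A sketch of the module pattern, assuming the usual scale_factor=2 nearest upsample used by this implementation (256 channels match the failing node; the 13x13 spatial size is arbitrary):

    import torch

    # Assumed yolov3-style upsample block: nearest-neighbor, fixed scale factor.
    up = torch.nn.Upsample(scale_factor=2, mode="nearest")
    y = up(torch.randn(1, 256, 13, 13))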
[2022-11-12 17:18:00,878] torch._dynamo.testing: [WARNING] High loss value alert - 10.33. Can result in unstable gradients.
Running huggingface.py AlbertForMaskedLM...
[2022-11-12 17:18:01,927] torch._dynamo.testing: [WARNING] High loss value alert - 10.33. Can result in unstable gradients.
[2022-11-12 17:18:02,176] torch._dynamo.testing: [WARNING] High loss value alert - 10.33. Can result in unstable gradients.
[2022-11-12 17:18:02,423] torch._dynamo.testing: [WARNING] High loss value alert - 10.33. Can result in unstable gradients.
[2022-11-12 17:18:28,749] torch._dynamo.testing: [WARNING] High loss value alert - 10.33. Can result in unstable gradients.
cuda train AlbertForMaskedLM PASS
[2022-11-12 17:18:39,843] torch._dynamo.testing: [WARNING] High loss value alert - 6.24. Can result in unstable gradients.
Running huggingface.py AlbertForQuestionAnswering...
[2022-11-12 17:18:40,892] torch._dynamo.testing: [WARNING] High loss value alert - 6.24. Can result in unstable gradients.
[2022-11-12 17:18:41,140] torch._dynamo.testing: [WARNING] High loss value alert - 6.24. Can result in unstable gradients.
[2022-11-12 17:18:41,386] torch._dynamo.testing: [WARNING] High loss value alert - 6.24. Can result in unstable gradients.
[2022-11-12 17:19:08,813] torch._dynamo.testing: [WARNING] High loss value alert - 6.24. Can result in unstable gradients.
cuda train AlbertForQuestionAnswering PASS
[2022-11-12 17:19:19,168] torch._dynamo.testing: [WARNING] High loss value alert - 10.97. Can result in unstable gradients.
Running huggingface.py AllenaiLongformerBase...
[2022-11-12 17:19:19,506] torch._dynamo.testing: [WARNING] High loss value alert - 10.97. Can result in unstable gradients.
[2022-11-12 17:19:19,692] torch._dynamo.testing: [WARNING] High loss value alert - 10.97. Can result in unstable gradients.
[2022-11-12 17:19:19,866] torch._dynamo.testing: [WARNING] High loss value alert - 10.97. Can result in unstable gradients.
[2022-11-12 17:19:25,056] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 17:19:26,094] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 17:20:32,712] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (64)
function: '_chunk' (/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/longformer/modeling_longformer.py:770)
reasons: ['(Eq(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0], 512*hidden_states.size()[0]*hidden_states.stride()[0]*hidden_states.size()[1]//512) and\nEq(Mod(hidden_states.size()[1], hidden_states.size()[1]//512), 0) and\nhidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 >= 0 and\nNe(hidden_states.size()[1]//512, 0) and\n4*hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0] <= 4*hidden_states.size()[1]*768 and\nNe(hidden_states.size()[1]/hidden_states.size()[1]//512, 1) and\nNe(hidden_states.size()[1]//512, 1) and\nNe(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512, 1) and\nhidden_states.size()[1]//512 >= 2 and\nhidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 >= hidden_states.stride()[0] and\nhidden_states.size()[1]/hidden_states.size()[1]//512 >= 2 and\nhidden_states.size()[0]*hidden_states.stride()[0] < hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 and\nhidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 >= 0 and\nhidden_states.size()[1]//512 > 1 and\nEq(hidden_states.size()[1]/hidden_states.size()[1]//512, 512) and\n(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 >= 0 and\nNe(4*hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 + 8*hidden_states.size()[1]//512*(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 - 8*(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2, 0) and\n4*hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 + 8*hidden_states.size()[1]//512*(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 - 8*(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 <= 4*hidden_states.size()[1]*768 and\nNe(2*hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0] - hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512, 0) and\nNe(2*hidden_states.size()[1]//512 - 1, 1) and\nNe((hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2, 1) and\n2*hidden_states.size()[1]//512 - 1 >= 2 and\n(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 >= hidden_states.stride()[0] and\nhidden_states.size()[0]*hidden_states.stride()[0] < (hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 and\nNe((hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2, hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512) and\nNe((hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2, 0) and\n(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 >= 0 and\n2*hidden_states.size()[1]//512 - 1 > 1 and\n1 < 
2*hidden_states.size()[1]//512*(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 - (hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 and \nhidden_states.stride()[0] == hidden_states.size()[2])']
to diagnose recompilation issues, see https://github.com/pytorch/torchdynamo/blob/main/TROUBLESHOOTING.md.
[2022-11-12 17:20:43,723] torch._dynamo.testing: [WARNING] High loss value alert - 10.97. Can result in unstable gradients.
cuda train AllenaiLongformerBase PASS
[2022-11-12 17:20:57,237] torch._dynamo.testing: [WARNING] High loss value alert - 11.05. Can result in unstable gradients.
Running huggingface.py BartForCausalLM...
[2022-11-12 17:20:57,522] torch._dynamo.testing: [WARNING] High loss value alert - 11.05. Can result in unstable gradients.
[2022-11-12 17:20:57,638] torch._dynamo.testing: [WARNING] High loss value alert - 11.05. Can result in unstable gradients.
[2022-11-12 17:20:57,736] torch._dynamo.testing: [WARNING] High loss value alert - 11.05. Can result in unstable gradients.
[2022-11-12 17:21:23,498] torch._dynamo.testing: [WARNING] High loss value alert - 11.05. Can result in unstable gradients.
cuda train BartForCausalLM PASS
[2022-11-12 17:21:42,870] torch._dynamo.testing: [WARNING] High loss value alert - 11.01. Can result in unstable gradients.
Running huggingface.py BartForConditionalGeneration...
[2022-11-12 17:21:43,428] torch._dynamo.testing: [WARNING] High loss value alert - 11.01. Can result in unstable gradients.
[2022-11-12 17:21:43,657] torch._dynamo.testing: [WARNING] High loss value alert - 11.01. Can result in unstable gradients.
[2022-11-12 17:21:43,864] torch._dynamo.testing: [WARNING] High loss value alert - 11.01. Can result in unstable gradients.
[2022-11-12 17:22:45,976] torch._dynamo.testing: [WARNING] High loss value alert - 11.01. Can result in unstable gradients.
cuda train BartForConditionalGeneration PASS
[2022-11-12 17:22:56,373] torch._dynamo.testing: [WARNING] High loss value alert - 10.40. Can result in unstable gradients.
Running huggingface.py BertForMaskedLM...
[2022-11-12 17:22:56,456] torch._dynamo.testing: [WARNING] High loss value alert - 10.40. Can result in unstable gradients.
[2022-11-12 17:22:56,541] torch._dynamo.testing: [WARNING] High loss value alert - 10.40. Can result in unstable gradients.
[2022-11-12 17:22:56,618] torch._dynamo.testing: [WARNING] High loss value alert - 10.40. Can result in unstable gradients.
[2022-11-12 17:23:21,978] torch._dynamo.testing: [WARNING] High loss value alert - 10.40. Can result in unstable gradients.
cuda train BertForMaskedLM PASS
[2022-11-12 17:23:30,808] torch._dynamo.testing: [WARNING] High loss value alert - 4.95. Can result in unstable gradients.
Running huggingface.py BertForQuestionAnswering...
[2022-11-12 17:23:30,886] torch._dynamo.testing: [WARNING] High loss value alert - 4.95. Can result in unstable gradients.
[2022-11-12 17:23:30,967] torch._dynamo.testing: [WARNING] High loss value alert - 4.95. Can result in unstable gradients.
[2022-11-12 17:23:31,034] torch._dynamo.testing: [WARNING] High loss value alert - 4.95. Can result in unstable gradients.
[2022-11-12 17:23:54,967] torch._dynamo.testing: [WARNING] High loss value alert - 4.95. Can result in unstable gradients.
cuda train BertForQuestionAnswering PASS
[2022-11-12 17:24:04,461] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients.
Running huggingface.py BigBird...
[2022-11-12 17:24:04,897] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients.
[2022-11-12 17:24:05,226] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients.
[2022-11-12 17:24:05,543] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients.
ERROR:common:output 0: torch.Size([1, 1, 12, 64, 192]) != torch.Size([1, 12, 64, 192])
While executing %mul : [#users=2] = call_function[target=torch.ops.aten.mul.Tensor](args = (%permute, %permute_1), kwargs = {})
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
new_result = optimized_model_iter_fn(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
self.model_iter_fn(mod, inputs, collect_outputs=False)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/huggingface.py", line 426, in forward_and_backward_pass
cloned_inputs = clone_inputs(inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/huggingface.py", line 429, in <graph break in forward_and_backward_pass>
pred = mod(**cloned_inputs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 2462, in forward
outputs = self.bert(
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 2104, in forward
blocked_encoder_mask, band_mask, from_mask, to_mask = self.create_masks_for_block_sparse_attn(
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 2185, in create_masks_for_block_sparse_attn
@staticmethod
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 972, in forward
return compiled_f(
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 963, in new_func
return compiled_fn(args)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 390, in new_fn
fw_outs = call_func_with_args(compiled_fw, args, disable_amp=disable_amp)
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 346, in call_func_with_args
out = normalize_as_list(f(args))
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 321, in g
return f(*args)
File "/scratch/ezyang/work/pytorch/torch/fx/interpreter.py", line 130, in run
self.env[node] = self.run_node(node)
File "/scratch/ezyang/work/pytorch/functorch/_src/compilers.py", line 162, in run_node
assert nv.size() == rv.size(), f"output {i}: {nv.size()} != {rv.size()}"
AssertionError: output 0: torch.Size([1, 1, 12, 64, 192]) != torch.Size([1, 12, 64, 192])
While executing %mul : [#users=2] = call_function[target=torch.ops.aten.mul.Tensor](args = (%permute, %permute_1), kwargs = {})
TorchDynamo optimized model failed to run because of following error
cuda train BigBird FAIL
[2022-11-12 17:24:13,961] torch._dynamo.testing: [WARNING] High loss value alert - 10.97. Can result in unstable gradients.
Running huggingface.py BlenderbotSmallForCausalLM...
[2022-11-12 17:24:14,009] torch._dynamo.testing: [WARNING] High loss value alert - 10.97. Can result in unstable gradients.
[2022-11-12 17:24:14,068] torch._dynamo.testing: [WARNING] High loss value alert - 10.97. Can result in unstable gradients.
[2022-11-12 17:24:14,125] torch._dynamo.testing: [WARNING] High loss value alert - 10.97. Can result in unstable gradients.
[2022-11-12 17:24:32,378] torch._dynamo.testing: [WARNING] High loss value alert - 10.97. Can result in unstable gradients.
cuda train BlenderbotSmallForCausalLM PASS
[2022-11-12 17:24:41,956] torch._dynamo.testing: [WARNING] High loss value alert - 10.94. Can result in unstable gradients.
Running huggingface.py BlenderbotSmallForConditionalGeneration...
[2022-11-12 17:24:42,054] torch._dynamo.testing: [WARNING] High loss value alert - 10.94. Can result in unstable gradients.
[2022-11-12 17:24:42,188] torch._dynamo.testing: [WARNING] High loss value alert - 10.94. Can result in unstable gradients.
[2022-11-12 17:24:42,307] torch._dynamo.testing: [WARNING] High loss value alert - 10.94. Can result in unstable gradients.
[2022-11-12 17:25:27,138] torch._dynamo.testing: [WARNING] High loss value alert - 10.94. Can result in unstable gradients.
cuda train BlenderbotSmallForConditionalGeneration PASS
[2022-11-12 17:25:36,934] torch._dynamo.testing: [WARNING] High loss value alert - 10.49. Can result in unstable gradients.
Running huggingface.py CamemBert...
[2022-11-12 17:25:37,042] torch._dynamo.testing: [WARNING] High loss value alert - 10.49. Can result in unstable gradients.
[2022-11-12 17:25:37,114] torch._dynamo.testing: [WARNING] High loss value alert - 10.49. Can result in unstable gradients.
[2022-11-12 17:25:37,175] torch._dynamo.testing: [WARNING] High loss value alert - 10.49. Can result in unstable gradients.
[2022-11-12 17:26:01,889] torch._dynamo.testing: [WARNING] High loss value alert - 10.49. Can result in unstable gradients.
cuda train CamemBert PASS
[2022-11-12 17:26:11,634] torch._dynamo.testing: [WARNING] High loss value alert - 10.99. Can result in unstable gradients.
Running huggingface.py DebertaForMaskedLM...
[2022-11-12 17:26:11,759] torch._dynamo.testing: [WARNING] High loss value alert - 10.99. Can result in unstable gradients.
[2022-11-12 17:26:11,847] torch._dynamo.testing: [WARNING] High loss value alert - 10.99. Can result in unstable gradients.
[2022-11-12 17:26:11,925] torch._dynamo.testing: [WARNING] High loss value alert - 10.99. Can result in unstable gradients.
[2022-11-12 17:26:55,125] torch._dynamo.testing: [WARNING] High loss value alert - 10.99. Can result in unstable gradients.
cuda train DebertaForMaskedLM PASS
[2022-11-12 17:27:04,722] torch._dynamo.testing: [WARNING] High loss value alert - 6.08. Can result in unstable gradients.
Running huggingface.py DebertaForQuestionAnswering...
[2022-11-12 17:27:04,828] torch._dynamo.testing: [WARNING] High loss value alert - 6.08. Can result in unstable gradients.
[2022-11-12 17:27:04,922] torch._dynamo.testing: [WARNING] High loss value alert - 6.08. Can result in unstable gradients.
[2022-11-12 17:27:05,009] torch._dynamo.testing: [WARNING] High loss value alert - 6.08. Can result in unstable gradients.
[2022-11-12 17:27:48,482] torch._dynamo.testing: [WARNING] High loss value alert - 6.08. Can result in unstable gradients.
cuda train DebertaForQuestionAnswering PASS
WARNING:__main__:Sequence Length not defined for DistilBertForMaskedLM. Choosing 128 arbitrarily
[2022-11-12 17:27:57,096] torch._dynamo.testing: [WARNING] High loss value alert - 10.39. Can result in unstable gradients.
Running huggingface.py DistilBertForMaskedLM...
[2022-11-12 17:27:57,137] torch._dynamo.testing: [WARNING] High loss value alert - 10.39. Can result in unstable gradients.
[2022-11-12 17:27:57,190] torch._dynamo.testing: [WARNING] High loss value alert - 10.39. Can result in unstable gradients.
[2022-11-12 17:27:57,224] torch._dynamo.testing: [WARNING] High loss value alert - 10.39. Can result in unstable gradients.
[2022-11-12 17:28:09,916] torch._dynamo.testing: [WARNING] High loss value alert - 10.39. Can result in unstable gradients.
cuda train DistilBertForMaskedLM PASS
WARNING:__main__:Sequence Length not defined for DistilBertForQuestionAnswering. Choosing 128 arbitrarily
[2022-11-12 17:28:17,452] torch._dynamo.testing: [WARNING] High loss value alert - 5.02. Can result in unstable gradients.
Running huggingface.py DistilBertForQuestionAnswering...
[2022-11-12 17:28:17,496] torch._dynamo.testing: [WARNING] High loss value alert - 5.02. Can result in unstable gradients.
[2022-11-12 17:28:17,539] torch._dynamo.testing: [WARNING] High loss value alert - 5.02. Can result in unstable gradients.
[2022-11-12 17:28:17,574] torch._dynamo.testing: [WARNING] High loss value alert - 5.02. Can result in unstable gradients.
[2022-11-12 17:28:30,345] torch._dynamo.testing: [WARNING] High loss value alert - 5.02. Can result in unstable gradients.
cuda train DistilBertForQuestionAnswering PASS
[2022-11-12 17:28:39,659] torch._dynamo.testing: [WARNING] High loss value alert - 11.02. Can result in unstable gradients.
Running huggingface.py DistillGPT2...
[2022-11-12 17:28:39,735] torch._dynamo.testing: [WARNING] High loss value alert - 11.02. Can result in unstable gradients.
[2022-11-12 17:28:39,777] torch._dynamo.testing: [WARNING] High loss value alert - 11.02. Can result in unstable gradients.
[2022-11-12 17:28:39,809] torch._dynamo.testing: [WARNING] High loss value alert - 11.02. Can result in unstable gradients.
[2022-11-12 17:28:58,044] torch._dynamo.testing: [WARNING] High loss value alert - 11.02. Can result in unstable gradients.
cuda train DistillGPT2 PASS
If you want to use `ElectraForCausalLM` as a standalone, add `is_decoder=True.`
[2022-11-12 17:29:04,959] torch._dynamo.testing: [WARNING] High loss value alert - 10.35. Can result in unstable gradients.
Running huggingface.py ElectraForCausalLM...
[2022-11-12 17:29:05,028] torch._dynamo.testing: [WARNING] High loss value alert - 10.35. Can result in unstable gradients.
[2022-11-12 17:29:05,110] torch._dynamo.testing: [WARNING] High loss value alert - 10.35. Can result in unstable gradients.
[2022-11-12 17:29:05,177] torch._dynamo.testing: [WARNING] High loss value alert - 10.35. Can result in unstable gradients.
[2022-11-12 17:29:29,425] torch._dynamo.testing: [WARNING] High loss value alert - 10.35. Can result in unstable gradients.
cuda train ElectraForCausalLM PASS
[2022-11-12 17:29:36,485] torch._dynamo.testing: [WARNING] High loss value alert - 6.72. Can result in unstable gradients.
Running huggingface.py ElectraForQuestionAnswering...
[2022-11-12 17:29:36,561] torch._dynamo.testing: [WARNING] High loss value alert - 6.72. Can result in unstable gradients.
[2022-11-12 17:29:36,650] torch._dynamo.testing: [WARNING] High loss value alert - 6.72. Can result in unstable gradients.
[2022-11-12 17:29:36,722] torch._dynamo.testing: [WARNING] High loss value alert - 6.72. Can result in unstable gradients.
[2022-11-12 17:30:01,899] torch._dynamo.testing: [WARNING] High loss value alert - 6.72. Can result in unstable gradients.
cuda train ElectraForQuestionAnswering PASS
Running huggingface.py GPT2ForSequenceClassification...
cuda train GPT2ForSequenceClassification PASS
[2022-11-12 17:30:58,390] torch._dynamo.testing: [WARNING] High loss value alert - 10.56. Can result in unstable gradients.
Running huggingface.py GoogleFnet...
[2022-11-12 17:30:58,481] torch._dynamo.testing: [WARNING] High loss value alert - 10.56. Can result in unstable gradients.
[2022-11-12 17:30:58,530] torch._dynamo.testing: [WARNING] High loss value alert - 10.56. Can result in unstable gradients.
[2022-11-12 17:30:58,583] torch._dynamo.testing: [WARNING] High loss value alert - 10.56. Can result in unstable gradients.
ERROR:common:
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 402, in _compile
out_code = transform_code_object(code, transform)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object
transformations(instructions, code_options)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 379, in transform
tracer = InstructionTranslator(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1445, in __init__
self.symbolic_locals = collections.OrderedDict(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1446, in <genexpr>
(k, VariableBuilder(self, LocalSource(k))(f_locals[k]))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 141, in __call__
return self._wrap(value).clone(**self.options())
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 215, in _wrap
output = [
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 216, in <listcomp>
VariableBuilder(self.tx, GetItemSource(self.get_source(), i))(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 141, in __call__
return self._wrap(value).clone(**self.options())
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 205, in _wrap
return self.wrap_tensor(value)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 551, in wrap_tensor
tensor_variable = wrap_fx_proxy(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 631, in wrap_fx_proxy
return wrap_fx_proxy_cls(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 679, in wrap_fx_proxy_cls
example_value = fake_wrapper(example_value)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 745, in wrap_to_fake_tensor_and_record
return wrap_fake_exception(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 721, in wrap_fake_exception
return fn()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 746, in <lambda>
lambda: make_fake_tensor(e, tx.fake_mode, static_shapes, tx)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 685, in make_fake_tensor
fake_tensor = fake_mode.from_tensor(e, static_shapes=static_shapes)
File "/scratch/ezyang/work/pytorch/torch/_subclasses/fake_tensor.py", line 1015, in from_tensor
return self.fake_tensor_converter(self, tensor, shape_env=self.shape_env)
File "/scratch/ezyang/work/pytorch/torch/_subclasses/fake_tensor.py", line 266, in __call__
return self.from_real_tensor(fake_mode, t, make_constant, shape_env=shape_env)
File "/scratch/ezyang/work/pytorch/torch/_subclasses/fake_tensor.py", line 238, in from_real_tensor
out = self.meta_converter(t, shape_env=shape_env, callback=mk_fake_tensor)
File "/scratch/ezyang/work/pytorch/torch/_subclasses/meta_utils.py", line 458, in __call__
r = self.meta_tensor(t, shape_env=shape_env, callback=callback)
File "/scratch/ezyang/work/pytorch/torch/_subclasses/meta_utils.py", line 279, in meta_tensor
base = torch.view_as_real(base)
File "/scratch/ezyang/work/pytorch/torch/_subclasses/fake_tensor.py", line 600, in __torch_dispatch__
return func(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_ops.py", line 284, in __call__
return self._op(*args, **kwargs or {})
File "/scratch/ezyang/work/pytorch/torch/_subclasses/fake_tensor.py", line 874, in __torch_dispatch__
raise RuntimeError(
RuntimeError: aten.view_as_real.default - couldn't find symbolic meta function/decomposition
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
new_result = optimized_model_iter_fn(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
self.model_iter_fn(mod, inputs, collect_outputs=False)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/huggingface.py", line 426, in forward_and_backward_pass
cloned_inputs = clone_inputs(inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/huggingface.py", line 429, in <graph break in forward_and_backward_pass>
pred = mod(**cloned_inputs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/fnet/modeling_fnet.py", line 763, in forward
outputs = self.fnet(
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/fnet/modeling_fnet.py", line 604, in forward
encoder_outputs = self.encoder(
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/fnet/modeling_fnet.py", line 308, in forward
layer_outputs = layer_module(hidden_states)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/fnet/modeling_fnet.py", line 267, in forward
self_fourier_outputs = self.fourier(hidden_states)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/fnet/modeling_fnet.py", line 220, in forward
self_outputs = self.self(hidden_states)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 247, in catch_errors
return callback(frame, cache_size)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 474, in _convert_frame
result = inner_convert(frame, cache_size)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 118, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 89, in time_wrapper
r = func(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 347, in _convert_frame_assert
return _compile(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 464, in _compile
raise InternalTorchDynamoError() from e
torch._dynamo.exc.InternalTorchDynamoError
TorchDynamo optimized model failed to run because of following error
cuda train GoogleFnet FAIL
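Note (not from the log): the GoogleFnet failure above bottoms out in torch.view_as_real, which the symbolic-shapes fake-tensor conversion could not handle at this commit (aten.view_as_real.default has no symbolic meta). A minimal eager sketch of the kind of complex-to-real view FNet's Fourier mixing relies on, with a hidden size of 768 and sequence length of 512 chosen purely for illustration:

import torch

# FNet mixes tokens with an FFT, producing a complex tensor that downstream
# code views as real. Eager mode handles this fine; the failure above is only
# in the fake-tensor / meta conversion under dynamic shapes.
hidden = torch.randn(2, 512, 768)
mixed = torch.fft.fft(torch.fft.fft(hidden, dim=-1), dim=-2)  # complex64
as_real = torch.view_as_real(mixed)
print(as_real.shape)  # torch.Size([2, 512, 768, 2])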
[2022-11-12 17:31:07,860] torch._dynamo.testing: [WARNING] High loss value alert - 10.46. Can result in unstable gradients.
Running huggingface.py LayoutLMForMaskedLM...
[2022-11-12 17:31:07,974] torch._dynamo.testing: [WARNING] High loss value alert - 10.46. Can result in unstable gradients.
[2022-11-12 17:31:08,064] torch._dynamo.testing: [WARNING] High loss value alert - 10.46. Can result in unstable gradients.
[2022-11-12 17:31:08,143] torch._dynamo.testing: [WARNING] High loss value alert - 10.46. Can result in unstable gradients.
[2022-11-12 17:31:33,849] torch._dynamo.testing: [WARNING] High loss value alert - 10.46. Can result in unstable gradients.
cuda train LayoutLMForMaskedLM PASS
Running huggingface.py LayoutLMForSequenceClassification...
cuda train LayoutLMForSequenceClassification PASS
WARNING:__main__:Sequence Length not defined for M2M100ForConditionalGeneration. Choosing 128 arbitrarily
[2022-11-12 17:32:30,023] torch._dynamo.testing: [WARNING] High loss value alert - 12.05. Can result in unstable gradients.
Running huggingface.py M2M100ForConditionalGeneration...
[2022-11-12 17:32:30,221] torch._dynamo.testing: [WARNING] High loss value alert - 12.05. Can result in unstable gradients.
[2022-11-12 17:32:30,418] torch._dynamo.testing: [WARNING] High loss value alert - 12.05. Can result in unstable gradients.
[2022-11-12 17:32:30,592] torch._dynamo.testing: [WARNING] High loss value alert - 12.05. Can result in unstable gradients.
[2022-11-12 17:33:33,382] torch._dynamo.testing: [WARNING] High loss value alert - 12.05. Can result in unstable gradients.
cuda train M2M100ForConditionalGeneration PASS
WARNING:__main__:Sequence Length not defined for MBartForCausalLM. Choosing 128 arbitrarily
[2022-11-12 17:33:46,613] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients.
Running huggingface.py MBartForCausalLM...
[2022-11-12 17:33:46,703] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients.
[2022-11-12 17:33:46,803] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients.
[2022-11-12 17:33:46,889] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients.
[2022-11-12 17:34:11,524] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients.
cuda train MBartForCausalLM PASS
WARNING:__main__:Sequence Length not defined for MBartForConditionalGeneration. Choosing 128 arbitrarily
[2022-11-12 17:34:29,763] torch._dynamo.testing: [WARNING] High loss value alert - 11.03. Can result in unstable gradients.
Running huggingface.py MBartForConditionalGeneration...
[2022-11-12 17:34:29,956] torch._dynamo.testing: [WARNING] High loss value alert - 11.03. Can result in unstable gradients.
[2022-11-12 17:34:30,171] torch._dynamo.testing: [WARNING] High loss value alert - 11.03. Can result in unstable gradients.
[2022-11-12 17:34:30,358] torch._dynamo.testing: [WARNING] High loss value alert - 11.03. Can result in unstable gradients.
[2022-11-12 17:35:32,182] torch._dynamo.testing: [WARNING] High loss value alert - 11.03. Can result in unstable gradients.
cuda train MBartForConditionalGeneration PASS
WARNING:__main__:Sequence Length not defined for MT5ForConditionalGeneration. Choosing 128 arbitrarily
[2022-11-12 17:35:44,230] torch._dynamo.testing: [WARNING] High loss value alert - 103.31. Can result in unstable gradients.
Running huggingface.py MT5ForConditionalGeneration...
WARNING:common:fp64 golden ref were not generated for MT5ForConditionalGeneration
[2022-11-12 17:35:44,450] torch._dynamo.testing: [WARNING] High loss value alert - 103.31. Can result in unstable gradients.
[2022-11-12 17:35:44,639] torch._dynamo.testing: [WARNING] High loss value alert - 103.31. Can result in unstable gradients.
[2022-11-12 17:36:33,702] torch._dynamo.testing: [WARNING] High loss value alert - 103.31. Can result in unstable gradients.
cuda train MT5ForConditionalGeneration PASS
If you want to use `MegatronBertForCausalLM` as a standalone, add `is_decoder=True.`
WARNING:__main__:Sequence Length not defined for MegatronBertForCausalLM. Choosing 128 arbitrarily
[2022-11-12 17:36:47,781] torch._dynamo.testing: [WARNING] High loss value alert - 10.54. Can result in unstable gradients.
Running huggingface.py MegatronBertForCausalLM...
[2022-11-12 17:36:47,939] torch._dynamo.testing: [WARNING] High loss value alert - 10.54. Can result in unstable gradients.
[2022-11-12 17:36:48,108] torch._dynamo.testing: [WARNING] High loss value alert - 10.54. Can result in unstable gradients.
[2022-11-12 17:36:48,251] torch._dynamo.testing: [WARNING] High loss value alert - 10.54. Can result in unstable gradients.
[2022-11-12 17:37:41,030] torch._dynamo.testing: [WARNING] High loss value alert - 10.54. Can result in unstable gradients.
cuda train MegatronBertForCausalLM PASS
WARNING:__main__:Sequence Length not defined for MegatronBertForQuestionAnswering. Choosing 128 arbitrarily
[2022-11-12 17:37:54,825] torch._dynamo.testing: [WARNING] High loss value alert - 4.80. Can result in unstable gradients.
Running huggingface.py MegatronBertForQuestionAnswering...
[2022-11-12 17:37:54,990] torch._dynamo.testing: [WARNING] High loss value alert - 4.80. Can result in unstable gradients.
[2022-11-12 17:37:55,179] torch._dynamo.testing: [WARNING] High loss value alert - 4.80. Can result in unstable gradients.
[2022-11-12 17:37:55,341] torch._dynamo.testing: [WARNING] High loss value alert - 4.80. Can result in unstable gradients.
[2022-11-12 17:38:48,532] torch._dynamo.testing: [WARNING] High loss value alert - 4.80. Can result in unstable gradients.
cuda train MegatronBertForQuestionAnswering PASS
WARNING:__main__:Sequence Length not defined for MobileBertForMaskedLM. Choosing 128 arbitrarily
[2022-11-12 17:38:56,956] torch._dynamo.testing: [WARNING] High loss value alert - 10.42. Can result in unstable gradients.
Running huggingface.py MobileBertForMaskedLM...
[2022-11-12 17:38:57,284] torch._dynamo.testing: [WARNING] High loss value alert - 10.42. Can result in unstable gradients.
[2022-11-12 17:38:57,804] torch._dynamo.testing: [WARNING] High loss value alert - 10.42. Can result in unstable gradients.
[2022-11-12 17:38:58,107] torch._dynamo.testing: [WARNING] High loss value alert - 10.42. Can result in unstable gradients.
[2022-11-12 17:40:24,812] torch._dynamo.testing: [WARNING] High loss value alert - 10.42. Can result in unstable gradients.
cuda train MobileBertForMaskedLM PASS
WARNING:__main__:Sequence Length not defined for MobileBertForQuestionAnswering. Choosing 128 arbitrarily
[2022-11-12 17:40:34,101] torch._dynamo.testing: [WARNING] High loss value alert - 4.84. Can result in unstable gradients.
Running huggingface.py MobileBertForQuestionAnswering...
[2022-11-12 17:40:34,424] torch._dynamo.testing: [WARNING] High loss value alert - 4.84. Can result in unstable gradients.
[2022-11-12 17:40:34,949] torch._dynamo.testing: [WARNING] High loss value alert - 4.84. Can result in unstable gradients.
[2022-11-12 17:40:35,249] torch._dynamo.testing: [WARNING] High loss value alert - 4.84. Can result in unstable gradients.
[2022-11-12 17:42:02,202] torch._dynamo.testing: [WARNING] High loss value alert - 4.84. Can result in unstable gradients.
cuda train MobileBertForQuestionAnswering PASS
WARNING:__main__:Sequence Length not defined for OPTForCausalLM. Choosing 128 arbitrarily
[2022-11-12 17:42:14,439] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients.
Running huggingface.py OPTForCausalLM...
[2022-11-12 17:42:14,514] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients.
[2022-11-12 17:42:14,592] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients.
[2022-11-12 17:42:14,659] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients.
[2022-11-12 17:42:38,881] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients.
cuda train OPTForCausalLM PASS
WARNING:__main__:Sequence Length not defined for PLBartForCausalLM. Choosing 128 arbitrarily
[2022-11-12 17:42:47,768] torch._dynamo.testing: [WARNING] High loss value alert - 11.40. Can result in unstable gradients.
Running huggingface.py PLBartForCausalLM...
[2022-11-12 17:42:47,834] torch._dynamo.testing: [WARNING] High loss value alert - 11.40. Can result in unstable gradients.
[2022-11-12 17:42:47,881] torch._dynamo.testing: [WARNING] High loss value alert - 11.40. Can result in unstable gradients.
[2022-11-12 17:42:47,919] torch._dynamo.testing: [WARNING] High loss value alert - 11.40. Can result in unstable gradients.
[2022-11-12 17:43:02,226] torch._dynamo.testing: [WARNING] High loss value alert - 11.40. Can result in unstable gradients.
cuda train PLBartForCausalLM PASS
WARNING:__main__:Sequence Length not defined for PLBartForConditionalGeneration. Choosing 128 arbitrarily
[2022-11-12 17:43:13,462] torch._dynamo.testing: [WARNING] High loss value alert - 11.29. Can result in unstable gradients.
Running huggingface.py PLBartForConditionalGeneration...
[2022-11-12 17:43:13,557] torch._dynamo.testing: [WARNING] High loss value alert - 11.29. Can result in unstable gradients.
[2022-11-12 17:43:13,656] torch._dynamo.testing: [WARNING] High loss value alert - 11.29. Can result in unstable gradients.
[2022-11-12 17:43:13,741] torch._dynamo.testing: [WARNING] High loss value alert - 11.29. Can result in unstable gradients.
[2022-11-12 17:43:49,141] torch._dynamo.testing: [WARNING] High loss value alert - 11.29. Can result in unstable gradients.
cuda train PLBartForConditionalGeneration PASS
WARNING:__main__:Sequence Length not defined for PegasusForCausalLM. Choosing 128 arbitrarily
[2022-11-12 17:44:03,142] torch._dynamo.testing: [WARNING] High loss value alert - 11.07. Can result in unstable gradients.
Running huggingface.py PegasusForCausalLM...
[2022-11-12 17:44:03,231] torch._dynamo.testing: [WARNING] High loss value alert - 11.07. Can result in unstable gradients.
[2022-11-12 17:44:03,331] torch._dynamo.testing: [WARNING] High loss value alert - 11.07. Can result in unstable gradients.
[2022-11-12 17:44:03,413] torch._dynamo.testing: [WARNING] High loss value alert - 11.07. Can result in unstable gradients.
[2022-11-12 17:44:27,926] torch._dynamo.testing: [WARNING] High loss value alert - 11.07. Can result in unstable gradients.
cuda train PegasusForCausalLM PASS
WARNING:__main__:Sequence Length not defined for PegasusForConditionalGeneration. Choosing 128 arbitrarily
[2022-11-12 17:44:49,548] torch._dynamo.testing: [WARNING] High loss value alert - 11.11. Can result in unstable gradients.
Running huggingface.py PegasusForConditionalGeneration...
[2022-11-12 17:44:49,742] torch._dynamo.testing: [WARNING] High loss value alert - 11.11. Can result in unstable gradients.
[2022-11-12 17:44:50,001] torch._dynamo.testing: [WARNING] High loss value alert - 11.11. Can result in unstable gradients.
[2022-11-12 17:44:50,187] torch._dynamo.testing: [WARNING] High loss value alert - 11.11. Can result in unstable gradients.
[2022-11-12 17:45:51,694] torch._dynamo.testing: [WARNING] High loss value alert - 11.11. Can result in unstable gradients.
cuda train PegasusForConditionalGeneration PASS
If you want to use `RobertaLMHeadModel` as a standalone, add `is_decoder=True.`
[2022-11-12 17:46:01,889] torch._dynamo.testing: [WARNING] High loss value alert - 10.48. Can result in unstable gradients.
Running huggingface.py RobertaForCausalLM...
[2022-11-12 17:46:01,973] torch._dynamo.testing: [WARNING] High loss value alert - 10.48. Can result in unstable gradients.
[2022-11-12 17:46:02,060] torch._dynamo.testing: [WARNING] High loss value alert - 10.48. Can result in unstable gradients.
[2022-11-12 17:46:02,134] torch._dynamo.testing: [WARNING] High loss value alert - 10.48. Can result in unstable gradients.
[2022-11-12 17:46:27,549] torch._dynamo.testing: [WARNING] High loss value alert - 10.48. Can result in unstable gradients.
cuda train RobertaForCausalLM PASS
[2022-11-12 17:46:36,256] torch._dynamo.testing: [WARNING] High loss value alert - 4.85. Can result in unstable gradients.
Running huggingface.py RobertaForQuestionAnswering...
[2022-11-12 17:46:36,356] torch._dynamo.testing: [WARNING] High loss value alert - 4.85. Can result in unstable gradients.
[2022-11-12 17:46:36,448] torch._dynamo.testing: [WARNING] High loss value alert - 4.85. Can result in unstable gradients.
[2022-11-12 17:46:36,524] torch._dynamo.testing: [WARNING] High loss value alert - 4.85. Can result in unstable gradients.
[2022-11-12 17:47:01,391] torch._dynamo.testing: [WARNING] High loss value alert - 4.85. Can result in unstable gradients.
cuda train RobertaForQuestionAnswering PASS
WARNING:__main__:Sequence Length not defined for Speech2Text2ForCausalLM. Choosing 128 arbitrarily
[2022-11-12 17:47:08,371] torch._dynamo.testing: [WARNING] High loss value alert - 9.20. Can result in unstable gradients.
Running huggingface.py Speech2Text2ForCausalLM...
[2022-11-12 17:47:08,413] torch._dynamo.testing: [WARNING] High loss value alert - 9.20. Can result in unstable gradients.
[2022-11-12 17:47:08,461] torch._dynamo.testing: [WARNING] High loss value alert - 9.20. Can result in unstable gradients.
[2022-11-12 17:47:08,499] torch._dynamo.testing: [WARNING] High loss value alert - 9.20. Can result in unstable gradients.
[2022-11-12 17:47:22,386] torch._dynamo.testing: [WARNING] High loss value alert - 9.20. Can result in unstable gradients.
cuda train Speech2Text2ForCausalLM PASS
[2022-11-12 17:47:29,958] torch._dynamo.testing: [WARNING] High loss value alert - 10.88. Can result in unstable gradients.
Running huggingface.py T5ForConditionalGeneration...
WARNING:common:fp64 golden ref were not generated for T5ForConditionalGeneration
[2022-11-12 17:47:30,153] torch._dynamo.testing: [WARNING] High loss value alert - 10.88. Can result in unstable gradients.
[2022-11-12 17:47:30,281] torch._dynamo.testing: [WARNING] High loss value alert - 10.88. Can result in unstable gradients.
Exception ignored in: <generator object preorder_traversal._preorder_traversal at 0x7fcae4846e40>
Traceback (most recent call last):
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/sympy/core/traversal.py", line 121, in _preorder_traversal
yield node
SystemError: error return without exception set
[2022-11-12 17:48:04,536] torch._dynamo.testing: [WARNING] High loss value alert - 10.88. Can result in unstable gradients.
cuda train T5ForConditionalGeneration PASS
[2022-11-12 17:48:12,909] torch._dynamo.testing: [WARNING] High loss value alert - 10.87. Can result in unstable gradients.
Running huggingface.py T5Small...
WARNING:common:fp64 golden ref were not generated for T5Small
[2022-11-12 17:48:13,082] torch._dynamo.testing: [WARNING] High loss value alert - 10.87. Can result in unstable gradients.
[2022-11-12 17:48:13,210] torch._dynamo.testing: [WARNING] High loss value alert - 10.87. Can result in unstable gradients.
Exception ignored in: <generator object preorder_traversal._preorder_traversal at 0x7f598c565eb0>
Traceback (most recent call last):
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/sympy/core/traversal.py", line 121, in _preorder_traversal
yield node
SystemError: error return without exception set
[2022-11-12 17:48:47,535] torch._dynamo.testing: [WARNING] High loss value alert - 10.87. Can result in unstable gradients.
cuda train T5Small PASS
WARNING:__main__:Sequence Length not defined for TrOCRForCausalLM. Choosing 128 arbitrarily
[2022-11-12 17:49:00,110] torch._dynamo.testing: [WARNING] High loss value alert - 11.11. Can result in unstable gradients.
Running huggingface.py TrOCRForCausalLM...
[2022-11-12 17:49:00,201] torch._dynamo.testing: [WARNING] High loss value alert - 11.11. Can result in unstable gradients.
[2022-11-12 17:49:00,302] torch._dynamo.testing: [WARNING] High loss value alert - 11.11. Can result in unstable gradients.
[2022-11-12 17:49:00,385] torch._dynamo.testing: [WARNING] High loss value alert - 11.11. Can result in unstable gradients.
[2022-11-12 17:49:26,266] torch._dynamo.testing: [WARNING] High loss value alert - 11.11. Can result in unstable gradients.
cuda train TrOCRForCausalLM PASS
WARNING:__main__:Sequence Length not defined for XGLMForCausalLM. Choosing 128 arbitrarily
[2022-11-12 17:49:46,824] torch._dynamo.testing: [WARNING] High loss value alert - 12.58. Can result in unstable gradients.
Running huggingface.py XGLMForCausalLM...
[2022-11-12 17:49:47,003] torch._dynamo.testing: [WARNING] High loss value alert - 12.58. Can result in unstable gradients.
[2022-11-12 17:49:47,136] torch._dynamo.testing: [WARNING] High loss value alert - 12.58. Can result in unstable gradients.
[2022-11-12 17:49:47,259] torch._dynamo.testing: [WARNING] High loss value alert - 12.58. Can result in unstable gradients.
[2022-11-12 17:50:36,738] torch._dynamo.testing: [WARNING] High loss value alert - 12.58. Can result in unstable gradients.
cuda train XGLMForCausalLM PASS
[2022-11-12 17:50:50,834] torch._dynamo.testing: [WARNING] High loss value alert - 10.50. Can result in unstable gradients.
Running huggingface.py XLNetLMHeadModel...
[2022-11-12 17:50:51,183] torch._dynamo.testing: [WARNING] High loss value alert - 10.50. Can result in unstable gradients.
[2022-11-12 17:50:51,424] torch._dynamo.testing: [WARNING] High loss value alert - 10.50. Can result in unstable gradients.
[2022-11-12 17:50:51,620] torch._dynamo.testing: [WARNING] High loss value alert - 10.50. Can result in unstable gradients.
[2022-11-12 17:52:26,203] torch._dynamo.testing: [WARNING] High loss value alert - 10.50. Can result in unstable gradients.
cuda train XLNetLMHeadModel PASS
[2022-11-12 17:52:37,382] torch._dynamo.testing: [WARNING] High loss value alert - 10.47. Can result in unstable gradients.
Running huggingface.py YituTechConvBert...
[2022-11-12 17:52:38,499] torch._dynamo.testing: [WARNING] High loss value alert - 10.47. Can result in unstable gradients.
[2022-11-12 17:52:38,629] torch._dynamo.testing: [WARNING] High loss value alert - 10.47. Can result in unstable gradients.
[2022-11-12 17:52:38,740] torch._dynamo.testing: [WARNING] High loss value alert - 10.47. Can result in unstable gradients.
[2022-11-12 17:53:19,114] torch._dynamo.testing: [WARNING] High loss value alert - 10.47. Can result in unstable gradients.
cuda train YituTechConvBert PASS
Running timm_models.py adv_inception_v3...
[2022-11-12 17:53:41,502] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 17:53:49,129] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train adv_inception_v3 PASS
Running timm_models.py beit_base_patch16_224...
cuda train beit_base_patch16_224 PASS
Running timm_models.py botnet26t_256...
cuda train botnet26t_256 PASS
Running timm_models.py cait_m36_384...
cuda train cait_m36_384 PASS
Running timm_models.py coat_lite_mini...
[2022-11-12 17:57:36,187] torch._dynamo.utils: [ERROR] RMSE (res-fp64): 1.02014, (ref-fp64): 0.00051 and shape=torch.Size([2, 1000])
cuda train coat_lite_mini FAIL
Running timm_models.py convit_base...
WARNING:common:fp64 golden ref were not generated for convit_base
ERROR:common:
from user code:
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/convit.py", line 138, in get_rel_indices
indy = ind.repeat_interleave(img_size, dim=0).repeat_interleave(img_size, dim=1)
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1078, in run_node
return getattr(args[0], node.target)(*args[1:], **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_subclasses/fake_tensor.py", line 883, in __torch_dispatch__
op_impl_out = op_impl(self, func, *args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_subclasses/fake_tensor.py", line 362, in dyn_shape
raise DynamicOutputShapeException(func)
torch._subclasses.fake_tensor.DynamicOutputShapeException: aten.repeat_interleave.Tensor
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1042, in get_fake_value
return wrap_fake_exception(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 721, in wrap_fake_exception
return fn()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1043, in <lambda>
lambda: run_node(tx.output, node, args, kwargs, nnmodule)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1085, in run_node
raise RuntimeError(
RuntimeError: Failed running call_method repeat_interleave(*(FakeTensor(FakeTensor(..., device='meta', size=(14, 14), dtype=torch.int64), cpu), 14), **{'dim': 0}):
aten.repeat_interleave.Tensor
(scroll up for backtrace)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
new_result = optimized_model_iter_fn(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
self.model_iter_fn(mod, inputs, collect_outputs=False)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/timm_models.py", line 305, in forward_and_backward_pass
cloned_inputs = clone_inputs(inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/timm_models.py", line 308, in <graph break in forward_and_backward_pass>
pred = mod(*cloned_inputs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/convit.py", line 333, in forward
x = self.forward_features(x)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/convit.py", line 323, in forward_features
x = blk(x)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/convit.py", line 214, in forward
x = x + self.drop_path(self.attn(self.norm1(x)))
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/convit.py", line 86, in forward
self.rel_indices = self.get_rel_indices(N)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 247, in catch_errors
return callback(frame, cache_size)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 474, in _convert_frame
result = inner_convert(frame, cache_size)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 118, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 89, in time_wrapper
r = func(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 347, in _convert_frame_assert
return _compile(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 402, in _compile
out_code = transform_code_object(code, transform)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object
transformations(instructions, code_options)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 390, in transform
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1493, in run
super().run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
getattr(self, inst.opname)(inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper
return inner_fn(self, inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 835, in CALL_FUNCTION_KW
self.call_function(fn, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/misc.py", line 572, in call_function
return self.obj.call_method(tx, self.name, args, kwargs).add_options(self)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/tensor.py", line 331, in call_method
return wrap_fx_proxy(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 631, in wrap_fx_proxy
return wrap_fx_proxy_cls(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 671, in wrap_fx_proxy_cls
example_value = get_fake_value(proxy.node, tx)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1055, in get_fake_value
raise TorchRuntimeError() from e
torch._dynamo.exc.TorchRuntimeError:
from user code:
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/convit.py", line 138, in get_rel_indices
indy = ind.repeat_interleave(img_size, dim=0).repeat_interleave(img_size, dim=1)
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
TorchDynamo optimized model failed to run because of following error
cuda train convit_base FAIL
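Note (not from the log): the convit_base failure is the repeat_interleave call in get_rel_indices; fake tensors flag aten.repeat_interleave.Tensor as having a data-dependent output shape and abort tracing. A minimal eager sketch with img_size = 14 (the value visible in the FakeTensor repr above), for illustration only:

import torch

img_size = 14
ind = torch.arange(img_size).view(1, -1) - torch.arange(img_size).view(-1, 1)
# Eager computes the result shape directly; under fake tensors this overload
# raises DynamicOutputShapeException, which surfaces as the error above.
indy = ind.repeat_interleave(img_size, dim=0).repeat_interleave(img_size, dim=1)
print(indy.shape)  # torch.Size([196, 196])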
Running timm_models.py convmixer_768_32...
[2022-11-12 17:58:00,433] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train convmixer_768_32 PASS
Running timm_models.py convnext_base...
cuda train convnext_base PASS
Running timm_models.py crossvit_9_240...
ERROR:common:
from user code:
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/crossvit.py", line 394, in forward_features
x_ = scale_image(x_, ss, self.crop_scale)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/crossvit.py", line 281, in scale_image
x = torch.nn.functional.interpolate(x, size=ss, mode='bicubic', align_corners=False)
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1076, in run_node
return node.target(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/nn/functional.py", line 3964, in interpolate
return torch._C._nn.upsample_bicubic2d(input, output_size, align_corners, scale_factors)
RuntimeError: Cannot call sizes() on tensor with symbolic sizes/strides
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1042, in get_fake_value
return wrap_fake_exception(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 721, in wrap_fake_exception
return fn()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1043, in <lambda>
lambda: run_node(tx.output, node, args, kwargs, nnmodule)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1085, in run_node
raise RuntimeError(
RuntimeError: Failed running call_function <function interpolate at 0x7f64d87e71f0>(*(FakeTensor(FakeTensor(..., device='meta', size=(s0, 3, 240, 240)), cuda:0),), **{'size': (224, 224), 'mode': 'bicubic', 'align_corners': False}):
Cannot call sizes() on tensor with symbolic sizes/strides
(scroll up for backtrace)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
new_result = optimized_model_iter_fn(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
self.model_iter_fn(mod, inputs, collect_outputs=False)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/timm_models.py", line 305, in forward_and_backward_pass
cloned_inputs = clone_inputs(inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/timm_models.py", line 308, in <graph break in forward_and_backward_pass>
pred = mod(*cloned_inputs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 247, in catch_errors
return callback(frame, cache_size)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 474, in _convert_frame
result = inner_convert(frame, cache_size)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 118, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 89, in time_wrapper
r = func(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 347, in _convert_frame_assert
return _compile(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 402, in _compile
out_code = transform_code_object(code, transform)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object
transformations(instructions, code_options)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 390, in transform
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1493, in run
super().run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
getattr(self, inst.opname)(inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper
return inner_fn(self, inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 224, in call_function
return super().call_function(tx, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 194, in call_function
return super(UserFunctionVariable, self).call_function(tx, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 65, in call_function
return tx.inline_user_function_return(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
getattr(self, inst.opname)(inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper
return inner_fn(self, inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 194, in call_function
return super(UserFunctionVariable, self).call_function(tx, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 65, in call_function
return tx.inline_user_function_return(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
getattr(self, inst.opname)(inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper
return inner_fn(self, inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 835, in CALL_FUNCTION_KW
self.call_function(fn, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/torch.py", line 403, in call_function
tensor_variable = wrap_fx_proxy(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 631, in wrap_fx_proxy
return wrap_fx_proxy_cls(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 671, in wrap_fx_proxy_cls
example_value = get_fake_value(proxy.node, tx)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1055, in get_fake_value
raise TorchRuntimeError() from e
torch._dynamo.exc.TorchRuntimeError:
from user code:
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/crossvit.py", line 394, in forward_features
x_ = scale_image(x_, ss, self.crop_scale)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/crossvit.py", line 281, in scale_image
x = torch.nn.functional.interpolate(x, size=ss, mode='bicubic', align_corners=False)
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
TorchDynamo optimized model failed to run because of following error
cuda train crossvit_9_240 FAIL
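Note (not from the log): crossvit_9_240 fails inside scale_image, where F.interpolate with mode='bicubic' reaches a kernel that calls sizes() on a tensor whose batch dimension is symbolic (s0 in the FakeTensor above). The same call is fine eagerly with a concrete batch; a minimal sketch, assuming batch size 2:

import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 240, 240)
# With static shapes this simply resizes; with a symbolic batch dimension the
# bicubic path raised "Cannot call sizes() on tensor with symbolic sizes/strides".
y = F.interpolate(x, size=(224, 224), mode='bicubic', align_corners=False)
print(y.shape)  # torch.Size([2, 3, 224, 224])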
Running timm_models.py cspdarknet53...
cuda train cspdarknet53 PASS
Running timm_models.py deit_base_distilled_patch16_224...
cuda train deit_base_distilled_patch16_224 PASS
[2022-11-12 18:00:54,914] torch._dynamo.testing: [WARNING] High loss value alert - 1.02. Can result in unstable gradients.
Running timm_models.py dla102...
[2022-11-12 18:00:55,158] torch._dynamo.testing: [WARNING] High loss value alert - 1.02. Can result in unstable gradients.
[2022-11-12 18:00:55,352] torch._dynamo.testing: [WARNING] High loss value alert - 1.02. Can result in unstable gradients.
[2022-11-12 18:00:55,506] torch._dynamo.testing: [WARNING] High loss value alert - 1.02. Can result in unstable gradients.
[2022-11-12 18:01:03,311] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 18:01:10,384] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 18:01:10,576] torch._dynamo.testing: [WARNING] High loss value alert - 1.02. Can result in unstable gradients.
cuda train dla102 PASS
Running timm_models.py dm_nfnet_f0...
cuda train dm_nfnet_f0 PASS
Running timm_models.py dpn107...
cuda train dpn107 PASS
Running timm_models.py eca_botnext26ts_256...
cuda train eca_botnext26ts_256 PASS
Running timm_models.py eca_halonext26ts...
cuda train eca_halonext26ts PASS
Running timm_models.py ese_vovnet19b_dw...
cuda train ese_vovnet19b_dw PASS
Running timm_models.py fbnetc_100...
cuda train fbnetc_100 PASS
Running timm_models.py fbnetv3_b...
cuda train fbnetv3_b PASS
Running timm_models.py gernet_l...
cuda train gernet_l PASS
Running timm_models.py ghostnet_100...
[2022-11-12 18:11:04,775] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train ghostnet_100 PASS
Running timm_models.py gluon_inception_v3...
[2022-11-12 18:11:22,833] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 18:11:30,339] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train gluon_inception_v3 PASS
[2022-11-12 18:11:39,278] torch._dynamo.testing: [WARNING] High loss value alert - 1.00. Can result in unstable gradients.
Running timm_models.py gluon_xception65...
[2022-11-12 18:11:39,522] torch._dynamo.testing: [WARNING] High loss value alert - 1.00. Can result in unstable gradients.
[2022-11-12 18:11:39,706] torch._dynamo.testing: [WARNING] High loss value alert - 1.00. Can result in unstable gradients.
[2022-11-12 18:11:39,831] torch._dynamo.testing: [WARNING] High loss value alert - 1.00. Can result in unstable gradients.
[2022-11-12 18:11:47,847] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 18:11:48,187] torch._dynamo.testing: [WARNING] High loss value alert - 1.00. Can result in unstable gradients.
Exception ignored in: <generator object preorder_traversal._preorder_traversal at 0x7f9192dfb9e0>
Traceback (most recent call last):
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/sympy/core/traversal.py", line 121, in _preorder_traversal
yield node
SystemError: error return without exception set
cuda train gluon_xception65 PASS
Running timm_models.py gmixer_24_224...
cuda train gmixer_24_224 PASS
Running timm_models.py gmlp_s16_224...
cuda train gmlp_s16_224 PASS
Running timm_models.py hrnet_w18...
ERROR:common:
from user code:
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/hrnet.py", line 713, in stages
yl = self.stage2(xl)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/hrnet.py", line 495, in forward
y = y + fuse_outer[j](x[j])
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1081, in run_node
return nnmodule(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/upsampling.py", line 156, in forward
return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners,
File "/scratch/ezyang/work/pytorch/torch/nn/functional.py", line 3928, in interpolate
return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
RuntimeError: Cannot call sizes() on tensor with symbolic sizes/strides
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1042, in get_fake_value
return wrap_fake_exception(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 721, in wrap_fake_exception
return fn()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1043, in <lambda>
lambda: run_node(tx.output, node, args, kwargs, nnmodule)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1085, in run_node
raise RuntimeError(
RuntimeError: Failed running call_module sub1_1_2(*(FakeTensor(FakeTensor(..., device='meta', size=(s0, 18, (s2 - 1)//8 + 1, (s2 - 1)//8 + 1),
grad_fn=<CudnnBatchNormBackward0>), cuda:0),), **{}):
Cannot call sizes() on tensor with symbolic sizes/strides
(scroll up for backtrace)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
new_result = optimized_model_iter_fn(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
self.model_iter_fn(mod, inputs, collect_outputs=False)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/timm_models.py", line 305, in forward_and_backward_pass
cloned_inputs = clone_inputs(inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/timm_models.py", line 308, in <graph break in forward_and_backward_pass>
pred = mod(*cloned_inputs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 247, in catch_errors
return callback(frame, cache_size)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 474, in _convert_frame
result = inner_convert(frame, cache_size)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 118, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 89, in time_wrapper
r = func(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 347, in _convert_frame_assert
return _compile(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 402, in _compile
out_code = transform_code_object(code, transform)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object
transformations(instructions, code_options)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 390, in transform
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1493, in run
super().run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
getattr(self, inst.opname)(inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper
return inner_fn(self, inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 224, in call_function
return super().call_function(tx, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 194, in call_function
return super(UserFunctionVariable, self).call_function(tx, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 65, in call_function
return tx.inline_user_function_return(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
getattr(self, inst.opname)(inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper
return inner_fn(self, inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 224, in call_function
return super().call_function(tx, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 194, in call_function
return super(UserFunctionVariable, self).call_function(tx, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 65, in call_function
return tx.inline_user_function_return(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
getattr(self, inst.opname)(inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper
return inner_fn(self, inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 183, in call_function
tx.call_function(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 222, in call_function
return tx.inline_user_function_return(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
getattr(self, inst.opname)(inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper
return inner_fn(self, inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 183, in call_function
tx.call_function(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 202, in call_function
return wrap_fx_proxy(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 631, in wrap_fx_proxy
return wrap_fx_proxy_cls(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 671, in wrap_fx_proxy_cls
example_value = get_fake_value(proxy.node, tx)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1055, in get_fake_value
raise TorchRuntimeError() from e
torch._dynamo.exc.TorchRuntimeError:
from user code:
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/hrnet.py", line 713, in stages
yl = self.stage2(xl)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/hrnet.py", line 495, in forward
y = y + fuse_outer[j](x[j])
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
TorchDynamo optimized model failed to run because of following error
cuda train hrnet_w18 FAIL
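The hrnet_w18 trace above fails inside an nn.Upsample module used by the fuse branches (y = y + fuse_outer[j](x[j])): F.interpolate lowers to torch._C._nn.upsample_nearest2d, which cannot handle a tensor with symbolic sizes/strides. A hypothetical minimal sketch of that shape of failure (module names and sizes are illustrative, not hrnet's):

import torch
import torch._dynamo as dynamo

torch._dynamo.config.dynamic_shapes = True  # assumption: mirrors TORCHDYNAMO_DYNAMIC_SHAPES=1

class FuseLike(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Analogue of the sub1_1_2 upsample module named in the error above.
        self.up = torch.nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, y, x):
        # Analogue of `y = y + fuse_outer[j](x[j])` in timm/models/hrnet.py.
        return y + self.up(x)

mod = FuseLike().cuda()
compiled = dynamo.optimize("aot_eager")(mod)
y = torch.randn(2, 18, 56, 56, device="cuda")
x = torch.randn(2, 18, 28, 28, device="cuda")
out = compiled(y, x)  # expected to reproduce "Cannot call sizes() on tensor with symbolic sizes/strides"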
Running timm_models.py inception_v3...
[2022-11-12 18:13:46,782] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 18:13:54,402] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train inception_v3 PASS
Running timm_models.py jx_nest_base...
[2022-11-12 18:15:04,132] torch._dynamo.utils: [ERROR] RMSE (res-fp64): 0.00751, (ref-fp64): 0.00020 and shape=torch.Size([128, 3, 4, 4])
[2022-11-12 18:15:04,132] torch._dynamo.utils: [ERROR] Accuracy failed for key name patch_embed.proj.weight.grad
cuda train jx_nest_base FAIL
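The jx_nest_base failure (and the later twins_pcpvt_base one) is an accuracy mismatch rather than a crash: the harness compares the RMSE of the compiled run's gradient against an fp64 golden reference with the RMSE of the eager gradient against that same reference. An illustrative sketch of that kind of check (placeholder tensors only, not the harness's actual code):

import torch

def rmse(a, b):
    return torch.sqrt(torch.mean((a.double() - b.double()) ** 2))

# e.g. patch_embed.proj.weight.grad with shape [128, 3, 4, 4]
ref_fp64 = torch.randn(128, 3, 4, 4, dtype=torch.float64)   # fp64 golden reference (placeholder)
ref = ref_fp64.float() + 1e-4 * torch.randn(128, 3, 4, 4)   # eager-mode gradient (placeholder)
res = ref_fp64.float() + 1e-2 * torch.randn(128, 3, 4, 4)   # compiled-run gradient (placeholder)
print(rmse(res, ref_fp64).item(), rmse(ref, ref_fp64).item())  # compare 0.00751 vs 0.00020 in the log above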
Running timm_models.py lcnet_050...
cuda train lcnet_050 PASS
Running timm_models.py levit_128...
WARNING:common:fp64 golden ref were not generated for levit_128
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/timm_models.py", line 322, in <module>
main(TimmRunnner())
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1589, in main
return maybe_fresh_cache(run, args.cold_start_latency and args.only)(
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 780, in inner
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1922, in run
runner.run_one_model(
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1305, in run_one_model
status = self.check_accuracy(
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1102, in check_accuracy
correct_result = self.run_n_iterations(
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1026, in run_n_iterations
return self.model_iter_fn(mod, inputs, collect_outputs=True)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/timm_models.py", line 312, in forward_and_backward_pass
self.grad_scaler.scale(loss).backward()
File "/scratch/ezyang/work/pytorch/torch/_tensor.py", line 473, in backward
torch.autograd.backward(
File "/scratch/ezyang/work/pytorch/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
cuda train levit_128 FAIL
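The levit_128 failure is the stock autograd error for backpropagating a graph whose saved intermediates have already been freed; the log does not show why the harness ends up calling backward on the same graph twice for this model (note its fp64 golden ref was also skipped). A standalone sketch of the error and of the retain_graph hint the message mentions:

import torch

lin = torch.nn.Linear(4, 4)
x = torch.randn(2, 4)

loss = lin(x).sum()
loss.backward()        # first backward frees the graph's saved intermediates
try:
    loss.backward()    # second backward raises the same RuntimeError as above
except RuntimeError as e:
    print(e)

loss2 = lin(x).sum()
loss2.backward(retain_graph=True)  # keeping the graph alive, as the message suggests,
loss2.backward()                   # allows a second backward pass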
Running timm_models.py mixer_b16_224...
cuda train mixer_b16_224 PASS
Running timm_models.py mixnet_l...
cuda train mixnet_l PASS
Running timm_models.py mnasnet_100...
cuda train mnasnet_100 PASS
Running timm_models.py mobilenetv2_100...
cuda train mobilenetv2_100 PASS
Running timm_models.py mobilenetv3_large_100...
cuda train mobilenetv3_large_100 PASS
[2022-11-12 18:19:28,470] torch._dynamo.testing: [WARNING] High loss value alert - 52.30. Can result in unstable gradients.
Running timm_models.py mobilevit_s...
[2022-11-12 18:19:28,588] torch._dynamo.testing: [WARNING] High loss value alert - 52.36. Can result in unstable gradients.
[2022-11-12 18:19:28,700] torch._dynamo.testing: [WARNING] High loss value alert - 52.30. Can result in unstable gradients.
[2022-11-12 18:19:28,784] torch._dynamo.testing: [WARNING] High loss value alert - 52.30. Can result in unstable gradients.
[2022-11-12 18:21:16,207] torch._dynamo.testing: [WARNING] High loss value alert - 52.30. Can result in unstable gradients.
cuda train mobilevit_s PASS
Running timm_models.py nfnet_l0...
cuda train nfnet_l0 PASS
Running timm_models.py pit_b_224...
cuda train pit_b_224 PASS
Running timm_models.py pnasnet5large...
[2022-11-12 18:23:49,227] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 18:24:39,504] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
Exception ignored in: <generator object preorder_traversal._preorder_traversal at 0x7f5e9c2fce40>
Traceback (most recent call last):
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/sympy/core/traversal.py", line 121, in _preorder_traversal
yield node
SystemError: error return without exception set
cuda train pnasnet5large PASS
Running timm_models.py poolformer_m36...
cuda train poolformer_m36 PASS
[2022-11-12 18:26:22,462] torch._dynamo.testing: [WARNING] High loss value alert - 1.30. Can result in unstable gradients.
Running timm_models.py regnety_002...
[2022-11-12 18:26:22,558] torch._dynamo.testing: [WARNING] High loss value alert - 1.30. Can result in unstable gradients.
[2022-11-12 18:26:22,675] torch._dynamo.testing: [WARNING] High loss value alert - 1.30. Can result in unstable gradients.
[2022-11-12 18:26:22,761] torch._dynamo.testing: [WARNING] High loss value alert - 1.30. Can result in unstable gradients.
[2022-11-12 18:26:48,252] torch._dynamo.testing: [WARNING] High loss value alert - 1.30. Can result in unstable gradients.
cuda train regnety_002 PASS
Running timm_models.py repvgg_a2...
cuda train repvgg_a2 PASS
Running timm_models.py res2net101_26w_4s...
[2022-11-12 18:27:43,851] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train res2net101_26w_4s PASS
Running timm_models.py res2net50_14w_8s...
[2022-11-12 18:28:05,527] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train res2net50_14w_8s PASS
Running timm_models.py res2next50...
[2022-11-12 18:28:21,665] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train res2next50 PASS
Running timm_models.py resmlp_12_224...
cuda train resmlp_12_224 PASS
Running timm_models.py resnest101e...
[2022-11-12 18:29:02,106] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train resnest101e PASS
Running timm_models.py rexnet_100...
[2022-11-12 18:29:19,988] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train rexnet_100 PASS
Running timm_models.py sebotnet33ts_256...
cuda train sebotnet33ts_256 PASS
Running timm_models.py selecsls42b...
[2022-11-12 18:30:35,721] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train selecsls42b PASS
Running timm_models.py spnasnet_100...
cuda train spnasnet_100 PASS
Running timm_models.py swin_base_patch4_window7_224...
ERROR:common:compile_fn raised AssertionError: While executing %truediv : [#users=1] = call_method[target=__truediv__](args = (%getitem_14, 64.0), kwargs = {})
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/output_graph.py", line 449, in call_user_compiler
compiled_fn = self.compiler_fn(gm, self.example_inputs())
File "/scratch/ezyang/work/pytorch/torch/_dynamo/debug_utils.py", line 803, in debug_wrapper
compiled_gm = compiler_fn(gm, example_inputs, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/training.py", line 94, in compile_fn
return cls(gm, example_inputs).verified_candidate()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/training.py", line 116, in __init__
if not is_aot_autograd_safe_to_run(gm, example_inputs):
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/training.py", line 64, in is_aot_autograd_safe_to_run
mutated = has_mutation(gm, example_inputs, inputs_only=True)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/analysis.py", line 160, in has_mutation
ShapeAliasingAndMutationProp(new_gm).run(*example_inputs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/analysis.py", line 124, in run
super().run(*args)
File "/scratch/ezyang/work/pytorch/torch/fx/interpreter.py", line 130, in run
self.env[node] = self.run_node(node)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/analysis.py", line 59, in run_node
assert result is not NotImplemented
AssertionError: While executing %truediv : [#users=1] = call_method[target=__truediv__](args = (%getitem_14, 64.0), kwargs = {})
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
new_result = optimized_model_iter_fn(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
self.model_iter_fn(mod, inputs, collect_outputs=False)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/timm_models.py", line 305, in forward_and_backward_pass
cloned_inputs = clone_inputs(inputs)
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/timm_models.py", line 308, in <graph break in forward_and_backward_pass>
pred = mod(*cloned_inputs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/swin_transformer.py", line 568, in forward
x = self.forward_features(x)
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/swin_transformer.py", line 558, in forward_features
x = self.layers(x)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/container.py", line 204, in forward
input = module(input)
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 247, in catch_errors
return callback(frame, cache_size)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 474, in _convert_frame
result = inner_convert(frame, cache_size)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 118, in _fn
return fn(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 89, in time_wrapper
r = func(*args, **kwargs)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 347, in _convert_frame_assert
return _compile(
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 402, in _compile
out_code = transform_code_object(code, transform)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object
transformations(instructions, code_options)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 390, in transform
tracer.run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1493, in run
super().run()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
and self.step()
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
getattr(self, inst.opname)(inst)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 203, in wrapper
self.output.compile_subgraph(self, reason=reason)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/output_graph.py", line 371, in compile_subgraph
self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/output_graph.py", line 417, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/scratch/ezyang/work/pytorch/torch/_dynamo/output_graph.py", line 454, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: compile_fn raised AssertionError: While executing %truediv : [#users=1] = call_method[target=__truediv__](args = (%getitem_14, 64.0), kwargs = {})
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
TorchDynamo optimized model failed to run because of following error
cuda train swin_base_patch4_window7_224 FAIL
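For the swin_base_patch4_window7_224 failure (a BackendCompilerFailed raised while the aot_eager backend's mutation analysis evaluates a __truediv__ node), the log's own suggested workaround is to enable verbose output and let Dynamo fall back to eager rather than fail the run; as a sketch:

import torch._dynamo

torch._dynamo.config.verbose = True          # fuller backtrace, per the message above
torch._dynamo.config.suppress_errors = True  # fall back to eager instead of failing, per the message above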
Running timm_models.py swsl_resnext101_32x16d...
[2022-11-12 18:31:53,423] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train swsl_resnext101_32x16d PASS
Running timm_models.py tf_efficientnet_b0...
cuda train tf_efficientnet_b0 PASS
Running timm_models.py tf_mixnet_l...
cuda train tf_mixnet_l PASS
Running timm_models.py tinynet_a...
cuda train tinynet_a PASS
Running timm_models.py tnt_s_patch16_224...
cuda train tnt_s_patch16_224 PASS
Running timm_models.py twins_pcpvt_base...
[2022-11-12 18:45:52,863] torch._dynamo.utils: [ERROR] RMSE (res-fp64): 0.01689, (ref-fp64): 0.00004 and shape=torch.Size([64, 3, 4, 4])
[2022-11-12 18:45:52,863] torch._dynamo.utils: [ERROR] Accuracy failed for key name patch_embeds.0.proj.weight.grad
cuda train twins_pcpvt_base FAIL
Running timm_models.py visformer_small...
[2022-11-12 18:46:11,825] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 18:46:17,629] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train visformer_small PASS
Running timm_models.py vit_base_patch16_224...
cuda train vit_base_patch16_224 PASS
Running timm_models.py volo_d1_224...
[2022-11-12 18:47:18,248] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 18:47:35,182] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train volo_d1_224 PASS
Running timm_models.py xcit_large_24_p8_224...
WARNING:common:fp64 golden ref were not generated for xcit_large_24_p8_224
[2022-11-12 18:48:06,375] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 18:48:21,465] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train xcit_large_24_p8_224 PASS