Sweep logs for symbolic-shapes --accuracy --backend aot_eager --training (TORCHDYNAMO_DYNAMIC_SHAPES=1) - 807a62fc61bea26707c3dc09a12bad204e375a95 Sat Nov 12 16:49:04 UTC 2022
Running torchbench.py BERT_pytorch...
cuda train BERT_pytorch PASS
Running torchbench.py Background_Matting...
[2022-11-12 16:50:12,257] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 16:50:17,079] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train Background_Matting PASS
sampling loop time step: 0%| | 0/1 [00:00<?, ?it/s] sampling loop time step: 100%|██████████| 1/1 [00:00<00:00, 55.09it/s]
0it [00:00, ?it/s]
sampling loop time step: 0%| | 0/1 [00:00<?, ?it/s]
sampling loop time step: 100%|██████████| 1/1 [00:01<00:00, 1.14s/it] sampling loop time step: 100%|██████████| 1/1 [00:01<00:00, 1.14s/it]
1it [00:02, 2.01s/it]
sampling loop time step: 0%| | 0/1 [00:00<?, ?it/s] sampling loop time step: 100%|██████████| 1/1 [00:00<00:00, 16.65it/s]
2it [00:02, 1.37s/it] 2it [00:02, 1.47s/it]
WARNING:root:DALLE2_pytorch failed to load
Eager model failed to run
Traceback (most recent call last):
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 983, in validate_model
    self.model_iter_fn(model, example_inputs)
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 337, in forward_and_backward_pass
    self.grad_scaler.scale(loss).backward()
  File "/scratch/ezyang/work/pytorch/torch/_tensor.py", line 473, in backward
    torch.autograd.backward(
  File "/scratch/ezyang/work/pytorch/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1893, in run
    device, name, model, example_inputs, batch_size = runner.load_model(
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 282, in load_model
    self.validate_model(model, example_inputs)
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 985, in validate_model
    raise NotImplementedError("Eager model failed to run")
NotImplementedError: Eager model failed to run
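Note: the RuntimeError above is the stock autograd error raised when backward() is called on a tensor that is not attached to an autograd graph. A minimal standalone sketch (not from this log) that reproduces the same message:

    import torch

    loss = torch.zeros(1)  # requires_grad=False, so no grad_fn
    try:
        loss.backward()
    except RuntimeError as e:
        # element 0 of tensors does not require grad and does not have a grad_fn
        print(e)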
Running torchbench.py LearningToPaint...
[2022-11-12 16:50:45,757] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 16:50:47,613] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train LearningToPaint PASS
[2022-11-12 16:50:56,537] torch._dynamo.testing: [WARNING] High loss value alert - 12.64. Can result in unstable gradients.
Running torchbench.py Super_SloMo...
[2022-11-12 16:50:58,053] torch._dynamo.testing: [WARNING] High loss value alert - 12.64. Can result in unstable gradients.
[2022-11-12 16:50:58,162] torch._dynamo.testing: [WARNING] High loss value alert - 12.64. Can result in unstable gradients.
[2022-11-12 16:50:58,261] torch._dynamo.testing: [WARNING] High loss value alert - 12.64. Can result in unstable gradients.
[2022-11-12 16:53:38,843] torch._dynamo.testing: [WARNING] High loss value alert - 12.64. Can result in unstable gradients.
cuda train Super_SloMo PASS
Running torchbench.py alexnet...
cuda train alexnet PASS
Running torchbench.py attention_is_all_you_need_pytorch...
cuda train attention_is_all_you_need_pytorch PASS
Running torchbench.py dcgan...
[2022-11-12 16:54:54,404] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 16:54:55,066] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train dcgan PASS
Running torchbench.py densenet121...
[2022-11-12 16:55:16,761] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train densenet121 PASS
WARNING:root:detectron2_fcos_r_50_fpn failed to load
FCOS train is not supported by upstream detectron2. See GH Issue: https://github.com/facebookresearch/detectron2/issues/4369.
Traceback (most recent call last):
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1893, in run
    device, name, model, example_inputs, batch_size = runner.load_model(
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 251, in load_model
    benchmark = benchmark_cls(
  File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/util/model.py", line 18, in __call__
    obj = type.__call__(cls, *args, **kwargs)
  File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/detectron2_fcos_r_50_fpn/__init__.py", line 15, in __init__
    super().__init__(variant="COCO-Detection/fcos_R_50_FPN_1x.py", test=test, device=device,
  File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/util/framework/detectron2/model_factory.py", line 100, in __init__
    loader = self.setup_train(cfg, args)
  File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/util/framework/detectron2/model_factory.py", line 110, in setup_train
    raise NotImplementedError("FCOS train is not supported by upstream detectron2. " \
NotImplementedError: FCOS train is not supported by upstream detectron2. See GH Issue: https://github.com/facebookresearch/detectron2/issues/4369.
WARNING:root:detectron2_maskrcnn_r_50_c4 failed to load
Eager model failed to run
Traceback (most recent call last):
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 983, in validate_model
    self.model_iter_fn(model, example_inputs)
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 336, in forward_and_backward_pass
    loss = self.compute_loss(pred)
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 326, in compute_loss
    return reduce_to_scalar_loss(pred)
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/testing.py", line 83, in reduce_to_scalar_loss
    return sum([reduce_to_scalar_loss(x) for x in out]) / len(out)
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/testing.py", line 83, in <listcomp>
    return sum([reduce_to_scalar_loss(x) for x in out]) / len(out)
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/testing.py", line 93, in reduce_to_scalar_loss
    return sum([reduce_to_scalar_loss(value) for value in out.values()]) / len(
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/testing.py", line 93, in <listcomp>
    return sum([reduce_to_scalar_loss(value) for value in out.values()]) / len(
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/testing.py", line 96, in reduce_to_scalar_loss
    raise NotImplementedError("Don't know how to reduce", type(out))
NotImplementedError: ("Don't know how to reduce", <class 'detectron2.structures.instances.Instances'>)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1893, in run
    device, name, model, example_inputs, batch_size = runner.load_model(
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 282, in load_model
    self.validate_model(model, example_inputs)
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 985, in validate_model
    raise NotImplementedError("Eager model failed to run")
NotImplementedError: Eager model failed to run
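Note: the frames above come from torch._dynamo.testing.reduce_to_scalar_loss, which the harness uses to collapse model outputs into a scalar loss for backward. A simplified sketch inferred from those frames (not the exact source): tensors, lists/tuples, and dicts are reduced recursively, and any other type, here detectron2's Instances, raises.

    import torch

    def reduce_to_scalar_loss(out):
        """Recursively reduce benchmark outputs to a scalar (simplified sketch)."""
        if isinstance(out, torch.Tensor):
            return out.sum() / out.numel()
        if isinstance(out, (list, tuple)):
            return sum(reduce_to_scalar_loss(x) for x in out) / len(out)
        if isinstance(out, dict):
            return sum(reduce_to_scalar_loss(v) for v in out.values()) / len(out)
        raise NotImplementedError("Don't know how to reduce", type(out))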
Running torchbench.py dlrm...
incomplete graph:
class joint_forward_backward(torch.nn.Module):
    def forward(self, primals, tangents):
        primals_1: f32[512, 512], [512, 1], primals_2: f32[512], [1], primals_3: f32[64, 512], [512, 1], primals_4: f32[64], [1], primals_5: f32[1000000, 64], [64, 1], primals_6: f32[1000000, 64], [64, 1], primals_7: f32[1000000, 64], [64, 1], primals_8: f32[1000000, 64], [64, 1], primals_9: f32[1000000, 64], [64, 1], primals_10: f32[1000000, 64], [64, 1], primals_11: f32[1000000, 64], [64, 1], primals_12: f32[1000000, 64], [64, 1], primals_13: f32[1024, 100], [100, 1], primals_14: f32[1024], [1], primals_15: f32[1024, 1024], [1024, 1], primals_16: f32[1024], [1], primals_17: f32[1024, 1024], [1024, 1], primals_18: f32[1024], [1], primals_19: f32[1, 1024], [1024, 1], primals_20: f32[1], [1], primals_21: f32[s0, 512], [512, 1], primals_22: i64[s2, s0], [s0, 1], primals_23: i64[s3], [1], primals_24: i64[s3], [1], primals_25: i64[s3], [1], primals_26: i64[s3], [1], primals_27: i64[s3], [1], primals_28: i64[s3], [1], primals_29: i64[s3], [1], primals_30: i64[s3], [1], tangents_1: f32[s0, 1], [1, 1], = fx_pytree.tree_flatten_spec([primals, tangents], self._in_spec)
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
        t: f32[512, 512], [1, 512] = torch.ops.aten.t.default(primals_1); primals_1 = None
        addmm: f32[s0, 512], [512, 1] = torch.ops.aten.addmm.default(primals_2, primals_21, t); primals_2 = t = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
        relu: f32[s0, 512], [512, 1] = torch.ops.aten.relu.default(addmm); addmm = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
        t_1: f32[512, 64], [1, 512] = torch.ops.aten.t.default(primals_3); primals_3 = None
        addmm_1: f32[s0, 64], [64, 1] = torch.ops.aten.addmm.default(primals_4, relu, t_1); primals_4 = relu = t_1 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
        relu_1: f32[s0, 64], [64, 1] = torch.ops.aten.relu.default(addmm_1); addmm_1 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:288, code: sparse_offset_group_batch = lS_o[k]
        select: i64[s0], [1] = torch.ops.aten.select.int(primals_22, 0, 0)
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
        _embedding_bag = torch.ops.aten._embedding_bag.default(primals_5, primals_23, select, False, 0, True, None); primals_5 = primals_23 = select = None
        getitem: f32[s0, 64], [64, 1] = _embedding_bag[0]
        getitem_1: i64[s3], [1] = _embedding_bag[1]
        getitem_2: i64[s0], [1] = _embedding_bag[2]
        getitem_3: i64[0], [1] = _embedding_bag[3]; _embedding_bag = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:288, code: sparse_offset_group_batch = lS_o[k]
        select_1: i64[s0], [1] = torch.ops.aten.select.int(primals_22, 0, 1)
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
        _embedding_bag_1 = torch.ops.aten._embedding_bag.default(primals_6, primals_24, select_1, False, 0, True, None); primals_6 = primals_24 = select_1 = None
        getitem_4: f32[s0, 64], [64, 1] = _embedding_bag_1[0]
        getitem_5: i64[s3], [1] = _embedding_bag_1[1]
        getitem_6: i64[s0], [1] = _embedding_bag_1[2]
        getitem_7: i64[0], [1] = _embedding_bag_1[3]; _embedding_bag_1 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:288, code: sparse_offset_group_batch = lS_o[k]
        select_2: i64[s0], [1] = torch.ops.aten.select.int(primals_22, 0, 2)
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
        _embedding_bag_2 = torch.ops.aten._embedding_bag.default(primals_7, primals_25, select_2, False, 0, True, None); primals_7 = primals_25 = select_2 = None
        getitem_8: f32[s0, 64], [64, 1] = _embedding_bag_2[0]
        getitem_9: i64[s3], [1] = _embedding_bag_2[1]
        getitem_10: i64[s0], [1] = _embedding_bag_2[2]
        getitem_11: i64[0], [1] = _embedding_bag_2[3]; _embedding_bag_2 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:288, code: sparse_offset_group_batch = lS_o[k]
        select_3: i64[s0], [1] = torch.ops.aten.select.int(primals_22, 0, 3)
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
        _embedding_bag_3 = torch.ops.aten._embedding_bag.default(primals_8, primals_26, select_3, False, 0, True, None); primals_8 = primals_26 = select_3 = None
        getitem_12: f32[s0, 64], [64, 1] = _embedding_bag_3[0]
        getitem_13: i64[s3], [1] = _embedding_bag_3[1]
        getitem_14: i64[s0], [1] = _embedding_bag_3[2]
        getitem_15: i64[0], [1] = _embedding_bag_3[3]; _embedding_bag_3 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:288, code: sparse_offset_group_batch = lS_o[k]
        select_4: i64[s0], [1] = torch.ops.aten.select.int(primals_22, 0, 4)
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
        _embedding_bag_4 = torch.ops.aten._embedding_bag.default(primals_9, primals_27, select_4, False, 0, True, None); primals_9 = primals_27 = select_4 = None
        getitem_16: f32[s0, 64], [64, 1] = _embedding_bag_4[0]
        getitem_17: i64[s3], [1] = _embedding_bag_4[1]
        getitem_18: i64[s0], [1] = _embedding_bag_4[2]
        getitem_19: i64[0], [1] = _embedding_bag_4[3]; _embedding_bag_4 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:288, code: sparse_offset_group_batch = lS_o[k]
        select_5: i64[s0], [1] = torch.ops.aten.select.int(primals_22, 0, 5)
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
        _embedding_bag_5 = torch.ops.aten._embedding_bag.default(primals_10, primals_28, select_5, False, 0, True, None); primals_10 = primals_28 = select_5 = None
        getitem_20: f32[s0, 64], [64, 1] = _embedding_bag_5[0]
        getitem_21: i64[s3], [1] = _embedding_bag_5[1]
        getitem_22: i64[s0], [1] = _embedding_bag_5[2]
        getitem_23: i64[0], [1] = _embedding_bag_5[3]; _embedding_bag_5 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:288, code: sparse_offset_group_batch = lS_o[k]
        select_6: i64[s0], [1] = torch.ops.aten.select.int(primals_22, 0, 6)
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
        _embedding_bag_6 = torch.ops.aten._embedding_bag.default(primals_11, primals_29, select_6, False, 0, True, None); primals_11 = primals_29 = select_6 = None
        getitem_24: f32[s0, 64], [64, 1] = _embedding_bag_6[0]
        getitem_25: i64[s3], [1] = _embedding_bag_6[1]
        getitem_26: i64[s0], [1] = _embedding_bag_6[2]
        getitem_27: i64[0], [1] = _embedding_bag_6[3]; _embedding_bag_6 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:288, code: sparse_offset_group_batch = lS_o[k]
        select_7: i64[s0], [1] = torch.ops.aten.select.int(primals_22, 0, 7); primals_22 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
        _embedding_bag_7 = torch.ops.aten._embedding_bag.default(primals_12, primals_30, select_7, False, 0, True, None); primals_12 = select_7 = None
        getitem_28: f32[s0, 64], [64, 1] = _embedding_bag_7[0]
        getitem_29: i64[s3], [1] = _embedding_bag_7[1]
        getitem_30: i64[s0], [1] = _embedding_bag_7[2]
        getitem_31: i64[0], [1] = _embedding_bag_7[3]; _embedding_bag_7 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:306, code: T = torch.cat([x] + ly, dim=1).view((batch_size, -1, d))
        cat: f32[s0, 576], [576, 1] = torch.ops.aten.cat.default([relu_1, getitem, getitem_4, getitem_8, getitem_12, getitem_16, getitem_20, getitem_24, getitem_28], 1); getitem = getitem_4 = getitem_8 = getitem_12 = getitem_16 = getitem_20 = getitem_24 = getitem_28 = None
        sym_size: Sym(s0) = torch.ops.aten.sym_size(primals_21, 0); primals_21 = None
        view: f32[s0, 9, 64], [576, 64, 1] = torch.ops.aten.view.default(cat, [sym_size, -1, 64]); cat = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:308, code: Z = torch.bmm(T, torch.transpose(T, 1, 2))
        transpose: f32[s0, 64, 9], [576, 1, 64] = torch.ops.aten.transpose.int(view, 1, 2)
        bmm: f32[s0, 9, 9], [81, 9, 1] = torch.ops.aten.bmm.default(view, transpose)
        # No stacktrace found for following nodes
        _tensor_constant0 = self._tensor_constant0
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:319, code: li = torch.tensor([i for i in range(ni) for j in range(i + offset)], device=x.device)
        lift_fresh_copy: i64[36], [1] = torch.ops.aten.lift_fresh_copy.default(_tensor_constant0); _tensor_constant0 = None
        # No stacktrace found for following nodes
        _tensor_constant1 = self._tensor_constant1
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:320, code: lj = torch.tensor([j for i in range(nj) for j in range(i + offset)], device=x.device)
        lift_fresh_copy_1: i64[36], [1] = torch.ops.aten.lift_fresh_copy.default(_tensor_constant1); _tensor_constant1 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:321, code: Zflat = Z[:, li, lj]
        slice_1: f32[s0, 9, 9], [81, 9, 1] = torch.ops.aten.slice.Tensor(bmm, 0, 0, 9223372036854775807); bmm = None
        index: f32[s0, 36], [36, 1] = torch.ops.aten.index.Tensor(slice_1, [None, lift_fresh_copy, lift_fresh_copy_1])
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:323, code: R = torch.cat([x] + [Zflat], dim=1)
        cat_1: f32[s0, 100], [100, 1] = torch.ops.aten.cat.default([relu_1, index], 1); relu_1 = index = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
        t_2: f32[100, 1024], [1, 100] = torch.ops.aten.t.default(primals_13); primals_13 = None
        addmm_2: f32[s0, 1024], [1024, 1] = torch.ops.aten.addmm.default(primals_14, cat_1, t_2); primals_14 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
        relu_2: f32[s0, 1024], [1024, 1] = torch.ops.aten.relu.default(addmm_2); addmm_2 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
        t_3: f32[1024, 1024], [1, 1024] = torch.ops.aten.t.default(primals_15); primals_15 = None
        addmm_3: f32[s0, 1024], [1024, 1] = torch.ops.aten.addmm.default(primals_16, relu_2, t_3); primals_16 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
        relu_3: f32[s0, 1024], [1024, 1] = torch.ops.aten.relu.default(addmm_3); addmm_3 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
        t_4: f32[1024, 1024], [1, 1024] = torch.ops.aten.t.default(primals_17); primals_17 = None
        addmm_4: f32[s0, 1024], [1024, 1] = torch.ops.aten.addmm.default(primals_18, relu_3, t_4); primals_18 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
        relu_4: f32[s0, 1024], [1024, 1] = torch.ops.aten.relu.default(addmm_4); addmm_4 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
        t_5: f32[1024, 1], [1, 1024] = torch.ops.aten.t.default(primals_19); primals_19 = None
        addmm_5: f32[s0, 1], [1, 1] = torch.ops.aten.addmm.default(primals_20, relu_4, t_5); primals_20 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
        relu_5: f32[s0, 1], [1, 1] = torch.ops.aten.relu.default(addmm_5); addmm_5 = None
        # No stacktrace found for following nodes
        is_same_size = torch.ops.aten.is_same_size.default(relu_5, tangents_1)
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
        threshold_backward: f32[s0, 1], [1, 1] = torch.ops.aten.threshold_backward.default(tangents_1, relu_5, 0); tangents_1 = relu_5 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
        t_6: f32[1, 1024], [1024, 1] = torch.ops.aten.t.default(t_5); t_5 = None
        mm: f32[s0, 1024], [1024, 1] = torch.ops.aten.mm.default(threshold_backward, t_6); t_6 = None
        t_7: f32[1, s0], [1, 1] = torch.ops.aten.t.default(threshold_backward)
        mm_1: f32[1, 1024], [1024, 1] = torch.ops.aten.mm.default(t_7, relu_4); t_7 = None
        t_8: f32[1024, 1], [1, 1024] = torch.ops.aten.t.default(mm_1); mm_1 = None
        sum_1: f32[1, 1], [0, 1] = torch.ops.aten.sum.dim_IntList(threshold_backward, [0], True); threshold_backward = None
        view_1: f32[1], [0] = torch.ops.aten.view.default(sum_1, [1]); sum_1 = None
        t_9: f32[1, 1024], [1024, 1] = torch.ops.aten.t.default(t_8); t_8 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
        threshold_backward_1: f32[s0, 1024], [1024, 1] = torch.ops.aten.threshold_backward.default(mm, relu_4, 0); mm = relu_4 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
        t_10: f32[1024, 1024], [1024, 1] = torch.ops.aten.t.default(t_4); t_4 = None
        mm_2: f32[s0, 1024], [1024, 1] = torch.ops.aten.mm.default(threshold_backward_1, t_10); t_10 = None
        t_11: f32[1024, s0], [1, 1024] = torch.ops.aten.t.default(threshold_backward_1)
        mm_3: f32[1024, 1024], [1024, 1] = torch.ops.aten.mm.default(t_11, relu_3); t_11 = None
        t_12: f32[1024, 1024], [1, 1024] = torch.ops.aten.t.default(mm_3); mm_3 = None
        sum_2: f32[1, 1024], [0, 1] = torch.ops.aten.sum.dim_IntList(threshold_backward_1, [0], True); threshold_backward_1 = None
        view_2: f32[1024], [1] = torch.ops.aten.view.default(sum_2, [1024]); sum_2 = None
        t_13: f32[1024, 1024], [1024, 1] = torch.ops.aten.t.default(t_12); t_12 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
        threshold_backward_2: f32[s0, 1024], [1024, 1] = torch.ops.aten.threshold_backward.default(mm_2, relu_3, 0); mm_2 = relu_3 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
        t_14: f32[1024, 1024], [1024, 1] = torch.ops.aten.t.default(t_3); t_3 = None
        mm_4: f32[s0, 1024], [1024, 1] = torch.ops.aten.mm.default(threshold_backward_2, t_14); t_14 = None
        t_15: f32[1024, s0], [1, 1024] = torch.ops.aten.t.default(threshold_backward_2)
        mm_5: f32[1024, 1024], [1024, 1] = torch.ops.aten.mm.default(t_15, relu_2); t_15 = None
        t_16: f32[1024, 1024], [1, 1024] = torch.ops.aten.t.default(mm_5); mm_5 = None
        sum_3: f32[1, 1024], [0, 1] = torch.ops.aten.sum.dim_IntList(threshold_backward_2, [0], True); threshold_backward_2 = None
        view_3: f32[1024], [1] = torch.ops.aten.view.default(sum_3, [1024]); sum_3 = None
        t_17: f32[1024, 1024], [1024, 1] = torch.ops.aten.t.default(t_16); t_16 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
        threshold_backward_3: f32[s0, 1024], [1024, 1] = torch.ops.aten.threshold_backward.default(mm_4, relu_2, 0); mm_4 = relu_2 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:274, code: return layers(x)
        t_18: f32[1024, 100], [100, 1] = torch.ops.aten.t.default(t_2); t_2 = None
        mm_6: f32[s0, 100], [100, 1] = torch.ops.aten.mm.default(threshold_backward_3, t_18); t_18 = None
        t_19: f32[1024, s0], [1, 1024] = torch.ops.aten.t.default(threshold_backward_3)
        mm_7: f32[1024, 100], [100, 1] = torch.ops.aten.mm.default(t_19, cat_1); t_19 = cat_1 = None
        t_20: f32[100, 1024], [1, 100] = torch.ops.aten.t.default(mm_7); mm_7 = None
        sum_4: f32[1, 1024], [0, 1] = torch.ops.aten.sum.dim_IntList(threshold_backward_3, [0], True); threshold_backward_3 = None
        view_4: f32[1024], [1] = torch.ops.aten.view.default(sum_4, [1024]); sum_4 = None
        t_21: f32[1024, 100], [100, 1] = torch.ops.aten.t.default(t_20); t_20 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:323, code: R = torch.cat([x] + [Zflat], dim=1)
        slice_2: f32[s0, 64], [100, 1] = torch.ops.aten.slice.Tensor(mm_6, 1, 0, 64)
        slice_3: f32[s0, 36], [100, 1] = torch.ops.aten.slice.Tensor(mm_6, 1, 64, 100); mm_6 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:321, code: Zflat = Z[:, li, lj]
        sym_size_1: Sym(s0) = torch.ops.aten.sym_size(slice_1, 0); slice_1 = None
        sym_size_2: Sym(9) = torch.ops.aten.sym_size(view, 1)
        new_zeros: f32[s0, 9, 9], [81, 9, 1] = torch.ops.aten.new_zeros.default(slice_3, [sym_size_1, sym_size_2, sym_size_2], dtype = torch.float32, layout = torch.strided, device = device(type='cuda', index=0)); sym_size_1 = None
        index_put: f32[s0, 9, 9], [81, 9, 1] = torch.ops.aten.index_put.default(new_zeros, [None, lift_fresh_copy, lift_fresh_copy_1], slice_3, True); new_zeros = lift_fresh_copy = lift_fresh_copy_1 = slice_3 = None
        slice_backward: f32[s0, 9, 9], [81, 9, 1] = torch.ops.aten.slice_backward.default(index_put, [sym_size, sym_size_2, sym_size_2], 0, 0, 9223372036854775807, 1); index_put = sym_size_2 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:308, code: Z = torch.bmm(T, torch.transpose(T, 1, 2))
        transpose_1: f32[s0, 64, 9], [576, 1, 64] = torch.ops.aten.transpose.int(view, 1, 2); view = None
        bmm_1: f32[s0, 64, 9], [576, 9, 1] = torch.ops.aten.bmm.default(transpose_1, slice_backward); transpose_1 = None
        transpose_2: f32[s0, 9, 64], [576, 64, 1] = torch.ops.aten.transpose.int(transpose, 1, 2); transpose = None
        bmm_2: f32[s0, 9, 64], [576, 64, 1] = torch.ops.aten.bmm.default(slice_backward, transpose_2); slice_backward = transpose_2 = None
        transpose_3: f32[s0, 9, 64], [576, 1, 9] = torch.ops.aten.transpose.int(bmm_1, 1, 2); bmm_1 = None
        # Gradient addition node due to multiple use of tensor around:, File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:308, code: Z = torch.bmm(T, torch.transpose(T, 1, 2))
        add: f32[s0, 9, 64], [576, 64, 1] = torch.ops.aten.add.Tensor(bmm_2, transpose_3); bmm_2 = transpose_3 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:306, code: T = torch.cat([x] + ly, dim=1).view((batch_size, -1, d))
        _reshape_alias: f32[s0, 576], [576, 1] = torch.ops.aten._reshape_alias.default(add, [sym_size, 576], [576, 1]); add = sym_size = None
        slice_4: f32[s0, 64], [576, 1] = torch.ops.aten.slice.Tensor(_reshape_alias, 1, 0, 64)
        slice_5: f32[s0, 64], [576, 1] = torch.ops.aten.slice.Tensor(_reshape_alias, 1, 64, 128)
        slice_6: f32[s0, 64], [576, 1] = torch.ops.aten.slice.Tensor(_reshape_alias, 1, 128, 192)
        slice_7: f32[s0, 64], [576, 1] = torch.ops.aten.slice.Tensor(_reshape_alias, 1, 192, 256)
        slice_8: f32[s0, 64], [576, 1] = torch.ops.aten.slice.Tensor(_reshape_alias, 1, 256, 320)
        slice_9: f32[s0, 64], [576, 1] = torch.ops.aten.slice.Tensor(_reshape_alias, 1, 320, 384)
        slice_10: f32[s0, 64], [576, 1] = torch.ops.aten.slice.Tensor(_reshape_alias, 1, 384, 448)
        slice_11: f32[s0, 64], [576, 1] = torch.ops.aten.slice.Tensor(_reshape_alias, 1, 448, 512)
        slice_12: f32[s0, 64], [576, 1] = torch.ops.aten.slice.Tensor(_reshape_alias, 1, 512, 576); _reshape_alias = None
        # Gradient addition node due to multiple use of tensor around:, File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:306, code: T = torch.cat([x] + ly, dim=1).view((batch_size, -1, d))
        add_1: f32[s0, 64], [64, 1] = torch.ops.aten.add.Tensor(slice_2, slice_4); slice_2 = slice_4 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
        index_select: f32[s3, 64], [64, 1] = torch.ops.aten.index_select.default(slice_12, 0, getitem_29)
        sym_size_3: Sym(s3) = torch.ops.aten.sym_size(primals_30, 0)
        # No stacktrace found for following nodes
        floordiv: Sym(s3) = sym_size_3 // 1; sym_size_3 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
        sym_stride: Sym(1) = torch.ops.aten.sym_stride(primals_30, 0)
        # No stacktrace found for following nodes
        mul: Sym(s3) = floordiv * sym_stride
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
        _reshape_alias_1: i64[1, s3], [s3, 1] = torch.ops.aten._reshape_alias.default(primals_30, [1, floordiv], [mul, sym_stride]); primals_30 = floordiv = mul = sym_stride = None
        sym_numel: Sym(s3) = torch.ops.aten.sym_numel(getitem_29); getitem_29 = None
        sym_size_4: Sym(64) = torch.ops.aten.sym_size(slice_12, 1); slice_12 = None
        # No stacktrace found for following nodes
        mul_1: Sym(64*s3) = sym_numel * sym_size_4; sym_numel = None
        floordiv_1: Sym(s3) = mul_1 // sym_size_4; mul_1 = None
        # File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py:295, code: V = E(sparse_index_group_batch, sparse_offset_group_batch)
        _reshape_alias_2: f32[s3, 64], [64, 1] = torch.ops.aten._reshape_alias.default(index_select, [floordiv_1, sym_size_4], [sym_size_4, 1]); index_select = floordiv_1 = None
        _sparse_coo_tensor_with_dims_and_tensors: f32[1000000, 64], [0, 0] = torch.ops.aten._sparse_coo_tensor_with_dims_and_tensors.default(1, 1, [1000000, sym_size_4], _reshape_alias_1, _reshape_alias_2, dtype = torch.float32, layout = torch.sparse_coo, device = device(type='cuda', index=0), pin_memory = None); sym_size_4 = _reshape_alias_1 = _reshape_alias_2 = None
ERROR:common:Cannot access storage of SparseTensorImpl
Traceback (most recent call last):
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
    new_result = optimized_model_iter_fn(
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
    return fn(*args, **kwargs)
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
    self.model_iter_fn(mod, inputs, collect_outputs=False)
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass
    cloned_inputs = clone_inputs(inputs)
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass>
    pred = mod(*cloned_inputs)
  File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/dlrm/dlrm_s_pytorch.py", line 336, in forward
    def forward(self, dense_x, lS_o, lS_i):
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
    return fn(*args, **kwargs)
  File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 972, in forward
    return compiled_f(
  File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 958, in new_func
    compiled_fn = create_aot_dispatcher_function(
  File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 678, in create_aot_dispatcher_function
    aot_dispatch_autograd(flat_fn, fake_flat_tensor_args, aot_config)
  File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 495, in aot_dispatch_autograd
    fx_g = make_fx(
  File "/scratch/ezyang/work/pytorch/torch/fx/experimental/proxy_tensor.py", line 665, in wrapped
    t = dispatch_trace(wrap_key(func, args, fx_tracer), tracer=fx_tracer, concrete_args=tuple(phs))
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
    return fn(*args, **kwargs)
  File "/scratch/ezyang/work/pytorch/torch/fx/experimental/proxy_tensor.py", line 422, in dispatch_trace
    graph = tracer.trace(root, concrete_args)
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
    return fn(*args, **kwargs)
  File "/scratch/ezyang/work/pytorch/torch/fx/_symbolic_trace.py", line 739, in trace
    (self.create_arg(fn(*args)),),
  File "/scratch/ezyang/work/pytorch/torch/fx/_symbolic_trace.py", line 614, in flatten_fn
    tree_out = root_fn(*tree_args)
  File "/scratch/ezyang/work/pytorch/torch/fx/experimental/proxy_tensor.py", line 439, in wrapped
    out = f(*tensors)
  File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 189, in inner
    outs = f(*f_args, **f_kwargs)
  File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 257, in joint_forward_backward
    backward_out = torch.autograd.grad(
  File "/scratch/ezyang/work/pytorch/torch/autograd/__init__.py", line 300, in grad
    return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
NotImplementedError: Cannot access storage of SparseTensorImpl
TorchDynamo optimized model failed to run because of following error
cuda train dlrm FAIL
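Note: the last node of the joint graph above (_sparse_coo_tensor_with_dims_and_tensors) builds a sparse COO gradient for a 1000000 x 64 embedding table; the tracer then fails because it cannot access the storage of a sparse tensor. Assuming dlrm's embedding tables use sparse gradients (an assumption, not shown directly in this log), the layout can be reproduced standalone:

    import torch
    import torch.nn as nn

    # assumption: table shape mirrors the 1000000 x 64 primals in the graph above
    bag = nn.EmbeddingBag(1000000, 64, mode="sum", sparse=True)
    indices = torch.tensor([1, 2, 4, 5])
    offsets = torch.tensor([0, 2])
    bag(indices, offsets).sum().backward()
    print(bag.weight.grad.layout)  # torch.sparse_coo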
/scratch/ezyang/work/pytorch/torch/utils/tensorboard/__init__.py:4: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if not hasattr(tensorboard, "__version__") or LooseVersion(
/scratch/ezyang/work/env/lib/python3.9/site-packages/gym/core.py:317: DeprecationWarning: WARN: Initializing wrapper in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.
  deprecation(
Running torchbench.py drq...
cuda train drq PASS
Running torchbench.py fastNLP_Bert...
cuda train fastNLP_Bert PASS
Running torchbench.py functorch_dp_cifar10...
cuda train functorch_dp_cifar10 PASS
Running torchbench.py functorch_maml_omniglot...
[2022-11-12 16:57:17,407] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 16:57:18,455] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train functorch_maml_omniglot PASS
Running torchbench.py hf_Albert...
cuda train hf_Albert PASS
Running torchbench.py hf_Bart...
cuda train hf_Bart PASS
Running torchbench.py hf_Bert...
cuda train hf_Bert PASS
Running torchbench.py hf_BigBird...
ERROR:common:output 0: torch.Size([2, 1, 12, 64, 192]) != torch.Size([2, 12, 64, 192])
While executing %mul : [#users=2] = call_function[target=torch.ops.aten.mul.Tensor](args = (%permute, %permute_1), kwargs = {})
Traceback (most recent call last):
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
    new_result = optimized_model_iter_fn(
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
    return fn(*args, **kwargs)
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
    self.model_iter_fn(mod, inputs, collect_outputs=False)
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass
    cloned_inputs = clone_inputs(inputs)
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass>
    pred = mod(*cloned_inputs)
  File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 2462, in forward
    outputs = self.bert(
  File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 2104, in forward
    blocked_encoder_mask, band_mask, from_mask, to_mask = self.create_masks_for_block_sparse_attn(
  File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 2185, in create_masks_for_block_sparse_attn
    @staticmethod
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
    return fn(*args, **kwargs)
  File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 972, in forward
    return compiled_f(
  File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 963, in new_func
    return compiled_fn(args)
  File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 390, in new_fn
    fw_outs = call_func_with_args(compiled_fw, args, disable_amp=disable_amp)
  File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 346, in call_func_with_args
    out = normalize_as_list(f(args))
  File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 321, in g
    return f(*args)
  File "/scratch/ezyang/work/pytorch/torch/fx/interpreter.py", line 130, in run
    self.env[node] = self.run_node(node)
  File "/scratch/ezyang/work/pytorch/functorch/_src/compilers.py", line 162, in run_node
    assert nv.size() == rv.size(), f"output {i}: {nv.size()} != {rv.size()}"
AssertionError: output 0: torch.Size([2, 1, 12, 64, 192]) != torch.Size([2, 12, 64, 192])
While executing %mul : [#users=2] = call_function[target=torch.ops.aten.mul.Tensor](args = (%permute, %permute_1), kwargs = {})
TorchDynamo optimized model failed to run because of following error
cuda train hf_BigBird FAIL
Running torchbench.py hf_DistilBert...
cuda train hf_DistilBert PASS
Running torchbench.py hf_GPT2...
cuda train hf_GPT2 PASS
Running torchbench.py hf_GPT2_large...
cuda train hf_GPT2_large PASS
Running torchbench.py hf_Longformer...
[2022-11-12 17:01:53,287] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 17:01:54,658] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 17:03:12,259] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (64)
   function: '_chunk' (/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/longformer/modeling_longformer.py:770)
reasons: ['(Eq(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0], 512*hidden_states.size()[0]*hidden_states.stride()[0]*hidden_states.size()[1]//512) and\nEq(Mod(hidden_states.size()[1], hidden_states.size()[1]//512), 0) and\nhidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 >= 0 and\nNe(hidden_states.size()[1]//512, 0) and\n4*hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0] <= 4*hidden_states.size()[1]*2*768 and\nNe(hidden_states.size()[1]/hidden_states.size()[1]//512, 1) and\nNe(hidden_states.size()[1]//512, 1) and\nNe(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512, 1) and\nhidden_states.size()[1]//512 >= 2 and\nhidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 >= hidden_states.stride()[0] and\nhidden_states.size()[1]/hidden_states.size()[1]//512 >= 2 and\nhidden_states.size()[0]*hidden_states.stride()[0] < hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 and\nhidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 >= 0 and\nhidden_states.size()[1]//512 > 1 and\nEq(hidden_states.size()[1]/hidden_states.size()[1]//512, 512) and\n(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 >= 0 and\nNe(4*hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 + 8*hidden_states.size()[1]//512*(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 - 8*(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2, 0) and\n4*hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 + 8*hidden_states.size()[1]//512*(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 - 8*(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 <= 4*hidden_states.size()[1]*2*768 and\nNe(2*hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0] - hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512, 0) and\nNe(2*hidden_states.size()[1]//512 - 1, 1) and\nNe((hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2, 1) and\n2*hidden_states.size()[1]//512 - 1 >= 2 and\n(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 >= hidden_states.stride()[0] and\nhidden_states.size()[0]*hidden_states.stride()[0] < (hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 and\nNe((hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2, hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512) and\nNe((hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2, 0) and\n(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 >= 0 and\n2*hidden_states.size()[1]//512 - 1 > 1 and\n1 < 
2*hidden_states.size()[1]//512*(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 - (hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 and \nhidden_states.stride()[0] == hidden_states.size()[2])'] | |
to diagnose recompilation issues, see https://github.com/pytorch/torchdynamo/blob/main/TROUBLESHOOTING.md.
cuda train hf_Longformer PASS
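Note: the cache_size_limit warning above means dynamo recompiled '_chunk' more than 64 times under changing guards before giving up on compiling that frame. The limit is an ordinary config knob; a workaround sketch (it masks the guard churn rather than fixing it):

    import torch._dynamo

    # default limit was 64, per the warning above
    torch._dynamo.config.cache_size_limit = 128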
Running torchbench.py hf_Reformer...
cuda train hf_Reformer PASS
Running torchbench.py hf_T5...
WARNING:common:fp64 golden ref were not generated for hf_T5
cuda train hf_T5 PASS
Running torchbench.py hf_T5_base...
WARNING:common:fp64 golden ref were not generated for hf_T5_base
cuda train hf_T5_base PASS
Running torchbench.py hf_T5_large...
cuda train hf_T5_large PASS
Running torchbench.py lennard_jones...
cuda train lennard_jones PASS
Running torchbench.py maml_omniglot...
[2022-11-12 17:07:20,155] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-12 17:07:21,153] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train maml_omniglot PASS
Running torchbench.py mnasnet1_0...
[2022-11-12 17:07:32,983] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
Exception ignored in: <generator object preorder_traversal._preorder_traversal at 0x7f0821976350>
Traceback (most recent call last):
  File "/scratch/ezyang/work/env/lib/python3.9/site-packages/sympy/core/traversal.py", line 121, in _preorder_traversal
    yield node
SystemError: error return without exception set
cuda train mnasnet1_0 PASS
Running torchbench.py mobilenet_v2...
[2022-11-12 17:07:46,227] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train mobilenet_v2 PASS
Running torchbench.py mobilenet_v2_quantized_qat...
WARNING:common:fp64 golden ref were not generated for mobilenet_v2_quantized_qat
ERROR:common:'Graph' object has no attribute 'print_readable'
Traceback (most recent call last):
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
    new_result = optimized_model_iter_fn(
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
    return fn(*args, **kwargs)
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
    self.model_iter_fn(mod, inputs, collect_outputs=False)
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass
    cloned_inputs = clone_inputs(inputs)
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass>
    pred = mod(*cloned_inputs)
  File "/scratch/ezyang/work/pytorch/torch/fx/graph_module.py", line 660, in call_wrapped
    return self._wrapped_call(self, *args, **kwargs)
  File "/scratch/ezyang/work/pytorch/torch/fx/graph_module.py", line 279, in __call__
    raise e
  File "/scratch/ezyang/work/pytorch/torch/fx/graph_module.py", line 269, in __call__
    return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
  File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
    return forward_call(*input, **kwargs)
  File "<eval_with_key>.8", line 4, in forward
    def forward(self, x : torch.Tensor) -> torch.Tensor:
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
    return fn(*args, **kwargs)
  File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 972, in forward
    return compiled_f(
  File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 958, in new_func
    compiled_fn = create_aot_dispatcher_function(
  File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 678, in create_aot_dispatcher_function
    aot_dispatch_autograd(flat_fn, fake_flat_tensor_args, aot_config)
  File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 499, in aot_dispatch_autograd
    assert_functional_graph(fx_g.graph)
  File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 401, in assert_functional_graph
    fx_g.print_readable()
AttributeError: 'Graph' object has no attribute 'print_readable'
====== Buggy post-functionalization graph ======
TorchDynamo optimized model failed to run because of following error
cuda train mobilenet_v2_quantized_qat FAIL
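Note: the AttributeError above looks like a Graph/GraphModule mix-up: assert_functional_graph is handed fx_g.graph (a torch.fx.Graph) and then calls print_readable() on it, but, at least in builds of this vintage, print_readable is defined on GraphModule rather than Graph. A sketch of the distinction:

    import torch
    from torch.fx import symbolic_trace

    gm = symbolic_trace(torch.nn.Linear(2, 2))
    gm.print_readable()  # OK: GraphModule defines print_readable
    # gm.graph.print_readable()  # AttributeError: 'Graph' object has no attribute 'print_readable'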
Running torchbench.py mobilenet_v3_large...
[2022-11-12 17:08:36,739] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
cuda train mobilenet_v3_large PASS
Running torchbench.py moco...
[2022-11-12 17:08:49,627] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
ERROR:common:
from user code:
  File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/moco/moco/builder.py", line 172, in concat_all_gather
    torch.distributed.all_gather(tensors_gather, tensor, async_op=False)

Set torch._dynamo.config.verbose=True for more information

You can suppress this exception and fall back to eager by setting:
    torch._dynamo.config.suppress_errors = True

Traceback (most recent call last):
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1076, in run_node
    return node.target(*args, **kwargs)
  File "/scratch/ezyang/work/pytorch/torch/distributed/distributed_c10d.py", line 1350, in wrapper
    return func(*args, **kwargs)
  File "/scratch/ezyang/work/pytorch/torch/distributed/distributed_c10d.py", line 2345, in all_gather
    work = default_pg.allgather([tensor_list], [tensor])
  File "/scratch/ezyang/work/pytorch/torch/_subclasses/fake_tensor.py", line 874, in __torch_dispatch__
    raise RuntimeError(
RuntimeError: c10d.allgather_.default - couldn't find symbolic meta function/decomposition

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1042, in get_fake_value
    return wrap_fake_exception(
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 721, in wrap_fake_exception
    return fn()
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1043, in <lambda>
    lambda: run_node(tx.output, node, args, kwargs, nnmodule)
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1085, in run_node
    raise RuntimeError(
RuntimeError: Failed running call_function <function all_gather at 0x7f09d22310d0>(*([FakeTensor(FakeTensor(..., device='meta', size=(s0, s1, s2, s2)), cuda:0)], FakeTensor(FakeTensor(..., device='meta', size=(s0, s1, s2, s2)), cuda:0)), **{'async_op': False}):
c10d.allgather_.default - couldn't find symbolic meta function/decomposition
(scroll up for backtrace)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy
    new_result = optimized_model_iter_fn(
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn
    return fn(*args, **kwargs)
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations
    self.model_iter_fn(mod, inputs, collect_outputs=False)
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass
    cloned_inputs = clone_inputs(inputs)
  File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass>
    pred = mod(*cloned_inputs)
  File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/ezyang/work/pytorch/torch/nn/parallel/distributed.py", line 1096, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/scratch/ezyang/work/pytorch/torch/nn/parallel/distributed.py", line 1049, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0]) # type: ignore[index]
  File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/moco/moco/builder.py", line 130, in forward
    self._momentum_update_key_encoder() # update the key encoder
  File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/moco/moco/builder.py", line 133, in <graph break in forward>
    im_k, idx_unshuffle = self._batch_shuffle_ddp(im_k)
  File "/scratch/ezyang/work/pytorch/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/moco/moco/builder.py", line 76, in _batch_shuffle_ddp
    x_gather = concat_all_gather(x)
  File "/scratch/ezyang/work/pytorch/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 247, in catch_errors
    return callback(frame, cache_size)
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 474, in _convert_frame
    result = inner_convert(frame, cache_size)
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 118, in _fn
    return fn(*args, **kwargs)
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 89, in time_wrapper
    r = func(*args, **kwargs)
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 347, in _convert_frame_assert
    return _compile(
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 402, in _compile
    out_code = transform_code_object(code, transform)
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object
    transformations(instructions, code_options)
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 390, in transform
    tracer.run()
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1493, in run
    super().run()
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run
    and self.step()
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step
    getattr(self, inst.opname)(inst)
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper
    return inner_fn(self, inst)
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 835, in CALL_FUNCTION_KW
    self.call_function(fn, args, kwargs)
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/torch.py", line 403, in call_function
    tensor_variable = wrap_fx_proxy(
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 631, in wrap_fx_proxy
    return wrap_fx_proxy_cls(
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 671, in wrap_fx_proxy_cls
    example_value = get_fake_value(proxy.node, tx)
  File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1055, in get_fake_value
    raise TorchRuntimeError() from e
torch._dynamo.exc.TorchRuntimeError:

from user code:
  File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/moco/moco/builder.py", line 172, in concat_all_gather
    torch.distributed.all_gather(tensors_gather, tensor, async_op=False)

Set torch._dynamo.config.verbose=True for more information

You can suppress this exception and fall back to eager by setting:
    torch._dynamo.config.suppress_errors = True
TorchDynamo optimized model failed to run because of following error
cuda train moco FAIL
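Note: moco fails because the fake-tensor tracer has no symbolic meta function for the c10d.allgather_ collective it hits inside concat_all_gather. The log's own suggested fallback is to let dynamo drop to eager on such errors instead of raising:

    import torch._dynamo

    torch._dynamo.config.verbose = True          # more detail, per the message above
    torch._dynamo.config.suppress_errors = True  # fall back to eager instead of raising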
Running torchbench.py nvidia_deeprecommender... | |
cuda train nvidia_deeprecommender PASS | |
Running torchbench.py pytorch_CycleGAN_and_pix2pix... | |
--dataroot /scratch/ezyang/work/torchbenchmark/torchbenchmark/data/.data/pytorch_CycleGAN_and_pix2pix_inputs/datasets/horse2zebra --name horse2zebra --model cycle_gan --display_id 0 --n_epochs 3 --n_epochs_decay 3 --gpu_ids 0 --checkpoints_dir /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/pytorch_CycleGAN_and_pix2pix/.data/checkpoints | |
cuda train pytorch_CycleGAN_and_pix2pix PASS | |
Running torchbench.py pytorch_stargan... | |
ERROR:common:'Graph' object has no attribute 'print_readable' | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy | |
new_result = optimized_model_iter_fn( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations | |
self.model_iter_fn(mod, inputs, collect_outputs=False) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass | |
cloned_inputs = clone_inputs(inputs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass> | |
pred = mod(*cloned_inputs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/pytorch_stargan/model.py", line 55, in forward | |
def forward(self, x, c): | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 972, in forward | |
return compiled_f( | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 958, in new_func | |
compiled_fn = create_aot_dispatcher_function( | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 678, in create_aot_dispatcher_function | |
aot_dispatch_autograd(flat_fn, fake_flat_tensor_args, aot_config) | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 499, in aot_dispatch_autograd | |
assert_functional_graph(fx_g.graph) | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 401, in assert_functional_graph | |
fx_g.print_readable() | |
AttributeError: 'Graph' object has no attribute 'print_readable' | |
====== Buggy post-functionalization graph ====== | |
TorchDynamo optimized model failed to run because of following error | |
cuda train pytorch_stargan FAIL | |
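Note that the AttributeError above is a secondary bug: assert_functional_graph's failure handler calls print_readable() on a torch.fx.Graph, but that method is defined on GraphModule, so the intended "Buggy post-functionalization graph" dump never prints. A sketch of the kind of fix, under that assumption:

    import torch
    import torch.fx

    def dump_readable(fx_g: torch.fx.Graph) -> None:
        # torch.fx.Graph has no print_readable(); wrap it in a throwaway
        # GraphModule first (or fall back to fx_g.print_tabular()).
        torch.fx.GraphModule(torch.nn.Module(), fx_g).print_readable()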
Running torchbench.py pytorch_struct... | |
cuda train pytorch_struct PASS | |
Running torchbench.py pytorch_unet... | |
[2022-11-12 17:10:07,552] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
[2022-11-12 17:10:13,801] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
cuda train pytorch_unet PASS | |
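The recurring "Unable to use Aot Autograd because of presence of mutation" warnings mean the backend's safety check (is_aot_autograd_safe_to_run, via has_mutation) found a graph input mutated in place and fell back to eager for that frame. A toy frame that would trip it (a sketch, not the benchmark's actual code):

    import torch

    def step(x: torch.Tensor, running: torch.Tensor) -> torch.Tensor:
        running.add_(x)       # in-place mutation of a graph input
        return running * 2.0  # has_mutation() flags this frame, so AOT Autograd is skipped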
Running torchbench.py resnet18... | |
[2022-11-12 17:10:23,982] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
cuda train resnet18 PASS | |
Running torchbench.py resnet50... | |
[2022-11-12 17:10:35,709] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
cuda train resnet50 PASS | |
Running torchbench.py resnet50_quantized_qat... | |
WARNING:common:fp64 golden ref were not generated for resnet50_quantized_qat | |
ERROR:common:'Graph' object has no attribute 'print_readable' | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy | |
new_result = optimized_model_iter_fn( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations | |
self.model_iter_fn(mod, inputs, collect_outputs=False) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass | |
cloned_inputs = clone_inputs(inputs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass> | |
pred = mod(*cloned_inputs) | |
File "/scratch/ezyang/work/pytorch/torch/fx/graph_module.py", line 660, in call_wrapped | |
return self._wrapped_call(self, *args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/fx/graph_module.py", line 279, in __call__ | |
raise e | |
File "/scratch/ezyang/work/pytorch/torch/fx/graph_module.py", line 269, in __call__ | |
return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "<eval_with_key>.8", line 4, in forward | |
def forward(self, x : torch.Tensor) -> torch.Tensor: | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 972, in forward | |
return compiled_f( | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 958, in new_func | |
compiled_fn = create_aot_dispatcher_function( | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 678, in create_aot_dispatcher_function | |
aot_dispatch_autograd(flat_fn, fake_flat_tensor_args, aot_config) | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 499, in aot_dispatch_autograd | |
assert_functional_graph(fx_g.graph) | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 401, in assert_functional_graph | |
fx_g.print_readable() | |
AttributeError: 'Graph' object has no attribute 'print_readable' | |
====== Buggy post-functionalization graph ====== | |
TorchDynamo optimized model failed to run because of following error | |
cuda train resnet50_quantized_qat FAIL | |
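For reference, the two knobs the error footers keep pointing at can be set once at the top of a run:

    import torch._dynamo.config

    torch._dynamo.config.verbose = True          # full backtraces for compile failures
    torch._dynamo.config.suppress_errors = True  # swallow the error and fall back to eager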
Running torchbench.py resnext50_32x4d... | |
[2022-11-12 17:11:20,351] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
cuda train resnext50_32x4d PASS | |
Running torchbench.py shufflenet_v2_x1_0... | |
[2022-11-12 17:11:33,004] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
cuda train shufflenet_v2_x1_0 PASS | |
/scratch/ezyang/work/env/lib/python3.9/site-packages/gym/core.py:317: DeprecationWarning: WARN: Initializing wrapper in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future. | |
deprecation( | |
/scratch/ezyang/work/env/lib/python3.9/site-packages/gym/wrappers/step_api_compatibility.py:39: DeprecationWarning: WARN: Initializing environment in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future. | |
deprecation( | |
Running torchbench.py soft_actor_critic... | |
cuda train soft_actor_critic PASS | |
[2022-11-12 17:11:46,278] torch._dynamo.testing: [WARNING] High loss value alert - 125.05. Can result in unstable gradients. | |
Running torchbench.py speech_transformer... | |
[2022-11-12 17:11:46,409] torch._dynamo.testing: [WARNING] High loss value alert - 125.05. Can result in unstable gradients. | |
[2022-11-12 17:11:46,547] torch._dynamo.testing: [WARNING] High loss value alert - 125.05. Can result in unstable gradients. | |
[2022-11-12 17:11:46,651] torch._dynamo.testing: [WARNING] High loss value alert - 125.05. Can result in unstable gradients. | |
[2022-11-12 17:11:55,245] torch._dynamo.variables.builtin: [WARNING] incorrect arg count <bound method BuiltinVariable._call_min_max of BuiltinVariable(max)> missing a required argument: 'b' and no constant handler | |
[2022-11-12 17:12:25,935] torch._dynamo.testing: [WARNING] High loss value alert - 125.05. Can result in unstable gradients. | |
cuda train speech_transformer PASS | |
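The _call_min_max warning during speech_transformer is Dynamo's builtin handler modelling only max's two-argument form; a single-iterable call cannot bind the required 'b' parameter and there is no constant handler for it, so it falls through (the sweep still reports PASS). Illustration:

    lengths = [3, 1, 2]
    max(3, 1)     # two-arg form: BuiltinVariable._call_min_max binds a and b
    max(lengths)  # iterable form: the handler's (a, b) signature fails to bind, hence the warning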
[2022-11-12 17:12:33,559] torch._dynamo.testing: [WARNING] High loss value alert - 3.89. Can result in unstable gradients. | |
Running torchbench.py squeezenet1_1... | |
[2022-11-12 17:12:33,597] torch._dynamo.testing: [WARNING] High loss value alert - 3.89. Can result in unstable gradients. | |
[2022-11-12 17:12:33,625] torch._dynamo.testing: [WARNING] High loss value alert - 3.89. Can result in unstable gradients. | |
[2022-11-12 17:12:33,648] torch._dynamo.testing: [WARNING] High loss value alert - 3.89. Can result in unstable gradients. | |
[2022-11-12 17:12:41,718] torch._dynamo.testing: [WARNING] High loss value alert - 3.89. Can result in unstable gradients. | |
cuda train squeezenet1_1 PASS | |
[2022-11-12 17:12:52,452] torch._dynamo.testing: [WARNING] High loss value alert - 86.50. Can result in unstable gradients. | |
Running torchbench.py tacotron2... | |
[2022-11-12 17:12:55,845] torch._dynamo.testing: [WARNING] High loss value alert - 86.50. Can result in unstable gradients. | |
[2022-11-12 17:12:59,278] torch._dynamo.testing: [WARNING] High loss value alert - 86.50. Can result in unstable gradients. | |
[2022-11-12 17:13:02,749] torch._dynamo.testing: [WARNING] High loss value alert - 86.50. Can result in unstable gradients. | |
[2022-11-12 17:13:03,565] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
[2022-11-12 17:13:03,786] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
[2022-11-12 17:13:04,676] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
[2022-11-12 17:13:04,718] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
[2022-11-12 17:13:04,759] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
[2022-11-12 17:13:04,800] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
[2022-11-12 17:13:04,839] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
[2022-11-12 17:13:04,877] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
[2022-11-12 17:13:04,963] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
ERROR:common:aten._thnn_fused_lstm_cell_backward_impl.default - couldn't find symbolic meta function/decomposition | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy | |
new_result = optimized_model_iter_fn( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations | |
self.model_iter_fn(mod, inputs, collect_outputs=False) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass | |
cloned_inputs = clone_inputs(inputs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass> | |
pred = mod(*cloned_inputs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py", line 505, in forward | |
encoder_outputs = self.encoder(embedded_inputs, text_lengths) | |
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py", line 507, in <graph break in forward> | |
mel_outputs, gate_outputs, alignments = self.decoder( | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py", line 396, in forward | |
decoder_input = self.get_go_frame(memory).unsqueeze(0) | |
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py", line 402, in <graph break in forward> | |
memory, mask=~get_mask_from_lengths(memory_lengths)) | |
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py", line 401, in <graph break in forward> | |
self.initialize_decoder_states( | |
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py", line 407, in <graph break in forward> | |
mel_output, gate_output, attention_weights = self.decode( | |
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py", line 340, in decode | |
def decode(self, decoder_input): | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 972, in forward | |
return compiled_f( | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 958, in new_func | |
compiled_fn = create_aot_dispatcher_function( | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 678, in create_aot_dispatcher_function | |
aot_dispatch_autograd(flat_fn, fake_flat_tensor_args, aot_config) | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 495, in aot_dispatch_autograd | |
fx_g = make_fx( | |
File "/scratch/ezyang/work/pytorch/torch/fx/experimental/proxy_tensor.py", line 665, in wrapped | |
t = dispatch_trace(wrap_key(func, args, fx_tracer), tracer=fx_tracer, concrete_args=tuple(phs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/fx/experimental/proxy_tensor.py", line 422, in dispatch_trace | |
graph = tracer.trace(root, concrete_args) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/fx/_symbolic_trace.py", line 739, in trace | |
(self.create_arg(fn(*args)),), | |
File "/scratch/ezyang/work/pytorch/torch/fx/_symbolic_trace.py", line 614, in flatten_fn | |
tree_out = root_fn(*tree_args) | |
File "/scratch/ezyang/work/pytorch/torch/fx/experimental/proxy_tensor.py", line 439, in wrapped | |
out = f(*tensors) | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 189, in inner | |
outs = f(*f_args, **f_kwargs) | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 257, in joint_forward_backward | |
backward_out = torch.autograd.grad( | |
File "/scratch/ezyang/work/pytorch/torch/autograd/__init__.py", line 300, in grad | |
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass | |
File "/scratch/ezyang/work/pytorch/torch/fx/experimental/proxy_tensor.py", line 465, in __torch_dispatch__ | |
return self.inner_torch_dispatch(func, types, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/fx/experimental/proxy_tensor.py", line 490, in inner_torch_dispatch | |
out = proxy_call(self, func, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/fx/experimental/proxy_tensor.py", line 323, in proxy_call | |
out = func(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_ops.py", line 284, in __call__ | |
return self._op(*args, **kwargs or {}) | |
File "/scratch/ezyang/work/pytorch/torch/_subclasses/fake_tensor.py", line 874, in __torch_dispatch__ | |
raise RuntimeError( | |
RuntimeError: aten._thnn_fused_lstm_cell_backward_impl.default - couldn't find symbolic meta function/decomposition | |
incomplete graph: | |
class joint_forward_backward(torch.nn.Module): | |
def forward(self, primals, tangents): | |
primals_1: f32[4096, 768], [768, 1], primals_2: f32[4096, 1024], [1024, 1], primals_3: f32[4096], [1], primals_4: f32[4096], [1], primals_5: f32[s1, s2], [s2, 1], primals_6: f32[s1, 768 - s2], [768 - s2, 1], primals_7: f32[s1, 1024], [1024, 1], primals_8: f32[s1, 1024], [1024, 1], primals_9: f32[s1, s5], [s5, 1], primals_10: f32[s1, s5], [s5, 1], tangents_1: f32[s1, 1024], [1024, 1], tangents_2: f32[s1, 2, s5], [2*s5, s5, 1], tangents_3: f32[s1, 1024], [1024, 1], = fx_pytree.tree_flatten_spec([primals, tangents], self._in_spec) | |
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py:352, code: cell_input = torch.cat((decoder_input, self.attention_context), -1) | |
cat: f32[s1, 768], [768, 1] = torch.ops.aten.cat.default([primals_5, primals_6], -1); primals_5 = primals_6 = None | |
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py:353, code: self.attention_hidden, self.attention_cell = self.attention_rnn( | |
t: f32[768, 4096], [1, 768] = torch.ops.aten.t.default(primals_1); primals_1 = None | |
mm: f32[s1, 4096], [4096, 1] = torch.ops.aten.mm.default(cat, t); cat = t = None | |
t_1: f32[1024, 4096], [1, 1024] = torch.ops.aten.t.default(primals_2); primals_2 = None | |
mm_1: f32[s1, 4096], [4096, 1] = torch.ops.aten.mm.default(primals_7, t_1); primals_7 = t_1 = None | |
_thnn_fused_lstm_cell = torch.ops.aten._thnn_fused_lstm_cell.default(mm, mm_1, primals_8, primals_3, primals_4); mm = mm_1 = primals_3 = primals_4 = None | |
getitem: f32[s1, 1024], [1024, 1] = _thnn_fused_lstm_cell[0] | |
getitem_1: f32[s1, 1024], [1024, 1] = _thnn_fused_lstm_cell[1] | |
getitem_2: f32[s1, 4096], [4096, 1] = _thnn_fused_lstm_cell[2]; _thnn_fused_lstm_cell = None | |
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py:355, code: self.attention_hidden = F.dropout( | |
native_dropout = torch.ops.aten.native_dropout.default(getitem, 0.1, True); getitem = None | |
getitem_3: f32[s1, 1024], [1024, 1] = native_dropout[0] | |
getitem_4: b8[s1, 1024], [1024, 1] = native_dropout[1]; native_dropout = None | |
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py:359, code: (self.attention_weights.unsqueeze(1), | |
unsqueeze: f32[s1, 1, s5], [s5, 0, 1] = torch.ops.aten.unsqueeze.default(primals_9, 1); primals_9 = None | |
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py:360, code: self.attention_weights_cum.unsqueeze(1)), dim=1) | |
unsqueeze_1: f32[s1, 1, s5], [s5, 0, 1] = torch.ops.aten.unsqueeze.default(primals_10, 1); primals_10 = None | |
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py:358, code: attention_weights_cat = torch.cat( | |
cat_1: f32[s1, 2, s5], [2*s5, s5, 1] = torch.ops.aten.cat.default([unsqueeze, unsqueeze_1], 1); unsqueeze = unsqueeze_1 = None | |
# Gradient addition node due to multiple use of tensor around:, File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py:503, code: embedded_inputs = self.embedding(text_inputs).transpose(1, 2) | |
is_same_size = torch.ops.aten.is_same_size.default(getitem_3, tangents_1); getitem_3 = None | |
is_same_size_1 = torch.ops.aten.is_same_size.default(getitem_1, tangents_3) | |
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py:355, code: self.attention_hidden = F.dropout( | |
native_dropout_backward: f32[s1, 1024], [1024, 1] = torch.ops.aten.native_dropout_backward.default(tangents_1, getitem_4, 1.1111111111111112); tangents_1 = getitem_4 = None | |
# File: /scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tacotron2/model.py:353, code: self.attention_hidden, self.attention_cell = self.attention_rnn( | |
_thnn_fused_lstm_cell_backward_impl = torch.ops.aten._thnn_fused_lstm_cell_backward_impl.default(native_dropout_backward, tangents_3, primals_8, getitem_1, getitem_2, True); native_dropout_backward = tangents_3 = primals_8 = getitem_1 = getitem_2 = None | |
TorchDynamo optimized model failed to run because of following error | |
cuda train tacotron2 FAIL | |
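tacotron2 stops at a missing symbolic meta kernel for aten._thnn_fused_lstm_cell_backward_impl. A hedged sketch of how such a kernel could be supplied out of tree via torch.library; the output shapes below are inferred from the incomplete graph above (gate grads like workspace, cell grad like cx, a bias grad over the gate dimension) and are assumptions, not the op's documented contract:

    import torch
    from torch.library import Library

    aten_lib = Library("aten", "IMPL")

    def lstm_cell_backward_meta(grad_hy, grad_cy, cx, cy, workspace, has_bias):
        d_gates = torch.empty_like(workspace)  # [s1, 4096] in the graph above
        d_cx = torch.empty_like(cx)            # [s1, 1024]
        d_bias = workspace.new_empty(workspace.shape[-1]) if has_bias else None
        return d_gates, d_cx, d_bias

    aten_lib.impl("_thnn_fused_lstm_cell_backward_impl", lstm_cell_backward_meta, "Meta")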
[2022-11-12 17:13:15,045] torch._dynamo.testing: [WARNING] High loss value alert - 55.32. Can result in unstable gradients. | |
Running torchbench.py timm_efficientdet... | |
[2022-11-12 17:13:17,673] torch._dynamo.testing: [WARNING] High loss value alert - 55.43. Can result in unstable gradients. | |
[2022-11-12 17:13:19,613] torch._dynamo.testing: [WARNING] High loss value alert - 55.32. Can result in unstable gradients. | |
[2022-11-12 17:13:19,928] torch._dynamo.testing: [WARNING] High loss value alert - 55.32. Can result in unstable gradients. | |
ERROR:common: | |
from user code: | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/effdet/efficientdet.py", line 211, in forward | |
input_node = resample(input_node) | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/effdet/efficientdet.py", line 134, in forward | |
return F.interpolate( | |
Set torch._dynamo.config.verbose=True for more information | |
You can suppress this exception and fall back to eager by setting: | |
torch._dynamo.config.suppress_errors = True | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1076, in run_node | |
return node.target(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/functional.py", line 3928, in interpolate | |
return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors) | |
RuntimeError: Cannot call sizes() on tensor with symbolic sizes/strides | |
The above exception was the direct cause of the following exception: | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1042, in get_fake_value | |
return wrap_fake_exception( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 721, in wrap_fake_exception | |
return fn() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1043, in <lambda> | |
lambda: run_node(tx.output, node, args, kwargs, nnmodule) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1085, in run_node | |
raise RuntimeError( | |
RuntimeError: Failed running call_function <function interpolate at 0x7fbd78ae9040>(*(FakeTensor(FakeTensor(..., device='meta', | |
size=(s0, 88, ceiling(ceiling(ceiling(ceiling(ceiling(ceiling(ceiling(s2/2)/2)/2)/2)/2)/2)/2), ceiling(ceiling(ceiling(ceiling(ceiling(ceiling(ceiling(s2/2)/2)/2)/2)/2)/2)/2)), | |
grad_fn=<MaxPool2DWithIndicesBackward0>), cuda:0), (10, 10), None, 'nearest', None), **{'recompute_scale_factor': False}): | |
Cannot call sizes() on tensor with symbolic sizes/strides | |
(scroll up for backtrace) | |
The above exception was the direct cause of the following exception: | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy | |
new_result = optimized_model_iter_fn( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations | |
self.model_iter_fn(mod, inputs, collect_outputs=False) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass | |
cloned_inputs = clone_inputs(inputs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass> | |
pred = mod(*cloned_inputs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 247, in catch_errors | |
return callback(frame, cache_size) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 474, in _convert_frame | |
result = inner_convert(frame, cache_size) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 118, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 89, in time_wrapper | |
r = func(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 347, in _convert_frame_assert | |
return _compile( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 402, in _compile | |
out_code = transform_code_object(code, transform) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object | |
transformations(instructions, code_options) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 390, in transform | |
tracer.run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1493, in run | |
super().run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run | |
and self.step() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step | |
getattr(self, inst.opname)(inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper | |
return inner_fn(self, inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION | |
self.call_function(fn, args, {}) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 222, in call_function | |
return tx.inline_user_function_return( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return | |
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call | |
return cls.inline_call_(parent, func, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_ | |
tracer.run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run | |
and self.step() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step | |
getattr(self, inst.opname)(inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper | |
return inner_fn(self, inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION | |
self.call_function(fn, args, {}) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 222, in call_function | |
return tx.inline_user_function_return( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return | |
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call | |
return cls.inline_call_(parent, func, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_ | |
tracer.run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run | |
and self.step() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step | |
getattr(self, inst.opname)(inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper | |
return inner_fn(self, inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION | |
self.call_function(fn, args, {}) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 222, in call_function | |
return tx.inline_user_function_return( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return | |
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call | |
return cls.inline_call_(parent, func, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_ | |
tracer.run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run | |
and self.step() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step | |
getattr(self, inst.opname)(inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper | |
return inner_fn(self, inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION | |
self.call_function(fn, args, {}) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 222, in call_function | |
return tx.inline_user_function_return( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return | |
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call | |
return cls.inline_call_(parent, func, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_ | |
tracer.run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run | |
and self.step() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step | |
getattr(self, inst.opname)(inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper | |
return inner_fn(self, inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION | |
self.call_function(fn, args, {}) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 222, in call_function | |
return tx.inline_user_function_return( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return | |
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call | |
return cls.inline_call_(parent, func, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_ | |
tracer.run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run | |
and self.step() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step | |
getattr(self, inst.opname)(inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper | |
return inner_fn(self, inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION | |
self.call_function(fn, args, {}) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 222, in call_function | |
return tx.inline_user_function_return( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return | |
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call | |
return cls.inline_call_(parent, func, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_ | |
tracer.run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run | |
and self.step() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step | |
getattr(self, inst.opname)(inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper | |
return inner_fn(self, inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION | |
self.call_function(fn, args, {}) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 183, in call_function | |
tx.call_function( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 222, in call_function | |
return tx.inline_user_function_return( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return | |
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call | |
return cls.inline_call_(parent, func, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_ | |
tracer.run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run | |
and self.step() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step | |
getattr(self, inst.opname)(inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper | |
return inner_fn(self, inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 835, in CALL_FUNCTION_KW | |
self.call_function(fn, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/torch.py", line 403, in call_function | |
tensor_variable = wrap_fx_proxy( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 631, in wrap_fx_proxy | |
return wrap_fx_proxy_cls( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 671, in wrap_fx_proxy_cls | |
example_value = get_fake_value(proxy.node, tx) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1055, in get_fake_value | |
raise TorchRuntimeError() from e | |
torch._dynamo.exc.TorchRuntimeError: | |
from user code: | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/effdet/efficientdet.py", line 211, in forward | |
input_node = resample(input_node) | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/effdet/efficientdet.py", line 134, in forward | |
return F.interpolate( | |
Set torch._dynamo.config.verbose=True for more information | |
You can suppress this exception and fall back to eager by setting: | |
torch._dynamo.config.suppress_errors = True | |
TorchDynamo optimized model failed to run because of following error | |
cuda train timm_efficientdet FAIL | |
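The timm_efficientdet error (yolov3 hits the same one below) reduces to F.interpolate's nearest-2d kernel calling sizes() on a fake tensor whose spatial extents are symbolic. A repro sketch under this sweep's assumptions (TORCHDYNAMO_DYNAMIC_SHAPES=1, aot_eager backend):

    import torch
    import torch._dynamo

    def resample(x):
        return torch.nn.functional.interpolate(
            x, size=(10, 10), mode="nearest", recompute_scale_factor=False)

    opt_resample = torch._dynamo.optimize("aot_eager")(resample)
    # Expected: TorchRuntimeError wrapping "Cannot call sizes() on tensor
    # with symbolic sizes/strides", as in the backtrace above.
    opt_resample(torch.randn(2, 88, 16, 16))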
Running torchbench.py timm_efficientnet... | |
cuda train timm_efficientnet PASS | |
Running torchbench.py timm_regnet... | |
cuda train timm_regnet PASS | |
Running torchbench.py timm_resnest... | |
[2022-11-12 17:15:15,796] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
cuda train timm_resnest PASS | |
Running torchbench.py timm_vision_transformer... | |
cuda train timm_vision_transformer PASS | |
Running torchbench.py timm_vision_transformer_large... | |
cuda train timm_vision_transformer_large PASS | |
Running torchbench.py timm_vovnet... | |
cuda train timm_vovnet PASS | |
Running torchbench.py tts_angular... | |
ERROR:common: | |
from user code: | |
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tts_angular/model.py", line 18, in <graph break in forward> | |
o, (_, _) = self.lstm(x) | |
Set torch._dynamo.config.verbose=True for more information | |
You can suppress this exception and fall back to eager by setting: | |
torch._dynamo.config.suppress_errors = True | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1081, in run_node | |
return nnmodule(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/rnn.py", line 776, in forward | |
result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers, | |
RuntimeError: !has_symbolic_sizes_strides_ INTERNAL ASSERT FAILED at "/scratch/ezyang/work/pytorch/aten/src/ATen/TensorGeometry.h":63, please report a bug to PyTorch. | |
The above exception was the direct cause of the following exception: | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1042, in get_fake_value | |
return wrap_fake_exception( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 721, in wrap_fake_exception | |
return fn() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1043, in <lambda> | |
lambda: run_node(tx.output, node, args, kwargs, nnmodule) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1085, in run_node | |
raise RuntimeError( | |
RuntimeError: Failed running call_module self_lstm(*(FakeTensor(FakeTensor(..., device='meta', size=(s0, s1, 40)), cuda:0),), **{}): | |
!has_symbolic_sizes_strides_ INTERNAL ASSERT FAILED at "/scratch/ezyang/work/pytorch/aten/src/ATen/TensorGeometry.h":63, please report a bug to PyTorch. | |
(scroll up for backtrace) | |
The above exception was the direct cause of the following exception: | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy | |
new_result = optimized_model_iter_fn( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations | |
self.model_iter_fn(mod, inputs, collect_outputs=False) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass | |
cloned_inputs = clone_inputs(inputs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass> | |
pred = mod(*cloned_inputs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tts_angular/model.py", line 59, in forward | |
d = self.layers(x) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/container.py", line 204, in forward | |
input = module(input) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tts_angular/model.py", line 17, in forward | |
self.lstm.flatten_parameters() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 247, in catch_errors | |
return callback(frame, cache_size) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 474, in _convert_frame | |
result = inner_convert(frame, cache_size) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 118, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 89, in time_wrapper | |
r = func(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 347, in _convert_frame_assert | |
return _compile( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 402, in _compile | |
out_code = transform_code_object(code, transform) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object | |
transformations(instructions, code_options) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 390, in transform | |
tracer.run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1493, in run | |
super().run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run | |
and self.step() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step | |
getattr(self, inst.opname)(inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper | |
return inner_fn(self, inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION | |
self.call_function(fn, args, {}) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 202, in call_function | |
return wrap_fx_proxy( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 631, in wrap_fx_proxy | |
return wrap_fx_proxy_cls( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 671, in wrap_fx_proxy_cls | |
example_value = get_fake_value(proxy.node, tx) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1055, in get_fake_value | |
raise TorchRuntimeError() from e | |
torch._dynamo.exc.TorchRuntimeError: | |
from user code: | |
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/tts_angular/model.py", line 18, in <graph break in forward> | |
o, (_, _) = self.lstm(x) | |
Set torch._dynamo.config.verbose=True for more information | |
You can suppress this exception and fall back to eager by setting: | |
torch._dynamo.config.suppress_errors = True | |
TorchDynamo optimized model failed to run because of following error | |
cuda train tts_angular FAIL | |
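tts_angular is a different symbolic-shapes gap: the cuDNN LSTM path builds a TensorGeometry from the fake input and trips the !has_symbolic_sizes_strides_ assert. A repro sketch under the same assumptions (CUDA device, dynamic shapes enabled); the input width 40 matches the fake tensor in the log, the hidden size is an arbitrary choice:

    import torch
    import torch._dynamo

    lstm = torch.nn.LSTM(input_size=40, hidden_size=768, batch_first=True).cuda()

    @torch._dynamo.optimize("aot_eager")
    def run(x):
        out, (h, c) = lstm(x)
        return out

    # Expected: INTERNAL ASSERT at aten/src/ATen/TensorGeometry.h:63, as above.
    run(torch.randn(4, 16, 40, device="cuda"))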
Running torchbench.py vgg16... | |
cuda train vgg16 PASS | |
[2022-11-12 17:17:14,994] torch._dynamo.testing: [WARNING] High loss value alert - 66.78. Can result in unstable gradients. | |
Running torchbench.py vision_maskrcnn... | |
[2022-11-12 17:17:18,626] torch._dynamo.testing: [WARNING] High loss value alert - 66.78. Can result in unstable gradients. | |
[2022-11-12 17:17:18,941] torch._dynamo.testing: [WARNING] High loss value alert - 66.78. Can result in unstable gradients. | |
[2022-11-12 17:17:19,234] torch._dynamo.testing: [WARNING] High loss value alert - 66.78. Can result in unstable gradients. | |
ERROR:common:compile_fn raised AssertionError: While executing %mul : [#users=1] = call_method[target=__mul__](args = (%ceil, 32.0), kwargs = {}) | |
Set torch._dynamo.config.verbose=True for more information | |
You can suppress this exception and fall back to eager by setting: | |
torch._dynamo.config.suppress_errors = True | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/output_graph.py", line 449, in call_user_compiler | |
compiled_fn = self.compiler_fn(gm, self.example_inputs()) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/debug_utils.py", line 803, in debug_wrapper | |
compiled_gm = compiler_fn(gm, example_inputs, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/training.py", line 94, in compile_fn | |
return cls(gm, example_inputs).verified_candidate() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/training.py", line 116, in __init__ | |
if not is_aot_autograd_safe_to_run(gm, example_inputs): | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/training.py", line 64, in is_aot_autograd_safe_to_run | |
mutated = has_mutation(gm, example_inputs, inputs_only=True) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/analysis.py", line 160, in has_mutation | |
ShapeAliasingAndMutationProp(new_gm).run(*example_inputs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/analysis.py", line 124, in run | |
super().run(*args) | |
File "/scratch/ezyang/work/pytorch/torch/fx/interpreter.py", line 130, in run | |
self.env[node] = self.run_node(node) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/analysis.py", line 59, in run_node | |
assert result is not NotImplemented | |
AssertionError: While executing %mul : [#users=1] = call_method[target=__mul__](args = (%ceil, 32.0), kwargs = {}) | |
The above exception was the direct cause of the following exception: | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy | |
new_result = optimized_model_iter_fn( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations | |
self.model_iter_fn(mod, inputs, collect_outputs=False) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass | |
cloned_inputs = clone_inputs(inputs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass> | |
pred = mod(*cloned_inputs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/torchvision/torchvision/models/detection/generalized_rcnn.py", line 83, in forward | |
images, targets = self.transform(images, targets) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/torchvision/torchvision/models/detection/transform.py", line 130, in forward | |
image, target_index = self.resize(image, target_index) | |
File "/scratch/ezyang/work/torchvision/torchvision/models/detection/transform.py", line 130, in <graph break in forward> | |
image, target_index = self.resize(image, target_index) | |
File "/scratch/ezyang/work/torchvision/torchvision/models/detection/transform.py", line 136, in <graph break in forward> | |
images = self.batch_images(images, size_divisible=self.size_divisible) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 247, in catch_errors | |
return callback(frame, cache_size) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 474, in _convert_frame | |
result = inner_convert(frame, cache_size) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 118, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 89, in time_wrapper | |
r = func(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 347, in _convert_frame_assert | |
return _compile( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 402, in _compile | |
out_code = transform_code_object(code, transform) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object | |
transformations(instructions, code_options) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 390, in transform | |
tracer.run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1493, in run | |
super().run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run | |
and self.step() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 348, in step | |
self.output.compile_subgraph(self, partial_convert=True) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/output_graph.py", line 347, in compile_subgraph | |
self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/output_graph.py", line 417, in compile_and_call_fx_graph | |
compiled_fn = self.call_user_compiler(gm) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/output_graph.py", line 454, in call_user_compiler | |
raise BackendCompilerFailed(self.compiler_fn, e) from e | |
torch._dynamo.exc.BackendCompilerFailed: compile_fn raised AssertionError: While executing %mul : [#users=1] = call_method[target=__mul__](args = (%ceil, 32.0), kwargs = {}) | |
Set torch._dynamo.config.verbose=True for more information | |
You can suppress this exception and fall back to eager by setting: | |
torch._dynamo.config.suppress_errors = True | |
TorchDynamo optimized model failed to run because of following error | |
cuda train vision_maskrcnn FAIL | |
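vision_maskrcnn fails in the mutation analysis itself: ShapeAliasingAndMutationProp interprets the captured graph and gets NotImplemented back from %ceil.__mul__(32.0), i.e. math.ceil of a symbolic float multiplied by the size divisor inside torchvision's batch_images. The shape of the offending pattern (a sketch; torchvision's actual code differs in detail):

    import math

    def round_up_to_stride(extent: float, stride: float = 32.0) -> int:
        # Under dynamic shapes `extent` is symbolic; math.ceil yields a proxy,
        # and the proxy.__mul__(32.0) node is the one the interpreter above
        # could not execute.
        return int(math.ceil(extent / stride) * stride)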
Running torchbench.py yolov3... | |
[2022-11-12 17:17:44,829] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
ERROR:common: | |
from user code: | |
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/yolov3/yolo_models.py", line 292, in <graph break in forward_once> | |
x = module(x) | |
Set torch._dynamo.config.verbose=True for more information | |
You can suppress this exception and fall back to eager by setting: | |
torch._dynamo.config.suppress_errors = True | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1081, in run_node | |
return nnmodule(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/upsampling.py", line 156, in forward | |
return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners, | |
File "/scratch/ezyang/work/pytorch/torch/nn/functional.py", line 3928, in interpolate | |
return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors) | |
RuntimeError: Cannot call sizes() on tensor with symbolic sizes/strides | |
The above exception was the direct cause of the following exception: | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1042, in get_fake_value | |
return wrap_fake_exception( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 721, in wrap_fake_exception | |
return fn() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1043, in <lambda> | |
lambda: run_node(tx.output, node, args, kwargs, nnmodule) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1085, in run_node | |
raise RuntimeError( | |
RuntimeError: Failed running call_module sub___tuple_iterator_getitem____stack0__2__1(*(FakeTensor(FakeTensor(..., device='meta', size=(s0, 256, s2, s3), grad_fn=<LeakyReluBackward1>), cuda:0),), **{}): | |
Cannot call sizes() on tensor with symbolic sizes/strides | |
(scroll up for backtrace) | |
The above exception was the direct cause of the following exception: | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy | |
new_result = optimized_model_iter_fn( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations | |
self.model_iter_fn(mod, inputs, collect_outputs=False) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 332, in forward_and_backward_pass | |
cloned_inputs = clone_inputs(inputs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/torchbench.py", line 335, in <graph break in forward_and_backward_pass> | |
pred = mod(*cloned_inputs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/yolov3/yolo_models.py", line 238, in forward | |
return self.forward_once(x) | |
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/yolov3/yolo_models.py", line 290, in forward_once | |
yolo_out.append(module(x, out)) | |
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/yolov3/yolo_models.py", line 290, in <graph break in forward_once> | |
yolo_out.append(module(x, out)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 247, in catch_errors | |
return callback(frame, cache_size) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 474, in _convert_frame | |
result = inner_convert(frame, cache_size) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 118, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 89, in time_wrapper | |
r = func(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 347, in _convert_frame_assert | |
return _compile( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 402, in _compile | |
out_code = transform_code_object(code, transform) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object | |
transformations(instructions, code_options) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 390, in transform | |
tracer.run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1493, in run | |
super().run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run | |
and self.step() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step | |
getattr(self, inst.opname)(inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper | |
return inner_fn(self, inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION | |
self.call_function(fn, args, {}) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 202, in call_function | |
return wrap_fx_proxy( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 631, in wrap_fx_proxy | |
return wrap_fx_proxy_cls( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 671, in wrap_fx_proxy_cls | |
example_value = get_fake_value(proxy.node, tx) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1055, in get_fake_value | |
raise TorchRuntimeError() from e | |
torch._dynamo.exc.TorchRuntimeError: | |
from user code: | |
File "/scratch/ezyang/work/torchbenchmark/torchbenchmark/models/yolov3/yolo_models.py", line 292, in <graph break in forward_once> | |
x = module(x) | |
Set torch._dynamo.config.verbose=True for more information | |
You can suppress this exception and fall back to eager by setting: | |
torch._dynamo.config.suppress_errors = True | |
TorchDynamo optimized model failed to run because of following error | |
cuda train yolov3 FAIL | |
[2022-11-12 17:18:00,878] torch._dynamo.testing: [WARNING] High loss value alert - 10.33. Can result in unstable gradients. | |
Running huggingface.py AlbertForMaskedLM... | |
[2022-11-12 17:18:01,927] torch._dynamo.testing: [WARNING] High loss value alert - 10.33. Can result in unstable gradients. | |
[2022-11-12 17:18:02,176] torch._dynamo.testing: [WARNING] High loss value alert - 10.33. Can result in unstable gradients. | |
[2022-11-12 17:18:02,423] torch._dynamo.testing: [WARNING] High loss value alert - 10.33. Can result in unstable gradients. | |
[2022-11-12 17:18:28,749] torch._dynamo.testing: [WARNING] High loss value alert - 10.33. Can result in unstable gradients. | |
cuda train AlbertForMaskedLM PASS | |
[2022-11-12 17:18:39,843] torch._dynamo.testing: [WARNING] High loss value alert - 6.24. Can result in unstable gradients. | |
Running huggingface.py AlbertForQuestionAnswering... | |
[2022-11-12 17:18:40,892] torch._dynamo.testing: [WARNING] High loss value alert - 6.24. Can result in unstable gradients. | |
[2022-11-12 17:18:41,140] torch._dynamo.testing: [WARNING] High loss value alert - 6.24. Can result in unstable gradients. | |
[2022-11-12 17:18:41,386] torch._dynamo.testing: [WARNING] High loss value alert - 6.24. Can result in unstable gradients. | |
[2022-11-12 17:19:08,813] torch._dynamo.testing: [WARNING] High loss value alert - 6.24. Can result in unstable gradients. | |
cuda train AlbertForQuestionAnswering PASS | |
[2022-11-12 17:19:19,168] torch._dynamo.testing: [WARNING] High loss value alert - 10.97. Can result in unstable gradients. | |
Running huggingface.py AllenaiLongformerBase... | |
[2022-11-12 17:19:19,506] torch._dynamo.testing: [WARNING] High loss value alert - 10.97. Can result in unstable gradients. | |
[2022-11-12 17:19:19,692] torch._dynamo.testing: [WARNING] High loss value alert - 10.97. Can result in unstable gradients. | |
[2022-11-12 17:19:19,866] torch._dynamo.testing: [WARNING] High loss value alert - 10.97. Can result in unstable gradients. | |
[2022-11-12 17:19:25,056] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of the presence of mutation | |
[2022-11-12 17:19:26,094] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of the presence of mutation | |
[2022-11-12 17:20:32,712] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (64) | |
function: '_chunk' (/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/longformer/modeling_longformer.py:770) | |
reasons: ['(Eq(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0], 512*hidden_states.size()[0]*hidden_states.stride()[0]*hidden_states.size()[1]//512) and\nEq(Mod(hidden_states.size()[1], hidden_states.size()[1]//512), 0) and\nhidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 >= 0 and\nNe(hidden_states.size()[1]//512, 0) and\n4*hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0] <= 4*hidden_states.size()[1]*768 and\nNe(hidden_states.size()[1]/hidden_states.size()[1]//512, 1) and\nNe(hidden_states.size()[1]//512, 1) and\nNe(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512, 1) and\nhidden_states.size()[1]//512 >= 2 and\nhidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 >= hidden_states.stride()[0] and\nhidden_states.size()[1]/hidden_states.size()[1]//512 >= 2 and\nhidden_states.size()[0]*hidden_states.stride()[0] < hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 and\nhidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 >= 0 and\nhidden_states.size()[1]//512 > 1 and\nEq(hidden_states.size()[1]/hidden_states.size()[1]//512, 512) and\n(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 >= 0 and\nNe(4*hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 + 8*hidden_states.size()[1]//512*(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 - 8*(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2, 0) and\n4*hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512 + 8*hidden_states.size()[1]//512*(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 - 8*(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 <= 4*hidden_states.size()[1]*768 and\nNe(2*hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0] - hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512, 0) and\nNe(2*hidden_states.size()[1]//512 - 1, 1) and\nNe((hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2, 1) and\n2*hidden_states.size()[1]//512 - 1 >= 2 and\n(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 >= hidden_states.stride()[0] and\nhidden_states.size()[0]*hidden_states.stride()[0] < (hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 and\nNe((hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2, hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512) and\nNe((hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2, 0) and\n(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 >= 0 and\n2*hidden_states.size()[1]//512 - 1 > 1 and\n1 < 2*hidden_states.size()[1]//512*(hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 - (hidden_states.size()[1]*hidden_states.size()[0]*hidden_states.stride()[0]/hidden_states.size()[1]//512)//2 and \nhidden_states.stride()[0] == hidden_states.size()[2])'] | |
to diagnose recompilation issues, see https://github.com/pytorch/torchdynamo/blob/main/TROUBLESHOOTING.md. | |
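The limit in this warning is itself configurable; a minimal sketch of raising it (the default of 64 is read off the message above, 256 is an arbitrary illustration):

import torch._dynamo

# Longformer's _chunk recompiles once per distinct set of guard outcomes on
# hidden_states, so it exhausts the default 64 cache entries; a higher cap
# trades compile time and memory for fewer cache-limit bailouts.
torch._dynamo.config.cache_size_limit = 256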
[2022-11-12 17:20:43,723] torch._dynamo.testing: [WARNING] High loss value alert - 10.97. Can result in unstable gradients. | |
cuda train AllenaiLongformerBase PASS | |
[2022-11-12 17:20:57,237] torch._dynamo.testing: [WARNING] High loss value alert - 11.05. Can result in unstable gradients. | |
Running huggingface.py BartForCausalLM... | |
[2022-11-12 17:20:57,522] torch._dynamo.testing: [WARNING] High loss value alert - 11.05. Can result in unstable gradients. | |
[2022-11-12 17:20:57,638] torch._dynamo.testing: [WARNING] High loss value alert - 11.05. Can result in unstable gradients. | |
[2022-11-12 17:20:57,736] torch._dynamo.testing: [WARNING] High loss value alert - 11.05. Can result in unstable gradients. | |
[2022-11-12 17:21:23,498] torch._dynamo.testing: [WARNING] High loss value alert - 11.05. Can result in unstable gradients. | |
cuda train BartForCausalLM PASS | |
[2022-11-12 17:21:42,870] torch._dynamo.testing: [WARNING] High loss value alert - 11.01. Can result in unstable gradients. | |
Running huggingface.py BartForConditionalGeneration... | |
[2022-11-12 17:21:43,428] torch._dynamo.testing: [WARNING] High loss value alert - 11.01. Can result in unstable gradients. | |
[2022-11-12 17:21:43,657] torch._dynamo.testing: [WARNING] High loss value alert - 11.01. Can result in unstable gradients. | |
[2022-11-12 17:21:43,864] torch._dynamo.testing: [WARNING] High loss value alert - 11.01. Can result in unstable gradients. | |
[2022-11-12 17:22:45,976] torch._dynamo.testing: [WARNING] High loss value alert - 11.01. Can result in unstable gradients. | |
cuda train BartForConditionalGeneration PASS | |
[2022-11-12 17:22:56,373] torch._dynamo.testing: [WARNING] High loss value alert - 10.40. Can result in unstable gradients. | |
Running huggingface.py BertForMaskedLM... | |
[2022-11-12 17:22:56,456] torch._dynamo.testing: [WARNING] High loss value alert - 10.40. Can result in unstable gradients. | |
[2022-11-12 17:22:56,541] torch._dynamo.testing: [WARNING] High loss value alert - 10.40. Can result in unstable gradients. | |
[2022-11-12 17:22:56,618] torch._dynamo.testing: [WARNING] High loss value alert - 10.40. Can result in unstable gradients. | |
[2022-11-12 17:23:21,978] torch._dynamo.testing: [WARNING] High loss value alert - 10.40. Can result in unstable gradients. | |
cuda train BertForMaskedLM PASS | |
[2022-11-12 17:23:30,808] torch._dynamo.testing: [WARNING] High loss value alert - 4.95. Can result in unstable gradients. | |
Running huggingface.py BertForQuestionAnswering... | |
[2022-11-12 17:23:30,886] torch._dynamo.testing: [WARNING] High loss value alert - 4.95. Can result in unstable gradients. | |
[2022-11-12 17:23:30,967] torch._dynamo.testing: [WARNING] High loss value alert - 4.95. Can result in unstable gradients. | |
[2022-11-12 17:23:31,034] torch._dynamo.testing: [WARNING] High loss value alert - 4.95. Can result in unstable gradients. | |
[2022-11-12 17:23:54,967] torch._dynamo.testing: [WARNING] High loss value alert - 4.95. Can result in unstable gradients. | |
cuda train BertForQuestionAnswering PASS | |
[2022-11-12 17:24:04,461] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients. | |
Running huggingface.py BigBird... | |
[2022-11-12 17:24:04,897] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients. | |
[2022-11-12 17:24:05,226] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients. | |
[2022-11-12 17:24:05,543] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients. | |
ERROR:common:output 0: torch.Size([1, 1, 12, 64, 192]) != torch.Size([1, 12, 64, 192]) | |
While executing %mul : [#users=2] = call_function[target=torch.ops.aten.mul.Tensor](args = (%permute, %permute_1), kwargs = {}) | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy | |
new_result = optimized_model_iter_fn( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations | |
self.model_iter_fn(mod, inputs, collect_outputs=False) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/huggingface.py", line 426, in forward_and_backward_pass | |
cloned_inputs = clone_inputs(inputs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/huggingface.py", line 429, in <graph break in forward_and_backward_pass> | |
pred = mod(**cloned_inputs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 2462, in forward | |
outputs = self.bert( | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 2104, in forward | |
blocked_encoder_mask, band_mask, from_mask, to_mask = self.create_masks_for_block_sparse_attn( | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/big_bird/modeling_big_bird.py", line 2185, in create_masks_for_block_sparse_attn | |
@staticmethod | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 972, in forward | |
return compiled_f( | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 963, in new_func | |
return compiled_fn(args) | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 390, in new_fn | |
fw_outs = call_func_with_args(compiled_fw, args, disable_amp=disable_amp) | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 346, in call_func_with_args | |
out = normalize_as_list(f(args)) | |
File "/scratch/ezyang/work/pytorch/functorch/_src/aot_autograd.py", line 321, in g | |
return f(*args) | |
File "/scratch/ezyang/work/pytorch/torch/fx/interpreter.py", line 130, in run | |
self.env[node] = self.run_node(node) | |
File "/scratch/ezyang/work/pytorch/functorch/_src/compilers.py", line 162, in run_node | |
assert nv.size() == rv.size(), f"output {i}: {nv.size()} != {rv.size()}" | |
AssertionError: output 0: torch.Size([1, 1, 12, 64, 192]) != torch.Size([1, 12, 64, 192]) | |
While executing %mul : [#users=2] = call_function[target=torch.ops.aten.mul.Tensor](args = (%permute, %permute_1), kwargs = {}) | |
TorchDynamo optimized model failed to run because of the following error | |
cuda train BigBird FAIL | |
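The two sizes in the assert differ only by a leading broadcast dimension; a hypothetical standalone illustration (shapes copied from the log, tensor contents made up) of how aten.mul.Tensor arrives at the rank-5 result:

import torch

a = torch.randn(1, 1, 12, 64, 192)  # e.g. a permuted mask carrying an extra unit dim
b = torch.randn(1, 12, 64, 192)     # the rank-4 shape the reference run produced
print((a * b).shape)                # torch.Size([1, 1, 12, 64, 192]): broadcasting keeps rank 5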
[2022-11-12 17:24:13,961] torch._dynamo.testing: [WARNING] High loss value alert - 10.97. Can result in unstable gradients. | |
Running huggingface.py BlenderbotSmallForCausalLM... | |
[2022-11-12 17:24:14,009] torch._dynamo.testing: [WARNING] High loss value alert - 10.97. Can result in unstable gradients. | |
[2022-11-12 17:24:14,068] torch._dynamo.testing: [WARNING] High loss value alert - 10.97. Can result in unstable gradients. | |
[2022-11-12 17:24:14,125] torch._dynamo.testing: [WARNING] High loss value alert - 10.97. Can result in unstable gradients. | |
[2022-11-12 17:24:32,378] torch._dynamo.testing: [WARNING] High loss value alert - 10.97. Can result in unstable gradients. | |
cuda train BlenderbotSmallForCausalLM PASS | |
[2022-11-12 17:24:41,956] torch._dynamo.testing: [WARNING] High loss value alert - 10.94. Can result in unstable gradients. | |
Running huggingface.py BlenderbotSmallForConditionalGeneration... | |
[2022-11-12 17:24:42,054] torch._dynamo.testing: [WARNING] High loss value alert - 10.94. Can result in unstable gradients. | |
[2022-11-12 17:24:42,188] torch._dynamo.testing: [WARNING] High loss value alert - 10.94. Can result in unstable gradients. | |
[2022-11-12 17:24:42,307] torch._dynamo.testing: [WARNING] High loss value alert - 10.94. Can result in unstable gradients. | |
[2022-11-12 17:25:27,138] torch._dynamo.testing: [WARNING] High loss value alert - 10.94. Can result in unstable gradients. | |
cuda train BlenderbotSmallForConditionalGeneration PASS | |
[2022-11-12 17:25:36,934] torch._dynamo.testing: [WARNING] High loss value alert - 10.49. Can result in unstable gradients. | |
Running huggingface.py CamemBert... | |
[2022-11-12 17:25:37,042] torch._dynamo.testing: [WARNING] High loss value alert - 10.49. Can result in unstable gradients. | |
[2022-11-12 17:25:37,114] torch._dynamo.testing: [WARNING] High loss value alert - 10.49. Can result in unstable gradients. | |
[2022-11-12 17:25:37,175] torch._dynamo.testing: [WARNING] High loss value alert - 10.49. Can result in unstable gradients. | |
[2022-11-12 17:26:01,889] torch._dynamo.testing: [WARNING] High loss value alert - 10.49. Can result in unstable gradients. | |
cuda train CamemBert PASS | |
[2022-11-12 17:26:11,634] torch._dynamo.testing: [WARNING] High loss value alert - 10.99. Can result in unstable gradients. | |
Running huggingface.py DebertaForMaskedLM... | |
[2022-11-12 17:26:11,759] torch._dynamo.testing: [WARNING] High loss value alert - 10.99. Can result in unstable gradients. | |
[2022-11-12 17:26:11,847] torch._dynamo.testing: [WARNING] High loss value alert - 10.99. Can result in unstable gradients. | |
[2022-11-12 17:26:11,925] torch._dynamo.testing: [WARNING] High loss value alert - 10.99. Can result in unstable gradients. | |
[2022-11-12 17:26:55,125] torch._dynamo.testing: [WARNING] High loss value alert - 10.99. Can result in unstable gradients. | |
cuda train DebertaForMaskedLM PASS | |
[2022-11-12 17:27:04,722] torch._dynamo.testing: [WARNING] High loss value alert - 6.08. Can result in unstable gradients. | |
Running huggingface.py DebertaForQuestionAnswering... | |
[2022-11-12 17:27:04,828] torch._dynamo.testing: [WARNING] High loss value alert - 6.08. Can result in unstable gradients. | |
[2022-11-12 17:27:04,922] torch._dynamo.testing: [WARNING] High loss value alert - 6.08. Can result in unstable gradients. | |
[2022-11-12 17:27:05,009] torch._dynamo.testing: [WARNING] High loss value alert - 6.08. Can result in unstable gradients. | |
[2022-11-12 17:27:48,482] torch._dynamo.testing: [WARNING] High loss value alert - 6.08. Can result in unstable gradients. | |
cuda train DebertaForQuestionAnswering PASS | |
WARNING:__main__:Sequence Length not defined for DistilBertForMaskedLM. Choosing 128 arbitrarily | |
[2022-11-12 17:27:57,096] torch._dynamo.testing: [WARNING] High loss value alert - 10.39. Can result in unstable gradients. | |
Running huggingface.py DistilBertForMaskedLM... | |
[2022-11-12 17:27:57,137] torch._dynamo.testing: [WARNING] High loss value alert - 10.39. Can result in unstable gradients. | |
[2022-11-12 17:27:57,190] torch._dynamo.testing: [WARNING] High loss value alert - 10.39. Can result in unstable gradients. | |
[2022-11-12 17:27:57,224] torch._dynamo.testing: [WARNING] High loss value alert - 10.39. Can result in unstable gradients. | |
[2022-11-12 17:28:09,916] torch._dynamo.testing: [WARNING] High loss value alert - 10.39. Can result in unstable gradients. | |
cuda train DistilBertForMaskedLM PASS | |
WARNING:__main__:Sequence Length not defined for DistilBertForQuestionAnswering. Choosing 128 arbitrarily | |
[2022-11-12 17:28:17,452] torch._dynamo.testing: [WARNING] High loss value alert - 5.02. Can result in unstable gradients. | |
Running huggingface.py DistilBertForQuestionAnswering... | |
[2022-11-12 17:28:17,496] torch._dynamo.testing: [WARNING] High loss value alert - 5.02. Can result in unstable gradients. | |
[2022-11-12 17:28:17,539] torch._dynamo.testing: [WARNING] High loss value alert - 5.02. Can result in unstable gradients. | |
[2022-11-12 17:28:17,574] torch._dynamo.testing: [WARNING] High loss value alert - 5.02. Can result in unstable gradients. | |
[2022-11-12 17:28:30,345] torch._dynamo.testing: [WARNING] High loss value alert - 5.02. Can result in unstable gradients. | |
cuda train DistilBertForQuestionAnswering PASS | |
[2022-11-12 17:28:39,659] torch._dynamo.testing: [WARNING] High loss value alert - 11.02. Can result in unstable gradients. | |
Running huggingface.py DistillGPT2... | |
[2022-11-12 17:28:39,735] torch._dynamo.testing: [WARNING] High loss value alert - 11.02. Can result in unstable gradients. | |
[2022-11-12 17:28:39,777] torch._dynamo.testing: [WARNING] High loss value alert - 11.02. Can result in unstable gradients. | |
[2022-11-12 17:28:39,809] torch._dynamo.testing: [WARNING] High loss value alert - 11.02. Can result in unstable gradients. | |
[2022-11-12 17:28:58,044] torch._dynamo.testing: [WARNING] High loss value alert - 11.02. Can result in unstable gradients. | |
cuda train DistillGPT2 PASS | |
If you want to use `ElectraForCausalLM` as a standalone, add `is_decoder=True`. | |
[2022-11-12 17:29:04,959] torch._dynamo.testing: [WARNING] High loss value alert - 10.35. Can result in unstable gradients. | |
Running huggingface.py ElectraForCausalLM... | |
[2022-11-12 17:29:05,028] torch._dynamo.testing: [WARNING] High loss value alert - 10.35. Can result in unstable gradients. | |
[2022-11-12 17:29:05,110] torch._dynamo.testing: [WARNING] High loss value alert - 10.35. Can result in unstable gradients. | |
[2022-11-12 17:29:05,177] torch._dynamo.testing: [WARNING] High loss value alert - 10.35. Can result in unstable gradients. | |
[2022-11-12 17:29:29,425] torch._dynamo.testing: [WARNING] High loss value alert - 10.35. Can result in unstable gradients. | |
cuda train ElectraForCausalLM PASS | |
[2022-11-12 17:29:36,485] torch._dynamo.testing: [WARNING] High loss value alert - 6.72. Can result in unstable gradients. | |
Running huggingface.py ElectraForQuestionAnswering... | |
[2022-11-12 17:29:36,561] torch._dynamo.testing: [WARNING] High loss value alert - 6.72. Can result in unstable gradients. | |
[2022-11-12 17:29:36,650] torch._dynamo.testing: [WARNING] High loss value alert - 6.72. Can result in unstable gradients. | |
[2022-11-12 17:29:36,722] torch._dynamo.testing: [WARNING] High loss value alert - 6.72. Can result in unstable gradients. | |
[2022-11-12 17:30:01,899] torch._dynamo.testing: [WARNING] High loss value alert - 6.72. Can result in unstable gradients. | |
cuda train ElectraForQuestionAnswering PASS | |
Running huggingface.py GPT2ForSequenceClassification... | |
cuda train GPT2ForSequenceClassification PASS | |
[2022-11-12 17:30:58,390] torch._dynamo.testing: [WARNING] High loss value alert - 10.56. Can result in unstable gradients. | |
Running huggingface.py GoogleFnet... | |
[2022-11-12 17:30:58,481] torch._dynamo.testing: [WARNING] High loss value alert - 10.56. Can result in unstable gradients. | |
[2022-11-12 17:30:58,530] torch._dynamo.testing: [WARNING] High loss value alert - 10.56. Can result in unstable gradients. | |
[2022-11-12 17:30:58,583] torch._dynamo.testing: [WARNING] High loss value alert - 10.56. Can result in unstable gradients. | |
ERROR:common: | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 402, in _compile | |
out_code = transform_code_object(code, transform) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object | |
transformations(instructions, code_options) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 379, in transform | |
tracer = InstructionTranslator( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1445, in __init__ | |
self.symbolic_locals = collections.OrderedDict( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1446, in <genexpr> | |
(k, VariableBuilder(self, LocalSource(k))(f_locals[k])) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 141, in __call__ | |
return self._wrap(value).clone(**self.options()) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 215, in _wrap | |
output = [ | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 216, in <listcomp> | |
VariableBuilder(self.tx, GetItemSource(self.get_source(), i))( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 141, in __call__ | |
return self._wrap(value).clone(**self.options()) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 205, in _wrap | |
return self.wrap_tensor(value) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 551, in wrap_tensor | |
tensor_variable = wrap_fx_proxy( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 631, in wrap_fx_proxy | |
return wrap_fx_proxy_cls( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 679, in wrap_fx_proxy_cls | |
example_value = fake_wrapper(example_value) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 745, in wrap_to_fake_tensor_and_record | |
return wrap_fake_exception( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 721, in wrap_fake_exception | |
return fn() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 746, in <lambda> | |
lambda: make_fake_tensor(e, tx.fake_mode, static_shapes, tx) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 685, in make_fake_tensor | |
fake_tensor = fake_mode.from_tensor(e, static_shapes=static_shapes) | |
File "/scratch/ezyang/work/pytorch/torch/_subclasses/fake_tensor.py", line 1015, in from_tensor | |
return self.fake_tensor_converter(self, tensor, shape_env=self.shape_env) | |
File "/scratch/ezyang/work/pytorch/torch/_subclasses/fake_tensor.py", line 266, in __call__ | |
return self.from_real_tensor(fake_mode, t, make_constant, shape_env=shape_env) | |
File "/scratch/ezyang/work/pytorch/torch/_subclasses/fake_tensor.py", line 238, in from_real_tensor | |
out = self.meta_converter(t, shape_env=shape_env, callback=mk_fake_tensor) | |
File "/scratch/ezyang/work/pytorch/torch/_subclasses/meta_utils.py", line 458, in __call__ | |
r = self.meta_tensor(t, shape_env=shape_env, callback=callback) | |
File "/scratch/ezyang/work/pytorch/torch/_subclasses/meta_utils.py", line 279, in meta_tensor | |
base = torch.view_as_real(base) | |
File "/scratch/ezyang/work/pytorch/torch/_subclasses/fake_tensor.py", line 600, in __torch_dispatch__ | |
return func(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_ops.py", line 284, in __call__ | |
return self._op(*args, **kwargs or {}) | |
File "/scratch/ezyang/work/pytorch/torch/_subclasses/fake_tensor.py", line 874, in __torch_dispatch__ | |
raise RuntimeError( | |
RuntimeError: aten.view_as_real.default - couldn't find symbolic meta function/decomposition | |
Set torch._dynamo.config.verbose=True for more information | |
You can suppress this exception and fall back to eager by setting: | |
torch._dynamo.config.suppress_errors = True | |
The above exception was the direct cause of the following exception: | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy | |
new_result = optimized_model_iter_fn( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations | |
self.model_iter_fn(mod, inputs, collect_outputs=False) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/huggingface.py", line 426, in forward_and_backward_pass | |
cloned_inputs = clone_inputs(inputs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/huggingface.py", line 429, in <graph break in forward_and_backward_pass> | |
pred = mod(**cloned_inputs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/fnet/modeling_fnet.py", line 763, in forward | |
outputs = self.fnet( | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/fnet/modeling_fnet.py", line 604, in forward | |
encoder_outputs = self.encoder( | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/fnet/modeling_fnet.py", line 308, in forward | |
layer_outputs = layer_module(hidden_states) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/fnet/modeling_fnet.py", line 267, in forward | |
self_fourier_outputs = self.fourier(hidden_states) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/transformers/models/fnet/modeling_fnet.py", line 220, in forward | |
self_outputs = self.self(hidden_states) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 247, in catch_errors | |
return callback(frame, cache_size) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 474, in _convert_frame | |
result = inner_convert(frame, cache_size) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 118, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 89, in time_wrapper | |
r = func(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 347, in _convert_frame_assert | |
return _compile( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 464, in _compile | |
raise InternalTorchDynamoError() from e | |
torch._dynamo.exc.InternalTorchDynamoError | |
TorchDynamo optimized model failed to run because of the following error | |
cuda train GoogleFnet FAIL | |
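GoogleFnet mixes tokens with FFTs, so complex-valued tensors (presumably views of FFT outputs) reach the fake-tensor converter, whose meta_tensor path calls aten.view_as_real, an op with no symbolic meta function at this commit; a minimal sketch of the op itself on made-up data:

import torch

x = torch.randn(2, 3, dtype=torch.complex64)
# view_as_real exposes the (real, imag) pair as a trailing dimension of size 2
print(torch.view_as_real(x).shape)  # torch.Size([2, 3, 2])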
[2022-11-12 17:31:07,860] torch._dynamo.testing: [WARNING] High loss value alert - 10.46. Can result in unstable gradients. | |
Running huggingface.py LayoutLMForMaskedLM... | |
[2022-11-12 17:31:07,974] torch._dynamo.testing: [WARNING] High loss value alert - 10.46. Can result in unstable gradients. | |
[2022-11-12 17:31:08,064] torch._dynamo.testing: [WARNING] High loss value alert - 10.46. Can result in unstable gradients. | |
[2022-11-12 17:31:08,143] torch._dynamo.testing: [WARNING] High loss value alert - 10.46. Can result in unstable gradients. | |
[2022-11-12 17:31:33,849] torch._dynamo.testing: [WARNING] High loss value alert - 10.46. Can result in unstable gradients. | |
cuda train LayoutLMForMaskedLM PASS | |
Running huggingface.py LayoutLMForSequenceClassification... | |
cuda train LayoutLMForSequenceClassification PASS | |
WARNING:__main__:Sequence Length not defined for M2M100ForConditionalGeneration. Choosing 128 arbitrarily | |
[2022-11-12 17:32:30,023] torch._dynamo.testing: [WARNING] High loss value alert - 12.05. Can result in unstable gradients. | |
Running huggingface.py M2M100ForConditionalGeneration... | |
[2022-11-12 17:32:30,221] torch._dynamo.testing: [WARNING] High loss value alert - 12.05. Can result in unstable gradients. | |
[2022-11-12 17:32:30,418] torch._dynamo.testing: [WARNING] High loss value alert - 12.05. Can result in unstable gradients. | |
[2022-11-12 17:32:30,592] torch._dynamo.testing: [WARNING] High loss value alert - 12.05. Can result in unstable gradients. | |
[2022-11-12 17:33:33,382] torch._dynamo.testing: [WARNING] High loss value alert - 12.05. Can result in unstable gradients. | |
cuda train M2M100ForConditionalGeneration PASS | |
WARNING:__main__:Sequence Length not defined for MBartForCausalLM. Choosing 128 arbitrarily | |
[2022-11-12 17:33:46,613] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients. | |
Running huggingface.py MBartForCausalLM... | |
[2022-11-12 17:33:46,703] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients. | |
[2022-11-12 17:33:46,803] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients. | |
[2022-11-12 17:33:46,889] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients. | |
[2022-11-12 17:34:11,524] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients. | |
cuda train MBartForCausalLM PASS | |
WARNING:__main__:Sequence Length not defined for MBartForConditionalGeneration. Choosing 128 arbitrarily | |
[2022-11-12 17:34:29,763] torch._dynamo.testing: [WARNING] High loss value alert - 11.03. Can result in unstable gradients. | |
Running huggingface.py MBartForConditionalGeneration... | |
[2022-11-12 17:34:29,956] torch._dynamo.testing: [WARNING] High loss value alert - 11.03. Can result in unstable gradients. | |
[2022-11-12 17:34:30,171] torch._dynamo.testing: [WARNING] High loss value alert - 11.03. Can result in unstable gradients. | |
[2022-11-12 17:34:30,358] torch._dynamo.testing: [WARNING] High loss value alert - 11.03. Can result in unstable gradients. | |
[2022-11-12 17:35:32,182] torch._dynamo.testing: [WARNING] High loss value alert - 11.03. Can result in unstable gradients. | |
cuda train MBartForConditionalGeneration PASS | |
WARNING:__main__:Sequence Length not defined for MT5ForConditionalGeneration. Choosing 128 arbitrarily | |
[2022-11-12 17:35:44,230] torch._dynamo.testing: [WARNING] High loss value alert - 103.31. Can result in unstable gradients. | |
Running huggingface.py MT5ForConditionalGeneration... | |
WARNING:common:fp64 golden ref was not generated for MT5ForConditionalGeneration | |
[2022-11-12 17:35:44,450] torch._dynamo.testing: [WARNING] High loss value alert - 103.31. Can result in unstable gradients. | |
[2022-11-12 17:35:44,639] torch._dynamo.testing: [WARNING] High loss value alert - 103.31. Can result in unstable gradients. | |
[2022-11-12 17:36:33,702] torch._dynamo.testing: [WARNING] High loss value alert - 103.31. Can result in unstable gradients. | |
cuda train MT5ForConditionalGeneration PASS | |
If you want to use `MegatronBertForCausalLM` as a standalone, add `is_decoder=True`. | |
WARNING:__main__:Sequence Length not defined for MegatronBertForCausalLM. Choosing 128 arbitrarily | |
[2022-11-12 17:36:47,781] torch._dynamo.testing: [WARNING] High loss value alert - 10.54. Can result in unstable gradients. | |
Running huggingface.py MegatronBertForCausalLM... | |
[2022-11-12 17:36:47,939] torch._dynamo.testing: [WARNING] High loss value alert - 10.54. Can result in unstable gradients. | |
[2022-11-12 17:36:48,108] torch._dynamo.testing: [WARNING] High loss value alert - 10.54. Can result in unstable gradients. | |
[2022-11-12 17:36:48,251] torch._dynamo.testing: [WARNING] High loss value alert - 10.54. Can result in unstable gradients. | |
[2022-11-12 17:37:41,030] torch._dynamo.testing: [WARNING] High loss value alert - 10.54. Can result in unstable gradients. | |
cuda train MegatronBertForCausalLM PASS | |
WARNING:__main__:Sequence Length not defined for MegatronBertForQuestionAnswering. Choosing 128 arbitrarily | |
[2022-11-12 17:37:54,825] torch._dynamo.testing: [WARNING] High loss value alert - 4.80. Can result in unstable gradients. | |
Running huggingface.py MegatronBertForQuestionAnswering... | |
[2022-11-12 17:37:54,990] torch._dynamo.testing: [WARNING] High loss value alert - 4.80. Can result in unstable gradients. | |
[2022-11-12 17:37:55,179] torch._dynamo.testing: [WARNING] High loss value alert - 4.80. Can result in unstable gradients. | |
[2022-11-12 17:37:55,341] torch._dynamo.testing: [WARNING] High loss value alert - 4.80. Can result in unstable gradients. | |
[2022-11-12 17:38:48,532] torch._dynamo.testing: [WARNING] High loss value alert - 4.80. Can result in unstable gradients. | |
cuda train MegatronBertForQuestionAnswering PASS | |
WARNING:__main__:Sequence Length not defined for MobileBertForMaskedLM. Choosing 128 arbitrarily | |
[2022-11-12 17:38:56,956] torch._dynamo.testing: [WARNING] High loss value alert - 10.42. Can result in unstable gradients. | |
Running huggingface.py MobileBertForMaskedLM... | |
[2022-11-12 17:38:57,284] torch._dynamo.testing: [WARNING] High loss value alert - 10.42. Can result in unstable gradients. | |
[2022-11-12 17:38:57,804] torch._dynamo.testing: [WARNING] High loss value alert - 10.42. Can result in unstable gradients. | |
[2022-11-12 17:38:58,107] torch._dynamo.testing: [WARNING] High loss value alert - 10.42. Can result in unstable gradients. | |
[2022-11-12 17:40:24,812] torch._dynamo.testing: [WARNING] High loss value alert - 10.42. Can result in unstable gradients. | |
cuda train MobileBertForMaskedLM PASS | |
WARNING:__main__:Sequence Length not defined for MobileBertForQuestionAnswering. Choosing 128 arbitrarily | |
[2022-11-12 17:40:34,101] torch._dynamo.testing: [WARNING] High loss value alert - 4.84. Can result in unstable gradients. | |
Running huggingface.py MobileBertForQuestionAnswering... | |
[2022-11-12 17:40:34,424] torch._dynamo.testing: [WARNING] High loss value alert - 4.84. Can result in unstable gradients. | |
[2022-11-12 17:40:34,949] torch._dynamo.testing: [WARNING] High loss value alert - 4.84. Can result in unstable gradients. | |
[2022-11-12 17:40:35,249] torch._dynamo.testing: [WARNING] High loss value alert - 4.84. Can result in unstable gradients. | |
[2022-11-12 17:42:02,202] torch._dynamo.testing: [WARNING] High loss value alert - 4.84. Can result in unstable gradients. | |
cuda train MobileBertForQuestionAnswering PASS | |
WARNING:__main__:Sequence Length not defined for OPTForCausalLM. Choosing 128 arbitrarily | |
[2022-11-12 17:42:14,439] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients. | |
Running huggingface.py OPTForCausalLM... | |
[2022-11-12 17:42:14,514] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients. | |
[2022-11-12 17:42:14,592] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients. | |
[2022-11-12 17:42:14,659] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients. | |
[2022-11-12 17:42:38,881] torch._dynamo.testing: [WARNING] High loss value alert - 11.00. Can result in unstable gradients. | |
cuda train OPTForCausalLM PASS | |
WARNING:__main__:Sequence Length not defined for PLBartForCausalLM. Choosing 128 arbitrarily | |
[2022-11-12 17:42:47,768] torch._dynamo.testing: [WARNING] High loss value alert - 11.40. Can result in unstable gradients. | |
Running huggingface.py PLBartForCausalLM... | |
[2022-11-12 17:42:47,834] torch._dynamo.testing: [WARNING] High loss value alert - 11.40. Can result in unstable gradients. | |
[2022-11-12 17:42:47,881] torch._dynamo.testing: [WARNING] High loss value alert - 11.40. Can result in unstable gradients. | |
[2022-11-12 17:42:47,919] torch._dynamo.testing: [WARNING] High loss value alert - 11.40. Can result in unstable gradients. | |
[2022-11-12 17:43:02,226] torch._dynamo.testing: [WARNING] High loss value alert - 11.40. Can result in unstable gradients. | |
cuda train PLBartForCausalLM PASS | |
WARNING:__main__:Sequence Length not defined for PLBartForConditionalGeneration. Choosing 128 arbitrarily | |
[2022-11-12 17:43:13,462] torch._dynamo.testing: [WARNING] High loss value alert - 11.29. Can result in unstable gradients. | |
Running huggingface.py PLBartForConditionalGeneration... | |
[2022-11-12 17:43:13,557] torch._dynamo.testing: [WARNING] High loss value alert - 11.29. Can result in unstable gradients. | |
[2022-11-12 17:43:13,656] torch._dynamo.testing: [WARNING] High loss value alert - 11.29. Can result in unstable gradients. | |
[2022-11-12 17:43:13,741] torch._dynamo.testing: [WARNING] High loss value alert - 11.29. Can result in unstable gradients. | |
[2022-11-12 17:43:49,141] torch._dynamo.testing: [WARNING] High loss value alert - 11.29. Can result in unstable gradients. | |
cuda train PLBartForConditionalGeneration PASS | |
WARNING:__main__:Sequence Length not defined for PegasusForCausalLM. Choosing 128 arbitrarily | |
[2022-11-12 17:44:03,142] torch._dynamo.testing: [WARNING] High loss value alert - 11.07. Can result in unstable gradients. | |
Running huggingface.py PegasusForCausalLM... | |
[2022-11-12 17:44:03,231] torch._dynamo.testing: [WARNING] High loss value alert - 11.07. Can result in unstable gradients. | |
[2022-11-12 17:44:03,331] torch._dynamo.testing: [WARNING] High loss value alert - 11.07. Can result in unstable gradients. | |
[2022-11-12 17:44:03,413] torch._dynamo.testing: [WARNING] High loss value alert - 11.07. Can result in unstable gradients. | |
[2022-11-12 17:44:27,926] torch._dynamo.testing: [WARNING] High loss value alert - 11.07. Can result in unstable gradients. | |
cuda train PegasusForCausalLM PASS | |
WARNING:__main__:Sequence Length not defined for PegasusForConditionalGeneration. Choosing 128 arbitrarily | |
[2022-11-12 17:44:49,548] torch._dynamo.testing: [WARNING] High loss value alert - 11.11. Can result in unstable gradients. | |
Running huggingface.py PegasusForConditionalGeneration... | |
[2022-11-12 17:44:49,742] torch._dynamo.testing: [WARNING] High loss value alert - 11.11. Can result in unstable gradients. | |
[2022-11-12 17:44:50,001] torch._dynamo.testing: [WARNING] High loss value alert - 11.11. Can result in unstable gradients. | |
[2022-11-12 17:44:50,187] torch._dynamo.testing: [WARNING] High loss value alert - 11.11. Can result in unstable gradients. | |
[2022-11-12 17:45:51,694] torch._dynamo.testing: [WARNING] High loss value alert - 11.11. Can result in unstable gradients. | |
cuda train PegasusForConditionalGeneration PASS | |
If you want to use `RobertaLMHeadModel` as a standalone, add `is_decoder=True`. | |
[2022-11-12 17:46:01,889] torch._dynamo.testing: [WARNING] High loss value alert - 10.48. Can result in unstable gradients. | |
Running huggingface.py RobertaForCausalLM... | |
[2022-11-12 17:46:01,973] torch._dynamo.testing: [WARNING] High loss value alert - 10.48. Can result in unstable gradients. | |
[2022-11-12 17:46:02,060] torch._dynamo.testing: [WARNING] High loss value alert - 10.48. Can result in unstable gradients. | |
[2022-11-12 17:46:02,134] torch._dynamo.testing: [WARNING] High loss value alert - 10.48. Can result in unstable gradients. | |
[2022-11-12 17:46:27,549] torch._dynamo.testing: [WARNING] High loss value alert - 10.48. Can result in unstable gradients. | |
cuda train RobertaForCausalLM PASS | |
[2022-11-12 17:46:36,256] torch._dynamo.testing: [WARNING] High loss value alert - 4.85. Can result in unstable gradients. | |
Running huggingface.py RobertaForQuestionAnswering... | |
[2022-11-12 17:46:36,356] torch._dynamo.testing: [WARNING] High loss value alert - 4.85. Can result in unstable gradients. | |
[2022-11-12 17:46:36,448] torch._dynamo.testing: [WARNING] High loss value alert - 4.85. Can result in unstable gradients. | |
[2022-11-12 17:46:36,524] torch._dynamo.testing: [WARNING] High loss value alert - 4.85. Can result in unstable gradients. | |
[2022-11-12 17:47:01,391] torch._dynamo.testing: [WARNING] High loss value alert - 4.85. Can result in unstable gradients. | |
cuda train RobertaForQuestionAnswering PASS | |
WARNING:__main__:Sequence Length not defined for Speech2Text2ForCausalLM. Choosing 128 arbitrarily | |
[2022-11-12 17:47:08,371] torch._dynamo.testing: [WARNING] High loss value alert - 9.20. Can result in unstable gradients. | |
Running huggingface.py Speech2Text2ForCausalLM... | |
[2022-11-12 17:47:08,413] torch._dynamo.testing: [WARNING] High loss value alert - 9.20. Can result in unstable gradients. | |
[2022-11-12 17:47:08,461] torch._dynamo.testing: [WARNING] High loss value alert - 9.20. Can result in unstable gradients. | |
[2022-11-12 17:47:08,499] torch._dynamo.testing: [WARNING] High loss value alert - 9.20. Can result in unstable gradients. | |
[2022-11-12 17:47:22,386] torch._dynamo.testing: [WARNING] High loss value alert - 9.20. Can result in unstable gradients. | |
cuda train Speech2Text2ForCausalLM PASS | |
[2022-11-12 17:47:29,958] torch._dynamo.testing: [WARNING] High loss value alert - 10.88. Can result in unstable gradients. | |
Running huggingface.py T5ForConditionalGeneration... | |
WARNING:common:fp64 golden ref was not generated for T5ForConditionalGeneration | |
[2022-11-12 17:47:30,153] torch._dynamo.testing: [WARNING] High loss value alert - 10.88. Can result in unstable gradients. | |
[2022-11-12 17:47:30,281] torch._dynamo.testing: [WARNING] High loss value alert - 10.88. Can result in unstable gradients. | |
Exception ignored in: <generator object preorder_traversal._preorder_traversal at 0x7fcae4846e40> | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/sympy/core/traversal.py", line 121, in _preorder_traversal | |
yield node | |
SystemError: error return without exception set | |
[2022-11-12 17:48:04,536] torch._dynamo.testing: [WARNING] High loss value alert - 10.88. Can result in unstable gradients. | |
cuda train T5ForConditionalGeneration PASS | |
[2022-11-12 17:48:12,909] torch._dynamo.testing: [WARNING] High loss value alert - 10.87. Can result in unstable gradients. | |
Running huggingface.py T5Small... | |
WARNING:common:fp64 golden ref was not generated for T5Small | |
[2022-11-12 17:48:13,082] torch._dynamo.testing: [WARNING] High loss value alert - 10.87. Can result in unstable gradients. | |
[2022-11-12 17:48:13,210] torch._dynamo.testing: [WARNING] High loss value alert - 10.87. Can result in unstable gradients. | |
Exception ignored in: <generator object preorder_traversal._preorder_traversal at 0x7f598c565eb0> | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/sympy/core/traversal.py", line 121, in _preorder_traversal | |
yield node | |
SystemError: error return without exception set | |
[2022-11-12 17:48:47,535] torch._dynamo.testing: [WARNING] High loss value alert - 10.87. Can result in unstable gradients. | |
cuda train T5Small PASS | |
WARNING:__main__:Sequence Length not defined for TrOCRForCausalLM. Choosing 128 arbitrarily | |
[2022-11-12 17:49:00,110] torch._dynamo.testing: [WARNING] High loss value alert - 11.11. Can result in unstable gradients. | |
Running huggingface.py TrOCRForCausalLM... | |
[2022-11-12 17:49:00,201] torch._dynamo.testing: [WARNING] High loss value alert - 11.11. Can result in unstable gradients. | |
[2022-11-12 17:49:00,302] torch._dynamo.testing: [WARNING] High loss value alert - 11.11. Can result in unstable gradients. | |
[2022-11-12 17:49:00,385] torch._dynamo.testing: [WARNING] High loss value alert - 11.11. Can result in unstable gradients. | |
[2022-11-12 17:49:26,266] torch._dynamo.testing: [WARNING] High loss value alert - 11.11. Can result in unstable gradients. | |
cuda train TrOCRForCausalLM PASS | |
WARNING:__main__:Sequence Length not defined for XGLMForCausalLM. Choosing 128 arbitrarily | |
[2022-11-12 17:49:46,824] torch._dynamo.testing: [WARNING] High loss value alert - 12.58. Can result in unstable gradients. | |
Running huggingface.py XGLMForCausalLM... | |
[2022-11-12 17:49:47,003] torch._dynamo.testing: [WARNING] High loss value alert - 12.58. Can result in unstable gradients. | |
[2022-11-12 17:49:47,136] torch._dynamo.testing: [WARNING] High loss value alert - 12.58. Can result in unstable gradients. | |
[2022-11-12 17:49:47,259] torch._dynamo.testing: [WARNING] High loss value alert - 12.58. Can result in unstable gradients. | |
[2022-11-12 17:50:36,738] torch._dynamo.testing: [WARNING] High loss value alert - 12.58. Can result in unstable gradients. | |
cuda train XGLMForCausalLM PASS | |
[2022-11-12 17:50:50,834] torch._dynamo.testing: [WARNING] High loss value alert - 10.50. Can result in unstable gradients. | |
Running huggingface.py XLNetLMHeadModel... | |
[2022-11-12 17:50:51,183] torch._dynamo.testing: [WARNING] High loss value alert - 10.50. Can result in unstable gradients. | |
[2022-11-12 17:50:51,424] torch._dynamo.testing: [WARNING] High loss value alert - 10.50. Can result in unstable gradients. | |
[2022-11-12 17:50:51,620] torch._dynamo.testing: [WARNING] High loss value alert - 10.50. Can result in unstable gradients. | |
[2022-11-12 17:52:26,203] torch._dynamo.testing: [WARNING] High loss value alert - 10.50. Can result in unstable gradients. | |
cuda train XLNetLMHeadModel PASS | |
[2022-11-12 17:52:37,382] torch._dynamo.testing: [WARNING] High loss value alert - 10.47. Can result in unstable gradients. | |
Running huggingface.py YituTechConvBert... | |
[2022-11-12 17:52:38,499] torch._dynamo.testing: [WARNING] High loss value alert - 10.47. Can result in unstable gradients. | |
[2022-11-12 17:52:38,629] torch._dynamo.testing: [WARNING] High loss value alert - 10.47. Can result in unstable gradients. | |
[2022-11-12 17:52:38,740] torch._dynamo.testing: [WARNING] High loss value alert - 10.47. Can result in unstable gradients. | |
[2022-11-12 17:53:19,114] torch._dynamo.testing: [WARNING] High loss value alert - 10.47. Can result in unstable gradients. | |
cuda train YituTechConvBert PASS | |
Running timm_models.py adv_inception_v3... | |
[2022-11-12 17:53:41,502] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of the presence of mutation | |
[2022-11-12 17:53:49,129] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of the presence of mutation | |
cuda train adv_inception_v3 PASS | |
Running timm_models.py beit_base_patch16_224... | |
cuda train beit_base_patch16_224 PASS | |
Running timm_models.py botnet26t_256... | |
cuda train botnet26t_256 PASS | |
Running timm_models.py cait_m36_384... | |
cuda train cait_m36_384 PASS | |
Running timm_models.py coat_lite_mini... | |
[2022-11-12 17:57:36,187] torch._dynamo.utils: [ERROR] RMSE (res-fp64): 1.02014, (ref-fp64): 0.00051 and shape=torch.Size([2, 1000]) | |
cuda train coat_lite_mini FAIL | |
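The two RMSE figures above compare the compiled result (res) and the eager reference (ref) against an fp64 golden run; a sketch of such a check, assuming it is a plain root-mean-square error (rmse here is illustrative, not the harness's actual helper):

import torch

def rmse(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # compare in double precision so the metric itself adds no noise
    return torch.sqrt(torch.mean((a.double() - b.double()) ** 2))

# A PASS expects rmse(res, fp64) to stay within some tolerance of rmse(ref, fp64);
# 1.02014 against 0.00051 on the [2, 1000] logits is a clear accuracy FAIL.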
Running timm_models.py convit_base... | |
WARNING:common:fp64 golden ref was not generated for convit_base | |
ERROR:common: | |
from user code: | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/convit.py", line 138, in get_rel_indices | |
indy = ind.repeat_interleave(img_size, dim=0).repeat_interleave(img_size, dim=1) | |
Set torch._dynamo.config.verbose=True for more information | |
You can suppress this exception and fall back to eager by setting: | |
torch._dynamo.config.suppress_errors = True | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1078, in run_node | |
return getattr(args[0], node.target)(*args[1:], **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_subclasses/fake_tensor.py", line 883, in __torch_dispatch__ | |
op_impl_out = op_impl(self, func, *args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_subclasses/fake_tensor.py", line 362, in dyn_shape | |
raise DynamicOutputShapeException(func) | |
torch._subclasses.fake_tensor.DynamicOutputShapeException: aten.repeat_interleave.Tensor | |
The above exception was the direct cause of the following exception: | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1042, in get_fake_value | |
return wrap_fake_exception( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 721, in wrap_fake_exception | |
return fn() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1043, in <lambda> | |
lambda: run_node(tx.output, node, args, kwargs, nnmodule) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1085, in run_node | |
raise RuntimeError( | |
RuntimeError: Failed running call_method repeat_interleave(*(FakeTensor(FakeTensor(..., device='meta', size=(14, 14), dtype=torch.int64), cpu), 14), **{'dim': 0}): | |
aten.repeat_interleave.Tensor | |
(scroll up for backtrace) | |
The above exception was the direct cause of the following exception: | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy | |
new_result = optimized_model_iter_fn( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations | |
self.model_iter_fn(mod, inputs, collect_outputs=False) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/timm_models.py", line 305, in forward_and_backward_pass | |
cloned_inputs = clone_inputs(inputs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/timm_models.py", line 308, in <graph break in forward_and_backward_pass> | |
pred = mod(*cloned_inputs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/convit.py", line 333, in forward | |
x = self.forward_features(x) | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/convit.py", line 323, in forward_features | |
x = blk(x) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/convit.py", line 214, in forward | |
x = x + self.drop_path(self.attn(self.norm1(x))) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/convit.py", line 86, in forward | |
self.rel_indices = self.get_rel_indices(N) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 247, in catch_errors | |
return callback(frame, cache_size) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 474, in _convert_frame | |
result = inner_convert(frame, cache_size) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 118, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 89, in time_wrapper | |
r = func(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 347, in _convert_frame_assert | |
return _compile( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 402, in _compile | |
out_code = transform_code_object(code, transform) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object | |
transformations(instructions, code_options) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 390, in transform | |
tracer.run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1493, in run | |
super().run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run | |
and self.step() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step | |
getattr(self, inst.opname)(inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper | |
return inner_fn(self, inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 835, in CALL_FUNCTION_KW | |
self.call_function(fn, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/misc.py", line 572, in call_function | |
return self.obj.call_method(tx, self.name, args, kwargs).add_options(self) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/tensor.py", line 331, in call_method | |
return wrap_fx_proxy( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 631, in wrap_fx_proxy | |
return wrap_fx_proxy_cls( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 671, in wrap_fx_proxy_cls | |
example_value = get_fake_value(proxy.node, tx) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1055, in get_fake_value | |
raise TorchRuntimeError() from e | |
torch._dynamo.exc.TorchRuntimeError: | |
from user code: | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/convit.py", line 138, in get_rel_indices | |
indy = ind.repeat_interleave(img_size, dim=0).repeat_interleave(img_size, dim=1) | |
Set torch._dynamo.config.verbose=True for more information | |
You can suppress this exception and fall back to eager by setting: | |
torch._dynamo.config.suppress_errors = True | |
TorchDynamo optimized model failed to run because of the following error | |
cuda train convit_base FAIL | |
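The aten.repeat_interleave.Tensor overload sizes its output by summing the repeats tensor, which a fake tensor cannot do without real data, hence the DynamicOutputShapeException above; a minimal illustration with the sizes from the traceback:

import torch

ind = torch.arange(14 * 14).view(14, 14)  # stand-in for convit's rel-indices tensor
out = ind.repeat_interleave(14, dim=0)    # output rows = 14 * 14 = 196, known only once repeats is concrete
print(out.shape)                          # torch.Size([196, 14])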
Running timm_models.py convmixer_768_32... | |
[2022-11-12 17:58:00,433] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of the presence of mutation | |
cuda train convmixer_768_32 PASS | |
Running timm_models.py convnext_base... | |
cuda train convnext_base PASS | |
Running timm_models.py crossvit_9_240... | |
ERROR:common: | |
from user code: | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/crossvit.py", line 394, in forward_features | |
x_ = scale_image(x_, ss, self.crop_scale) | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/crossvit.py", line 281, in scale_image | |
x = torch.nn.functional.interpolate(x, size=ss, mode='bicubic', align_corners=False) | |
Set torch._dynamo.config.verbose=True for more information | |
You can suppress this exception and fall back to eager by setting: | |
torch._dynamo.config.suppress_errors = True | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1076, in run_node | |
return node.target(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/functional.py", line 3964, in interpolate | |
return torch._C._nn.upsample_bicubic2d(input, output_size, align_corners, scale_factors) | |
RuntimeError: Cannot call sizes() on tensor with symbolic sizes/strides | |
The above exception was the direct cause of the following exception: | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1042, in get_fake_value | |
return wrap_fake_exception( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 721, in wrap_fake_exception | |
return fn() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1043, in <lambda> | |
lambda: run_node(tx.output, node, args, kwargs, nnmodule) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1085, in run_node | |
raise RuntimeError( | |
RuntimeError: Failed running call_function <function interpolate at 0x7f64d87e71f0>(*(FakeTensor(FakeTensor(..., device='meta', size=(s0, 3, 240, 240)), cuda:0),), **{'size': (224, 224), 'mode': 'bicubic', 'align_corners': False}): | |
Cannot call sizes() on tensor with symbolic sizes/strides | |
(scroll up for backtrace) | |
The above exception was the direct cause of the following exception: | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy | |
new_result = optimized_model_iter_fn( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations | |
self.model_iter_fn(mod, inputs, collect_outputs=False) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/timm_models.py", line 305, in forward_and_backward_pass | |
cloned_inputs = clone_inputs(inputs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/timm_models.py", line 308, in <graph break in forward_and_backward_pass> | |
pred = mod(*cloned_inputs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 247, in catch_errors | |
return callback(frame, cache_size) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 474, in _convert_frame | |
result = inner_convert(frame, cache_size) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 118, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 89, in time_wrapper | |
r = func(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 347, in _convert_frame_assert | |
return _compile( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 402, in _compile | |
out_code = transform_code_object(code, transform) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object | |
transformations(instructions, code_options) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 390, in transform | |
tracer.run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1493, in run | |
super().run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run | |
and self.step() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step | |
getattr(self, inst.opname)(inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper | |
return inner_fn(self, inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION | |
self.call_function(fn, args, {}) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 224, in call_function | |
return super().call_function(tx, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 194, in call_function | |
return super(UserFunctionVariable, self).call_function(tx, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 65, in call_function | |
return tx.inline_user_function_return( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return | |
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call | |
return cls.inline_call_(parent, func, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_ | |
tracer.run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run | |
and self.step() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step | |
getattr(self, inst.opname)(inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper | |
return inner_fn(self, inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION | |
self.call_function(fn, args, {}) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 194, in call_function | |
return super(UserFunctionVariable, self).call_function(tx, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 65, in call_function | |
return tx.inline_user_function_return( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return | |
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call | |
return cls.inline_call_(parent, func, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_ | |
tracer.run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run | |
and self.step() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step | |
getattr(self, inst.opname)(inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper | |
return inner_fn(self, inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 835, in CALL_FUNCTION_KW | |
self.call_function(fn, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/torch.py", line 403, in call_function | |
tensor_variable = wrap_fx_proxy( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 631, in wrap_fx_proxy | |
return wrap_fx_proxy_cls( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 671, in wrap_fx_proxy_cls | |
example_value = get_fake_value(proxy.node, tx) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1055, in get_fake_value | |
raise TorchRuntimeError() from e | |
torch._dynamo.exc.TorchRuntimeError: | |
from user code: | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/crossvit.py", line 394, in forward_features | |
x_ = scale_image(x_, ss, self.crop_scale) | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/crossvit.py", line 281, in scale_image | |
x = torch.nn.functional.interpolate(x, size=ss, mode='bicubic', align_corners=False) | |
Set torch._dynamo.config.verbose=True for more information | |
You can suppress this exception and fall back to eager by setting: | |
torch._dynamo.config.suppress_errors = True | |
TorchDynamo optimized model failed to run because of following error | |
cuda train crossvit_9_240 FAIL | |
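[editor's note] crossvit_9_240 dies inside F.interpolate: the bicubic kernel called sizes() on a tensor whose batch dimension is symbolic (the FakeTensor above has size (s0, 3, 240, 240)). A stripped-down version of the failing call (hypothetical reduction of timm/models/crossvit.py:281):

    import torch
    import torch.nn.functional as F
    import torch._dynamo

    def scale_image(x):
        # upsample_bicubic2d could not handle symbolic sizes/strides in this build
        return F.interpolate(x, size=(224, 224), mode="bicubic", align_corners=False)

    fn = torch._dynamo.optimize("aot_eager")(scale_image)
    fn(torch.randn(8, 3, 240, 240))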
Running timm_models.py cspdarknet53... | |
cuda train cspdarknet53 PASS | |
Running timm_models.py deit_base_distilled_patch16_224... | |
cuda train deit_base_distilled_patch16_224 PASS | |
[2022-11-12 18:00:54,914] torch._dynamo.testing: [WARNING] High loss value alert - 1.02. Can result in unstable gradients. | |
Running timm_models.py dla102... | |
[2022-11-12 18:00:55,158] torch._dynamo.testing: [WARNING] High loss value alert - 1.02. Can result in unstable gradients. | |
[2022-11-12 18:00:55,352] torch._dynamo.testing: [WARNING] High loss value alert - 1.02. Can result in unstable gradients. | |
[2022-11-12 18:00:55,506] torch._dynamo.testing: [WARNING] High loss value alert - 1.02. Can result in unstable gradients. | |
[2022-11-12 18:01:03,311] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
[2022-11-12 18:01:10,384] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
[2022-11-12 18:01:10,576] torch._dynamo.testing: [WARNING] High loss value alert - 1.02. Can result in unstable gradients. | |
cuda train dla102 PASS | |
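[editor's note] The "High loss value alert" lines are advisory, not failures: the testing helpers warn whenever the scalar loss they reduce to exceeds a fixed threshold. Roughly (hypothetical threshold and message shape; the real check lives in torch._dynamo.testing):

    import logging
    log = logging.getLogger("torch._dynamo.testing")

    def check_loss(loss):
        # hypothetical cutoff for illustration; alerts in this sweep fired
        # for values ranging from 1.00 up to 52.36
        if loss > 1.0:
            log.warning("High loss value alert - %.2f. Can result in unstable gradients.", loss)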
Running timm_models.py dm_nfnet_f0... | |
cuda train dm_nfnet_f0 PASS | |
Running timm_models.py dpn107... | |
cuda train dpn107 PASS | |
Running timm_models.py eca_botnext26ts_256... | |
cuda train eca_botnext26ts_256 PASS | |
Running timm_models.py eca_halonext26ts... | |
cuda train eca_halonext26ts PASS | |
Running timm_models.py ese_vovnet19b_dw... | |
cuda train ese_vovnet19b_dw PASS | |
Running timm_models.py fbnetc_100... | |
cuda train fbnetc_100 PASS | |
Running timm_models.py fbnetv3_b... | |
cuda train fbnetv3_b PASS | |
Running timm_models.py gernet_l... | |
cuda train gernet_l PASS | |
Running timm_models.py ghostnet_100... | |
[2022-11-12 18:11:04,775] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
cuda train ghostnet_100 PASS | |
Running timm_models.py gluon_inception_v3... | |
[2022-11-12 18:11:22,833] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
[2022-11-12 18:11:30,339] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
cuda train gluon_inception_v3 PASS | |
[2022-11-12 18:11:39,278] torch._dynamo.testing: [WARNING] High loss value alert - 1.00. Can result in unstable gradients. | |
Running timm_models.py gluon_xception65... | |
[2022-11-12 18:11:39,522] torch._dynamo.testing: [WARNING] High loss value alert - 1.00. Can result in unstable gradients. | |
[2022-11-12 18:11:39,706] torch._dynamo.testing: [WARNING] High loss value alert - 1.00. Can result in unstable gradients. | |
[2022-11-12 18:11:39,831] torch._dynamo.testing: [WARNING] High loss value alert - 1.00. Can result in unstable gradients. | |
[2022-11-12 18:11:47,847] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
[2022-11-12 18:11:48,187] torch._dynamo.testing: [WARNING] High loss value alert - 1.00. Can result in unstable gradients. | |
Exception ignored in: <generator object preorder_traversal._preorder_traversal at 0x7f9192dfb9e0> | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/sympy/core/traversal.py", line 121, in _preorder_traversal | |
yield node | |
SystemError: error return without exception set | |
cuda train gluon_xception65 PASS | |
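[editor's note] The "Exception ignored in: <generator object preorder_traversal...>" block is not a test failure: it is Python reporting an error raised while a suspended sympy generator was being garbage-collected, which gets printed but never propagated (hence the PASS). A minimal analogue of the mechanism, nothing sympy-specific:

    def walker():
        try:
            yield 1
        finally:
            # cleanup runs when the suspended generator is collected; any error
            # raised here is reported via "Exception ignored in:" and dropped
            pass

    g = walker()
    next(g)  # suspend at the yield
    del g    # generator.close() runs here; errors at this point cannot propagate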
Running timm_models.py gmixer_24_224... | |
cuda train gmixer_24_224 PASS | |
Running timm_models.py gmlp_s16_224... | |
cuda train gmlp_s16_224 PASS | |
Running timm_models.py hrnet_w18... | |
ERROR:common: | |
from user code: | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/hrnet.py", line 713, in stages | |
yl = self.stage2(xl) | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/hrnet.py", line 495, in forward | |
y = y + fuse_outer[j](x[j]) | |
Set torch._dynamo.config.verbose=True for more information | |
You can suppress this exception and fall back to eager by setting: | |
torch._dynamo.config.suppress_errors = True | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1081, in run_node | |
return nnmodule(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/upsampling.py", line 156, in forward | |
return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners, | |
File "/scratch/ezyang/work/pytorch/torch/nn/functional.py", line 3928, in interpolate | |
return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors) | |
RuntimeError: Cannot call sizes() on tensor with symbolic sizes/strides | |
The above exception was the direct cause of the following exception: | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1042, in get_fake_value | |
return wrap_fake_exception( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 721, in wrap_fake_exception | |
return fn() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1043, in <lambda> | |
lambda: run_node(tx.output, node, args, kwargs, nnmodule) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1085, in run_node | |
raise RuntimeError( | |
RuntimeError: Failed running call_module sub1_1_2(*(FakeTensor(FakeTensor(..., device='meta', size=(s0, 18, (s2 - 1)//8 + 1, (s2 - 1)//8 + 1), | |
grad_fn=<CudnnBatchNormBackward0>), cuda:0),), **{}): | |
Cannot call sizes() on tensor with symbolic sizes/strides | |
(scroll up for backtrace) | |
The above exception was the direct cause of the following exception: | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy | |
new_result = optimized_model_iter_fn( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations | |
self.model_iter_fn(mod, inputs, collect_outputs=False) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/timm_models.py", line 305, in forward_and_backward_pass | |
cloned_inputs = clone_inputs(inputs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/timm_models.py", line 308, in <graph break in forward_and_backward_pass> | |
pred = mod(*cloned_inputs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 247, in catch_errors | |
return callback(frame, cache_size) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 474, in _convert_frame | |
result = inner_convert(frame, cache_size) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 118, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 89, in time_wrapper | |
r = func(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 347, in _convert_frame_assert | |
return _compile( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 402, in _compile | |
out_code = transform_code_object(code, transform) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object | |
transformations(instructions, code_options) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 390, in transform | |
tracer.run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1493, in run | |
super().run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run | |
and self.step() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step | |
getattr(self, inst.opname)(inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper | |
return inner_fn(self, inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION | |
self.call_function(fn, args, {}) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 224, in call_function | |
return super().call_function(tx, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 194, in call_function | |
return super(UserFunctionVariable, self).call_function(tx, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 65, in call_function | |
return tx.inline_user_function_return( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return | |
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call | |
return cls.inline_call_(parent, func, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_ | |
tracer.run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run | |
and self.step() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step | |
getattr(self, inst.opname)(inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper | |
return inner_fn(self, inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION | |
self.call_function(fn, args, {}) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 224, in call_function | |
return super().call_function(tx, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 194, in call_function | |
return super(UserFunctionVariable, self).call_function(tx, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/functions.py", line 65, in call_function | |
return tx.inline_user_function_return( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return | |
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call | |
return cls.inline_call_(parent, func, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_ | |
tracer.run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run | |
and self.step() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step | |
getattr(self, inst.opname)(inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper | |
return inner_fn(self, inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION | |
self.call_function(fn, args, {}) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 183, in call_function | |
tx.call_function( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 222, in call_function | |
return tx.inline_user_function_return( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 300, in inline_user_function_return | |
result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1565, in inline_call | |
return cls.inline_call_(parent, func, args, kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1619, in inline_call_ | |
tracer.run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run | |
and self.step() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step | |
getattr(self, inst.opname)(inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 181, in wrapper | |
return inner_fn(self, inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 786, in CALL_FUNCTION | |
self.call_function(fn, args, {}) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 183, in call_function | |
tx.call_function( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 271, in call_function | |
self.push(fn.call_function(self, args, kwargs)) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/nn_module.py", line 202, in call_function | |
return wrap_fx_proxy( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 631, in wrap_fx_proxy | |
return wrap_fx_proxy_cls( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/variables/builder.py", line 671, in wrap_fx_proxy_cls | |
example_value = get_fake_value(proxy.node, tx) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 1055, in get_fake_value | |
raise TorchRuntimeError() from e | |
torch._dynamo.exc.TorchRuntimeError: | |
from user code: | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/hrnet.py", line 713, in stages | |
yl = self.stage2(xl) | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/hrnet.py", line 495, in forward | |
y = y + fuse_outer[j](x[j]) | |
Set torch._dynamo.config.verbose=True for more information | |
You can suppress this exception and fall back to eager by setting: | |
torch._dynamo.config.suppress_errors = True | |
TorchDynamo optimized model failed to run because of following error | |
cuda train hrnet_w18 FAIL | |
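[editor's note] hrnet_w18 is the same symbolic-sizes gap as crossvit_9_240, reached through nn.Upsample's nearest-neighbor path instead of bicubic. A stripped-down version of the fuse step the traceback points at (hypothetical reduction of timm/models/hrnet.py:495):

    import torch
    import torch._dynamo

    up = torch.nn.Upsample(scale_factor=2, mode="nearest")

    def fuse(y, x):
        # y = y + fuse_outer[j](x[j]) in hrnet: upsample_nearest2d hit the same
        # "Cannot call sizes() on tensor with symbolic sizes/strides" error
        return y + up(x)

    fn = torch._dynamo.optimize("aot_eager")(fuse)
    fn(torch.randn(2, 18, 56, 56), torch.randn(2, 18, 28, 28))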
Running timm_models.py inception_v3... | |
[2022-11-12 18:13:46,782] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
[2022-11-12 18:13:54,402] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
cuda train inception_v3 PASS | |
Running timm_models.py jx_nest_base... | |
[2022-11-12 18:15:04,132] torch._dynamo.utils: [ERROR] RMSE (res-fp64): 0.00751, (ref-fp64): 0.00020 and shape=torch.Size([128, 3, 4, 4]) | |
[2022-11-12 18:15:04,132] torch._dynamo.utils: [ERROR] Accuracy failed for key name patch_embed.proj.weight.grad | |
cuda train jx_nest_base FAIL | |
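[editor's note] jx_nest_base fails the accuracy gate rather than compilation: the harness compares gradients from the optimized run (res) and the eager run (ref) against an fp64 golden reference, and here the optimized gradient drifted roughly 40x further than eager did. The comparison is essentially an RMSE check (sketch only; the exact tolerance logic lives in the benchmark utilities):

    import torch

    def rmse(a, b):
        return torch.sqrt(torch.mean((a.double() - b.double()) ** 2))

    # FAIL condition above, roughly: rmse(res, fp64_ref) = 0.00751 versus
    # rmse(ref, fp64_ref) = 0.00020 on patch_embed.proj.weight.grad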
Running timm_models.py lcnet_050... | |
cuda train lcnet_050 PASS | |
Running timm_models.py levit_128... | |
WARNING:common:fp64 golden ref were not generated for levit_128 | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/timm_models.py", line 322, in <module> | |
main(TimmRunnner()) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1589, in main | |
return maybe_fresh_cache(run, args.cold_start_latency and args.only)( | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 780, in inner | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1922, in run | |
runner.run_one_model( | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1305, in run_one_model | |
status = self.check_accuracy( | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1102, in check_accuracy | |
correct_result = self.run_n_iterations( | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1026, in run_n_iterations | |
return self.model_iter_fn(mod, inputs, collect_outputs=True) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/timm_models.py", line 312, in forward_and_backward_pass | |
self.grad_scaler.scale(loss).backward() | |
File "/scratch/ezyang/work/pytorch/torch/_tensor.py", line 473, in backward | |
torch.autograd.backward( | |
File "/scratch/ezyang/work/pytorch/torch/autograd/__init__.py", line 197, in backward | |
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass | |
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward. | |
cuda train levit_128 FAIL | |
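[editor's note] levit_128 never reaches the compiled path: the eager reference run itself dies with the classic double-backward error. The generic trigger looks like this (minimal sketch; where exactly the harness re-used a stale graph for levit_128 is not visible in the log):

    import torch

    x = torch.randn(4, requires_grad=True)
    loss = (x * 2).sum()
    loss.backward()
    loss.backward()  # RuntimeError: Trying to backward through the graph a second time
    # passing retain_graph=True to the first backward() keeps the saved
    # intermediate tensors alive and avoids the error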
Running timm_models.py mixer_b16_224... | |
cuda train mixer_b16_224 PASS | |
Running timm_models.py mixnet_l... | |
cuda train mixnet_l PASS | |
Running timm_models.py mnasnet_100... | |
cuda train mnasnet_100 PASS | |
Running timm_models.py mobilenetv2_100... | |
cuda train mobilenetv2_100 PASS | |
Running timm_models.py mobilenetv3_large_100... | |
cuda train mobilenetv3_large_100 PASS | |
[2022-11-12 18:19:28,470] torch._dynamo.testing: [WARNING] High loss value alert - 52.30. Can result in unstable gradients. | |
Running timm_models.py mobilevit_s... | |
[2022-11-12 18:19:28,588] torch._dynamo.testing: [WARNING] High loss value alert - 52.36. Can result in unstable gradients. | |
[2022-11-12 18:19:28,700] torch._dynamo.testing: [WARNING] High loss value alert - 52.30. Can result in unstable gradients. | |
[2022-11-12 18:19:28,784] torch._dynamo.testing: [WARNING] High loss value alert - 52.30. Can result in unstable gradients. | |
[2022-11-12 18:21:16,207] torch._dynamo.testing: [WARNING] High loss value alert - 52.30. Can result in unstable gradients. | |
cuda train mobilevit_s PASS | |
Running timm_models.py nfnet_l0... | |
cuda train nfnet_l0 PASS | |
Running timm_models.py pit_b_224... | |
cuda train pit_b_224 PASS | |
Running timm_models.py pnasnet5large... | |
[2022-11-12 18:23:49,227] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
[2022-11-12 18:24:39,504] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
Exception ignored in: <generator object preorder_traversal._preorder_traversal at 0x7f5e9c2fce40> | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/sympy/core/traversal.py", line 121, in _preorder_traversal | |
yield node | |
SystemError: error return without exception set | |
cuda train pnasnet5large PASS | |
Running timm_models.py poolformer_m36... | |
cuda train poolformer_m36 PASS | |
[2022-11-12 18:26:22,462] torch._dynamo.testing: [WARNING] High loss value alert - 1.30. Can result in unstable gradients. | |
Running timm_models.py regnety_002... | |
[2022-11-12 18:26:22,558] torch._dynamo.testing: [WARNING] High loss value alert - 1.30. Can result in unstable gradients. | |
[2022-11-12 18:26:22,675] torch._dynamo.testing: [WARNING] High loss value alert - 1.30. Can result in unstable gradients. | |
[2022-11-12 18:26:22,761] torch._dynamo.testing: [WARNING] High loss value alert - 1.30. Can result in unstable gradients. | |
[2022-11-12 18:26:48,252] torch._dynamo.testing: [WARNING] High loss value alert - 1.30. Can result in unstable gradients. | |
cuda train regnety_002 PASS | |
Running timm_models.py repvgg_a2... | |
cuda train repvgg_a2 PASS | |
Running timm_models.py res2net101_26w_4s... | |
[2022-11-12 18:27:43,851] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
cuda train res2net101_26w_4s PASS | |
Running timm_models.py res2net50_14w_8s... | |
[2022-11-12 18:28:05,527] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
cuda train res2net50_14w_8s PASS | |
Running timm_models.py res2next50... | |
[2022-11-12 18:28:21,665] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
cuda train res2next50 PASS | |
Running timm_models.py resmlp_12_224... | |
cuda train resmlp_12_224 PASS | |
Running timm_models.py resnest101e... | |
[2022-11-12 18:29:02,106] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
cuda train resnest101e PASS | |
Running timm_models.py rexnet_100... | |
[2022-11-12 18:29:19,988] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
cuda train rexnet_100 PASS | |
Running timm_models.py sebotnet33ts_256... | |
cuda train sebotnet33ts_256 PASS | |
Running timm_models.py selecsls42b... | |
[2022-11-12 18:30:35,721] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
cuda train selecsls42b PASS | |
Running timm_models.py spnasnet_100... | |
cuda train spnasnet_100 PASS | |
Running timm_models.py swin_base_patch4_window7_224... | |
ERROR:common:compile_fn raised AssertionError: While executing %truediv : [#users=1] = call_method[target=__truediv__](args = (%getitem_14, 64.0), kwargs = {}) | |
Set torch._dynamo.config.verbose=True for more information | |
You can suppress this exception and fall back to eager by setting: | |
torch._dynamo.config.suppress_errors = True | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/output_graph.py", line 449, in call_user_compiler | |
compiled_fn = self.compiler_fn(gm, self.example_inputs()) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/debug_utils.py", line 803, in debug_wrapper | |
compiled_gm = compiler_fn(gm, example_inputs, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/training.py", line 94, in compile_fn | |
return cls(gm, example_inputs).verified_candidate() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/training.py", line 116, in __init__ | |
if not is_aot_autograd_safe_to_run(gm, example_inputs): | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/training.py", line 64, in is_aot_autograd_safe_to_run | |
mutated = has_mutation(gm, example_inputs, inputs_only=True) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/analysis.py", line 160, in has_mutation | |
ShapeAliasingAndMutationProp(new_gm).run(*example_inputs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/analysis.py", line 124, in run | |
super().run(*args) | |
File "/scratch/ezyang/work/pytorch/torch/fx/interpreter.py", line 130, in run | |
self.env[node] = self.run_node(node) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/optimizations/analysis.py", line 59, in run_node | |
assert result is not NotImplemented | |
AssertionError: While executing %truediv : [#users=1] = call_method[target=__truediv__](args = (%getitem_14, 64.0), kwargs = {}) | |
The above exception was the direct cause of the following exception: | |
Traceback (most recent call last): | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1127, in check_accuracy | |
new_result = optimized_model_iter_fn( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 169, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/common.py", line 1025, in run_n_iterations | |
self.model_iter_fn(mod, inputs, collect_outputs=False) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/timm_models.py", line 305, in forward_and_backward_pass | |
cloned_inputs = clone_inputs(inputs) | |
File "/scratch/ezyang/work/pytorch/benchmarks/dynamo/timm_models.py", line 308, in <graph break in forward_and_backward_pass> | |
pred = mod(*cloned_inputs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/swin_transformer.py", line 568, in forward | |
x = self.forward_features(x) | |
File "/scratch/ezyang/work/env/lib/python3.9/site-packages/timm/models/swin_transformer.py", line 558, in forward_features | |
x = self.layers(x) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/container.py", line 204, in forward | |
input = module(input) | |
File "/scratch/ezyang/work/pytorch/torch/nn/modules/module.py", line 1423, in _call_impl | |
return forward_call(*input, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/eval_frame.py", line 247, in catch_errors | |
return callback(frame, cache_size) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 474, in _convert_frame | |
result = inner_convert(frame, cache_size) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 118, in _fn | |
return fn(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/utils.py", line 89, in time_wrapper | |
r = func(*args, **kwargs) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 347, in _convert_frame_assert | |
return _compile( | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 402, in _compile | |
out_code = transform_code_object(code, transform) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object | |
transformations(instructions, code_options) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/convert_frame.py", line 390, in transform | |
tracer.run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 1493, in run | |
super().run() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in run | |
and self.step() | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 329, in step | |
getattr(self, inst.opname)(inst) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/symbolic_convert.py", line 203, in wrapper | |
self.output.compile_subgraph(self, reason=reason) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/output_graph.py", line 371, in compile_subgraph | |
self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/output_graph.py", line 417, in compile_and_call_fx_graph | |
compiled_fn = self.call_user_compiler(gm) | |
File "/scratch/ezyang/work/pytorch/torch/_dynamo/output_graph.py", line 454, in call_user_compiler | |
raise BackendCompilerFailed(self.compiler_fn, e) from e | |
torch._dynamo.exc.BackendCompilerFailed: compile_fn raised AssertionError: While executing %truediv : [#users=1] = call_method[target=__truediv__](args = (%getitem_14, 64.0), kwargs = {}) | |
Set torch._dynamo.config.verbose=True for more information | |
You can suppress this exception and fall back to eager by setting: | |
torch._dynamo.config.suppress_errors = True | |
TorchDynamo optimized model failed to run because of following error | |
cuda train swin_base_patch4_window7_224 FAIL | |
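[editor's note] swin_base_patch4_window7_224 fails inside the backend's own safety analysis: while re-interpreting the captured FX graph, ShapeAliasingAndMutationProp evaluated a %truediv node (a tensor __truediv__ against 64.0, swin's attention scale) to NotImplemented and asserted. A sketch of where such a node comes from (hypothetical reduction; this simplified version may well compile cleanly on its own):

    import torch
    import torch._dynamo

    def attn(qkv):
        q, k, v = qkv.unbind(0)           # getitem-style access, as in swin
        scores = q @ k.transpose(-2, -1)
        return (scores / 64.0) @ v        # the division is the %truediv node

    fn = torch._dynamo.optimize("aot_eager")(attn)
    fn(torch.randn(3, 8, 49, 32))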
Running timm_models.py swsl_resnext101_32x16d... | |
[2022-11-12 18:31:53,423] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
cuda train swsl_resnext101_32x16d PASS | |
Running timm_models.py tf_efficientnet_b0... | |
cuda train tf_efficientnet_b0 PASS | |
Running timm_models.py tf_mixnet_l... | |
cuda train tf_mixnet_l PASS | |
Running timm_models.py tinynet_a... | |
cuda train tinynet_a PASS | |
Running timm_models.py tnt_s_patch16_224... | |
cuda train tnt_s_patch16_224 PASS | |
Running timm_models.py twins_pcpvt_base... | |
[2022-11-12 18:45:52,863] torch._dynamo.utils: [ERROR] RMSE (res-fp64): 0.01689, (ref-fp64): 0.00004 and shape=torch.Size([64, 3, 4, 4]) | |
[2022-11-12 18:45:52,863] torch._dynamo.utils: [ERROR] Accuracy failed for key name patch_embeds.0.proj.weight.grad | |
cuda train twins_pcpvt_base FAIL | |
Running timm_models.py visformer_small... | |
[2022-11-12 18:46:11,825] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
[2022-11-12 18:46:17,629] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
cuda train visformer_small PASS | |
Running timm_models.py vit_base_patch16_224... | |
cuda train vit_base_patch16_224 PASS | |
Running timm_models.py volo_d1_224... | |
[2022-11-12 18:47:18,248] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
[2022-11-12 18:47:35,182] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
cuda train volo_d1_224 PASS | |
Running timm_models.py xcit_large_24_p8_224... | |
WARNING:common:fp64 golden ref were not generated for xcit_large_24_p8_224 | |
[2022-11-12 18:48:06,375] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
[2022-11-12 18:48:21,465] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
cuda train xcit_large_24_p8_224 PASS |