
James Reed jamesr66a

import torch
import torch.fx
class TestModuleWithShapeControlFlow(torch.nn.Module):
    def forward(self, x):
        # with normal symtracing, the `x.dim()` accesses would fail
        if x.dim() == 3:
            y = x[0, :, :]
        elif x.dim() == 4:
            y = x[0, :, :, :]
# a.py
import torch
class Foo(torch.nn.Module):
    def forward(self, x):
        return x + len(x)
# b.py
import torch
commit b0703a2d968ccc91760ad738e9a50b3a913969a9
Author: James Reed <jamesreed@fb.com>
Date: Wed Mar 2 01:03:05 2022 +0000

    Serialization fixes for HF tracer

diff --git a/src/transformers/utils/fx.py b/src/transformers/utils/fx.py
index b88ae4ae7..aeb345ad3 100644
--- a/src/transformers/utils/fx.py
+++ b/src/transformers/utils/fx.py
submod_2 UserRRef(RRefId = GloballyUniqueId(created_on=0, local_id=30), ForkId = GloballyUniqueId(created_on=0, local_id=31))
(46501) ^^^^ Scenario 1 (created_on=0, local_id=30)
(46501) Instantiating OwnerRRef GloballyUniqueId(created_on=0, local_id=30) with future 0x7f6020007e80
(46501) Deserializing PyRRef parent: 0 rref ID: GloballyUniqueId(created_on=0, local_id=30) fork ID: GloballyUniqueId(created_on=0, local_id=33)
(46501) ^^^^ Scenario 2 (created_on=0, local_id=30)
(46501) Deserializing PyRRef parent: 0 rref ID: GloballyUniqueId(created_on=0, local_id=30) fork ID: GloballyUniqueId(created_on=0, local_id=38)
getitem_12 (46501) ../torch/csrc/distributed/rpc/rref_context.cpp:769 GloballyUniqueId(created_on=0, local_id=30) GloballyUniqueId(created_on=2, local_id=2)
(46501) ../torch/csrc/distributed/rpc/rref_context.cpp:474 GloballyUniqueId(created_on=0, local_id=30)
(46501) ../torch/csrc/distributed/rpc/rref_context.cpp:769 GloballyUniqueId(created_on=0, local_id=30) GloballyUniqueId(created_on=2, local_i
#0 0x00007f10250bbd1d in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x00007f101047603d in torch::distributed::rpc::RRefContext::delForkOfOwner(torch::distributed::rpc::GloballyUniqueId const&, torch::distributed::rpc::GloballyUniqueId const&) () from /fsx/users/jamesreed/pytorch/torch/lib/libtorch_cpu.so
#2 0x00007f1010477dc2 in torch::distributed::rpc::RRefContext::notifyOwnerAndParentOfFork(torch::distributed::rpc::GloballyUniqueId const&, short, c10::intrusive_ptr<torch::distributed::rpc::RRef, c10::detail::intrusive_target_default_null_type<torch::distributed::rpc::RRef> > const&) () from /fsx/users/jamesreed/pytorch/torch/lib/libtorch_cpu.so
#3 0x00007f10166da01a in torch::distributed::rpc::PyRRef::unpickle(pybind11::tuple const&) ()
from /fsx/users/jamesreed/pytorch/torch/lib/libtorch_python.so
#4 0x00007f10166d1ef7 in void pybind11::cpp_function::initialize<torch::distributed::rpc::PyRRef (*&)(pybind11::tuple const&), torch::distributed::rpc::PyRRef, pybind11::tuple const&,
(16399) ^^^^ Scenario 1 (created_on=0, local_id=30)
(16399) Instantiating OwnerRRef GloballyUniqueId(created_on=0, local_id=30) with future 0x7f19c4007910
(16399) ../torch/csrc/distributed/rpc/rref_context.cpp:769 GloballyUniqueId(created_on=0, local_id=30) GloballyUniqueId(created_on=2, local_id=2)
(16399) ../torch/csrc/distributed/rpc/rref_context.cpp:474 GloballyUniqueId(created_on=0, local_id=30)
(16399) ^^^^ Scenario 2 (created_on=0, local_id=30)
^^^^^ (16399) ../torch/csrc/distributed/rpc/rref_context.cpp:769 GloballyUniqueId(created_on=0, local_id=30) GloballyUniqueId(created_on=2, local_id=8)
(16399) ../torch/csrc/distributed/rpc/rref_context.cpp:474 GloballyUniqueId(created_on=0, local_id=30)
(16399) ^^^^ Scenario 2 (created_on=0, local_id=30)
(16399) ^^^^ Scenario 2 (created_on=0, local_id=30)
(16399) ../torch/csrc/distributed/rpc/rref_context.cpp:814 GloballyUniqueId(created_on=0, local_id=30) GloballyUniqueId(created_on=2, local_id=2)
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
REPLICATE config: False -> MultiUseParameterConfig.TRANSMIT
GraphModule(
  (submod_0): GraphModule()
  (submod_1): GraphModule()
  (submod_2): GraphModule()
  (_loss): MSELoss()
diff --git a/torch/csrc/distributed/rpc/rref_context.cpp b/torch/csrc/distributed/rpc/rref_context.cpp
index 004e9422be..28b2bbc5c2 100644
--- a/torch/csrc/distributed/rpc/rref_context.cpp
+++ b/torch/csrc/distributed/rpc/rref_context.cpp
@@ -4,6 +4,9 @@
#include <sstream>
+#include <iostream>
+#include <unistd.h>
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[W socket.cpp:701] The server socket on [ip-10-200-31-5.ec2.internal]:46879 is not yet listening (errno: 111 - Connection refused), will retry.
[W socket.cpp:701] The server socket on [ip-10-200-31-5.ec2.internal]:46879 is not yet listening (errno: 111 - Connection refused), will retry.
[W socket.cpp:701] The server socket on [ip-10-200-31-5.ec2.internal]:46879 is not yet listening (errno: 111 - Connection refused), will retry.
[W socket.cpp:701] The server socket on [ip-10-200-31-5.ec2.internal]:46879 is not yet listening (errno: 111 - Connection refused), will retry.
REPLICATE config: False -> MultiUseParameterConfig.TRANSMIT
GraphModule(
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[W socket.cpp:701] The server socket on [ip-10-200-31-5.ec2.internal]:39643 is not yet listening (errno: 111 - Connection refused), will retry.
[W socket.cpp:701] The server socket on [ip-10-200-31-5.ec2.internal]:39643 is not yet listening (errno: 111 - Connection refused), will retry.
[W socket.cpp:701] The server socket on [ip-10-200-31-5.ec2.internal]:39643 is not yet listening (errno: 111 - Connection refused), will retry.
[W socket.cpp:701] The server socket on [ip-10-200-31-5.ec2.internal]:39643 is not yet listening (errno: 111 - Connection refused), will retry.
[W socket.cpp:701] The server socket on [ip-10-200-31-5.ec2.internal]:39643 is not yet listening (errno: 111 - Connection r