@awwong1
Created April 5, 2020 17:12
Torchprof v1.0.0 limitation
!pip install -U torch torchvision
!pip install cython; pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
!pip install git+https://github.com/facebookresearch/fvcore.git
!git clone https://github.com/facebookresearch/detectron2 detectron2_repo
!pip install -e detectron2_repo
!pip install opencv-python
!pip install torchprof
Requirement already up-to-date: torch in ./venv/lib/python3.6/site-packages (1.4.0)
Requirement already up-to-date: torchvision in ./venv/lib/python3.6/site-packages (0.5.0)
Requirement already satisfied, skipping upgrade: six in ./venv/lib/python3.6/site-packages (from torchvision) (1.14.0)
Requirement already satisfied, skipping upgrade: numpy in ./venv/lib/python3.6/site-packages (from torchvision) (1.18.2)
Requirement already satisfied, skipping upgrade: pillow>=4.1.1 in ./venv/lib/python3.6/site-packages (from torchvision) (7.1.1)
Requirement already satisfied: cython in ./venv/lib/python3.6/site-packages (0.29.16)
Collecting git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI
  Cloning https://github.com/cocodataset/cocoapi.git to /tmp/pip-req-build-1gei8zd8
  Running command git clone -q https://github.com/cocodataset/cocoapi.git /tmp/pip-req-build-1gei8zd8
Requirement already satisfied, skipping upgrade: setuptools>=18.0 in ./venv/lib/python3.6/site-packages (from pycocotools==2.0) (46.1.3)
Requirement already satisfied, skipping upgrade: cython>=0.27.3 in ./venv/lib/python3.6/site-packages (from pycocotools==2.0) (0.29.16)
Requirement already satisfied, skipping upgrade: matplotlib>=2.1.0 in ./venv/lib/python3.6/site-packages (from pycocotools==2.0) (3.2.1)
Requirement already satisfied, skipping upgrade: cycler>=0.10 in ./venv/lib/python3.6/site-packages (from matplotlib>=2.1.0->pycocotools==2.0) (0.10.0)
Requirement already satisfied, skipping upgrade: python-dateutil>=2.1 in ./venv/lib/python3.6/site-packages (from matplotlib>=2.1.0->pycocotools==2.0) (2.8.1)
Requirement already satisfied, skipping upgrade: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in ./venv/lib/python3.6/site-packages (from matplotlib>=2.1.0->pycocotools==2.0) (2.4.6)
Requirement already satisfied, skipping upgrade: numpy>=1.11 in ./venv/lib/python3.6/site-packages (from matplotlib>=2.1.0->pycocotools==2.0) (1.18.2)
Requirement already satisfied, skipping upgrade: kiwisolver>=1.0.1 in ./venv/lib/python3.6/site-packages (from matplotlib>=2.1.0->pycocotools==2.0) (1.2.0)
Requirement already satisfied, skipping upgrade: six in ./venv/lib/python3.6/site-packages (from cycler>=0.10->matplotlib>=2.1.0->pycocotools==2.0) (1.14.0)
Building wheels for collected packages: pycocotools
  Building wheel for pycocotools (setup.py) ... done
  Created wheel for pycocotools: filename=pycocotools-2.0-cp36-cp36m-linux_x86_64.whl size=275365 sha256=553f5859bf8f249f440289a978ef900af362993192f13a27d6936e0dfbbb39d6
  Stored in directory: /tmp/pip-ephem-wheel-cache-exjzl9jo/wheels/25/c1/63/8bee2969883497d2785c9bdbe4e89cae5efc59521553d528bf
Successfully built pycocotools
Installing collected packages: pycocotools
  Attempting uninstall: pycocotools
    Found existing installation: pycocotools 2.0
    Uninstalling pycocotools-2.0:
      Successfully uninstalled pycocotools-2.0
Successfully installed pycocotools-2.0
Collecting git+https://github.com/facebookresearch/fvcore.git
  Cloning https://github.com/facebookresearch/fvcore.git to /tmp/pip-req-build-ug3z1m8c
  Running command git clone -q https://github.com/facebookresearch/fvcore.git /tmp/pip-req-build-ug3z1m8c
Requirement already satisfied (use --upgrade to upgrade): fvcore==0.1 from git+https://github.com/facebookresearch/fvcore.git in ./venv/lib/python3.6/site-packages
Requirement already satisfied: numpy in ./venv/lib/python3.6/site-packages (from fvcore==0.1) (1.18.2)
Requirement already satisfied: yacs>=0.1.6 in ./venv/lib/python3.6/site-packages (from fvcore==0.1) (0.1.6)
Requirement already satisfied: pyyaml>=5.1 in ./venv/lib/python3.6/site-packages (from fvcore==0.1) (5.3.1)
Requirement already satisfied: tqdm in ./venv/lib/python3.6/site-packages (from fvcore==0.1) (4.45.0)
Requirement already satisfied: portalocker in ./venv/lib/python3.6/site-packages (from fvcore==0.1) (1.6.0)
Requirement already satisfied: termcolor>=1.1 in ./venv/lib/python3.6/site-packages (from fvcore==0.1) (1.1.0)
Requirement already satisfied: Pillow in ./venv/lib/python3.6/site-packages (from fvcore==0.1) (7.1.1)
Requirement already satisfied: tabulate in ./venv/lib/python3.6/site-packages (from fvcore==0.1) (0.8.7)
Building wheels for collected packages: fvcore
  Building wheel for fvcore (setup.py) ... done
  Created wheel for fvcore: filename=fvcore-0.1-py3-none-any.whl size=42662 sha256=70e39b821f6026b8a78b0dae41046da2418d78373ca0dd522530ec5177ec088b
  Stored in directory: /tmp/pip-ephem-wheel-cache-8nwff8wt/wheels/00/33/f4/a95dac09ddd48a293cc942b75ca598e4b7facf86176ec92f8d
Successfully built fvcore
fatal: destination path 'detectron2_repo' already exists and is not an empty directory.
Obtaining file:///home/alexander/sandbox/src/git.udia.ca/alex/detectron2-test/detectron2_repo
Requirement already satisfied: termcolor>=1.1 in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (1.1.0)
Requirement already satisfied: Pillow in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (7.1.1)
Requirement already satisfied: yacs>=0.1.6 in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (0.1.6)
Requirement already satisfied: tabulate in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (0.8.7)
Requirement already satisfied: cloudpickle in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (1.3.0)
Requirement already satisfied: matplotlib in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (3.2.1)
Requirement already satisfied: tqdm>4.29.0 in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (4.45.0)
Requirement already satisfied: tensorboard in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (2.2.0)
Requirement already satisfied: fvcore in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (0.1)
Requirement already satisfied: future in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (0.18.2)
Requirement already satisfied: pydot in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (1.4.1)
Requirement already satisfied: PyYAML in ./venv/lib/python3.6/site-packages (from yacs>=0.1.6->detectron2==0.1.1) (5.3.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in ./venv/lib/python3.6/site-packages (from matplotlib->detectron2==0.1.1) (2.4.6)
Requirement already satisfied: cycler>=0.10 in ./venv/lib/python3.6/site-packages (from matplotlib->detectron2==0.1.1) (0.10.0)
Requirement already satisfied: numpy>=1.11 in ./venv/lib/python3.6/site-packages (from matplotlib->detectron2==0.1.1) (1.18.2)
Requirement already satisfied: kiwisolver>=1.0.1 in ./venv/lib/python3.6/site-packages (from matplotlib->detectron2==0.1.1) (1.2.0)
Requirement already satisfied: python-dateutil>=2.1 in ./venv/lib/python3.6/site-packages (from matplotlib->detectron2==0.1.1) (2.8.1)
Requirement already satisfied: absl-py>=0.4 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (0.9.0)
Requirement already satisfied: wheel>=0.26; python_version >= "3" in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (0.34.2)
Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (0.4.1)
Requirement already satisfied: google-auth<2,>=1.6.3 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (1.13.1)
Requirement already satisfied: grpcio>=1.24.3 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (1.28.1)
Requirement already satisfied: setuptools>=41.0.0 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (46.1.3)
Requirement already satisfied: requests<3,>=2.21.0 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (2.23.0)
Requirement already satisfied: six>=1.10.0 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (1.14.0)
Requirement already satisfied: werkzeug>=0.11.15 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (1.0.1)
Requirement already satisfied: markdown>=2.6.8 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (3.2.1)
Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (1.6.0.post2)
Requirement already satisfied: protobuf>=3.6.0 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (3.11.3)
Requirement already satisfied: portalocker in ./venv/lib/python3.6/site-packages (from fvcore->detectron2==0.1.1) (1.6.0)
Requirement already satisfied: requests-oauthlib>=0.7.0 in ./venv/lib/python3.6/site-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard->detectron2==0.1.1) (1.3.0)
Requirement already satisfied: rsa<4.1,>=3.1.4 in ./venv/lib/python3.6/site-packages (from google-auth<2,>=1.6.3->tensorboard->detectron2==0.1.1) (4.0)
Requirement already satisfied: cachetools<5.0,>=2.0.0 in ./venv/lib/python3.6/site-packages (from google-auth<2,>=1.6.3->tensorboard->detectron2==0.1.1) (4.0.0)
Requirement already satisfied: pyasn1-modules>=0.2.1 in ./venv/lib/python3.6/site-packages (from google-auth<2,>=1.6.3->tensorboard->detectron2==0.1.1) (0.2.8)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in ./venv/lib/python3.6/site-packages (from requests<3,>=2.21.0->tensorboard->detectron2==0.1.1) (1.25.8)
Requirement already satisfied: idna<3,>=2.5 in ./venv/lib/python3.6/site-packages (from requests<3,>=2.21.0->tensorboard->detectron2==0.1.1) (2.9)
Requirement already satisfied: certifi>=2017.4.17 in ./venv/lib/python3.6/site-packages (from requests<3,>=2.21.0->tensorboard->detectron2==0.1.1) (2020.4.5)
Requirement already satisfied: chardet<4,>=3.0.2 in ./venv/lib/python3.6/site-packages (from requests<3,>=2.21.0->tensorboard->detectron2==0.1.1) (3.0.4)
Requirement already satisfied: oauthlib>=3.0.0 in ./venv/lib/python3.6/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard->detectron2==0.1.1) (3.1.0)
Requirement already satisfied: pyasn1>=0.1.3 in ./venv/lib/python3.6/site-packages (from rsa<4.1,>=3.1.4->google-auth<2,>=1.6.3->tensorboard->detectron2==0.1.1) (0.4.8)
Installing collected packages: detectron2
  Attempting uninstall: detectron2
    Found existing installation: detectron2 0.1.1
    Uninstalling detectron2-0.1.1:
      Successfully uninstalled detectron2-0.1.1
  Running setup.py develop for detectron2
Successfully installed detectron2
Requirement already satisfied: opencv-python in ./venv/lib/python3.6/site-packages (4.2.0.34)
Requirement already satisfied: numpy>=1.11.3 in ./venv/lib/python3.6/site-packages (from opencv-python) (1.18.2)
Collecting torchprof
  Downloading torchprof-1.0.0-py3-none-any.whl (8.3 kB)
Requirement already satisfied: torch<2,>=1.1.0 in ./venv/lib/python3.6/site-packages (from torchprof) (1.4.0)
Installing collected packages: torchprof
Successfully installed torchprof-1.0.0
# get image
import cv2  # needed here: this cell runs before the main import cell below

!wget http://images.cocodataset.org/val2017/000000439715.jpg -O input.jpg
im = cv2.imread("./input.jpg")
--2020-04-05 10:22:46--  http://images.cocodataset.org/val2017/000000439715.jpg
Resolving images.cocodataset.org (images.cocodataset.org)... 52.216.138.203
Connecting to images.cocodataset.org (images.cocodataset.org)|52.216.138.203|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 209222 (204K) [image/jpeg]
Saving to: ‘input.jpg’

input.jpg           100%[===================>] 204.32K  1013KB/s    in 0.2s    

2020-04-05 10:22:46 (1013 KB/s) - ‘input.jpg’ saved [209222/209222]
# import some common detectron2 utilities
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog
import cv2

import torchprof

# Create config
cfg = get_cfg()
cfg.merge_from_file("./detectron2_repo/configs/COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
cfg.MODEL.WEIGHTS = "detectron2://COCO-Detection/faster_rcnn_R_101_FPN_3x/137851257/model_final_f6e8b1.pkl"

# Create predictor
predictor = DefaultPredictor(cfg)
paths = [("GeneralizedRCNN", "proposal_generator", "rpn_head", "conv"),]

with torchprof.Profile(predictor.model, paths=paths, use_cuda=True) as prof:
    predictor(im)
print(prof.display(show_events=False))

print("=" * 40)

trace, event_lists_dict = prof.raw()
# trace[262] # Trace(path=('GeneralizedRCNN', 'proposal_generator', 'rpn_head', 'conv'), leaf=True, module=Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))

for evl_run in event_lists_dict[trace[262].path]:
    print(evl_run)
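The hard-coded index in `trace[262]` depends on the module ordering of this particular model. A minimal sketch of looking the entry up by its path instead, assuming (as the comment in the cell above indicates) that each trace is a named tuple with `path`, `leaf`, and `module` fields. The `Trace` stand-in and the `find_trace` helper are hypothetical and not part of torchprof:

```python
from collections import namedtuple

# Stand-in for torchprof's trace entries (assumed fields: path, leaf, module).
Trace = namedtuple("Trace", ["path", "leaf", "module"])


def find_trace(traces, path):
    """Return (index, trace) for the first trace whose path equals `path`."""
    for index, candidate in enumerate(traces):
        if candidate.path == tuple(path):
            return index, candidate
    raise KeyError(f"no trace with path {path!r}")


# Toy data mimicking a few rows of the module tree.
traces = [
    Trace(("GeneralizedRCNN",), False, None),
    Trace(("GeneralizedRCNN", "proposal_generator", "rpn_head"), False, None),
    Trace(("GeneralizedRCNN", "proposal_generator", "rpn_head", "conv"), True, None),
]
target = ("GeneralizedRCNN", "proposal_generator", "rpn_head", "conv")
index, found = find_trace(traces, target)
print(index, found.leaf)
```

With the real profiler, the same helper could presumably be called as `find_trace(trace, paths[0])`, so the loop no longer relies on position 262.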
Module                  | Self CPU total | CPU total | CUDA total
------------------------|----------------|-----------|-----------
GeneralizedRCNN         |                |           |           
├── backbone            |                |           |           
│├── fpn_lateral2       |                |           |           
│├── fpn_output2        |                |           |           
│├── fpn_lateral3       |                |           |           
│├── fpn_output3        |                |           |           
│├── fpn_lateral4       |                |           |           
│├── fpn_output4        |                |           |           
│├── fpn_lateral5       |                |           |           
│├── fpn_output5        |                |           |           
│├── top_block          |                |           |           
│├── bottom_up          |                |           |           
││├── stem              |                |           |           
│││├── conv1            |                |           |           
││││└── norm            |                |           |           
││├── res2              |                |           |           
│││├── 0                |                |           |           
││││├── shortcut        |                |           |           
│││││└── norm           |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 1                |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 2                |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
││├── res3              |                |           |           
│││├── 0                |                |           |           
││││├── shortcut        |                |           |           
│││││└── norm           |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 1                |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 2                |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 3                |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
││├── res4              |                |           |           
│││├── 0                |                |           |           
││││├── shortcut        |                |           |           
│││││└── norm           |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 1                |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 2                |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 3                |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 4                |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 5                |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 6                |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 7                |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 8                |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 9                |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 10               |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 11               |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 12               |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 13               |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 14               |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 15               |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 16               |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 17               |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 18               |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 19               |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 20               |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 21               |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 22               |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
││├── res5              |                |           |           
│││├── 0                |                |           |           
││││├── shortcut        |                |           |           
│││││└── norm           |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 1                |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││├── conv3           |                |           |           
│││││└── norm           |                |           |           
│││├── 2                |                |           |           
││││├── conv1           |                |           |           
│││││└── norm           |                |           |           
││││├── conv2           |                |           |           
│││││└── norm           |                |           |           
││││└── conv3           |                |           |           
││││ └── norm           |                |           |           
├── proposal_generator  |                |           |           
│├── anchor_generator   |                |           |           
││└── cell_anchors      |                |           |           
│├── rpn_head           |                |           |           
││├── conv              |      393.380us |   1.385ms |   23.805ms
││├── objectness_logits |                |           |           
││└── anchor_deltas     |                |           |           
└── roi_heads           |                |           |           
 ├── box_pooler         |                |           |           
 │├── level_poolers     |                |           |           
 ││├── 0                |                |           |           
 ││├── 1                |                |           |           
 ││├── 2                |                |           |           
 ││└── 3                |                |           |           
 ├── box_head           |                |           |           
 │├── fc1               |                |           |           
 │└── fc2               |                |           |           
 └── box_predictor      |                |           |           
  ├── cls_score         |                |           |           
  └── bbox_pred         |                |           |           

========================================
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
Name                   Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls  Input Shapes                         
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
conv2d                 7.30%            5.930us          100.00%          81.220us         81.220us         25.06%           4.042ms          4.042ms          1                []                                   
convolution            6.39%            5.190us          92.70%           75.290us         75.290us         25.02%           4.036ms          4.036ms          1                []                                   
_convolution           14.27%           11.590us         86.31%           70.100us         70.100us         25.00%           4.032ms          4.032ms          1                []                                   
contiguous             3.92%            3.180us          3.92%            3.180us          3.180us          0.02%            3.008us          3.008us          1                []                                   
cudnn_convolution      68.12%           55.330us         68.12%           55.330us         55.330us         24.91%           4.018ms          4.018ms          1                []                                   
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
Self CPU time total: 81.220us
CUDA time total: 16.131ms

---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
Name                   Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls  Input Shapes                         
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
conv2d                 6.76%            5.340us          100.00%          79.050us         79.050us         25.17%           1.168ms          1.168ms          1                []                                   
convolution            6.91%            5.460us          93.24%           73.710us         73.710us         25.10%           1.165ms          1.165ms          1                []                                   
_convolution           14.00%           11.070us         86.34%           68.250us         68.250us         24.99%           1.160ms          1.160ms          1                []                                   
contiguous             3.61%            2.850us          3.61%            2.850us          2.850us          0.05%            2.112us          2.112us          1                []                                   
cudnn_convolution      68.73%           54.330us         68.73%           54.330us         54.330us         24.70%           1.147ms          1.147ms          1                []                                   
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
Self CPU time total: 79.050us
CUDA time total: 4.643ms

---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
Name                   Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls  Input Shapes                         
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
conv2d                 6.81%            5.300us          100.00%          77.880us         77.880us         25.53%           420.864us        420.864us        1                []                                   
convolution            6.86%            5.340us          93.19%           72.580us         72.580us         25.22%           415.744us        415.744us        1                []                                   
_convolution           14.23%           11.080us         86.34%           67.240us         67.240us         24.97%           411.648us        411.648us        1                []                                   
contiguous             3.92%            3.050us          3.92%            3.050us          3.050us          0.12%            2.048us          2.048us          1                []                                   
cudnn_convolution      68.19%           53.110us         68.19%           53.110us         53.110us         24.16%           398.336us        398.336us        1                []                                   
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
Self CPU time total: 77.880us
CUDA time total: 1.649ms

---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
Name                   Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls  Input Shapes                         
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
conv2d                 6.63%            5.120us          100.00%          77.190us         77.190us         26.00%           212.992us        212.992us        1                []                                   
convolution            6.48%            5.000us          93.37%           72.070us         72.070us         25.38%           207.872us        207.872us        1                []                                   
_convolution           13.91%           10.740us         86.89%           67.070us         67.070us         24.99%           204.672us        204.672us        1                []                                   
contiguous             3.38%            2.610us          3.38%            2.610us          2.610us          0.25%            2.048us          2.048us          1                []                                   
cudnn_convolution      69.59%           53.720us         69.59%           53.720us         53.720us         23.38%           191.488us        191.488us        1                []                                   
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
Self CPU time total: 77.190us
CUDA time total: 819.072us

---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
Name                   Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls  Input Shapes                         
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
conv2d                 6.70%            5.230us          100.00%          78.040us         78.040us         26.49%           149.504us        149.504us        1                []                                   
convolution            6.66%            5.200us          93.30%           72.810us         72.810us         25.58%           144.384us        144.384us        1                []                                   
_convolution           14.49%           11.310us         86.64%           67.610us         67.610us         24.86%           140.288us        140.288us        1                []                                   
contiguous             3.77%            2.940us          3.77%            2.940us          2.940us          0.39%            2.176us          2.176us          1                []                                   
cudnn_convolution      68.38%           53.360us         68.38%           53.360us         53.360us         22.68%           128.000us        128.000us        1                []                                   
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
Self CPU time total: 78.040us
CUDA time total: 564.352us
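A quick arithmetic check on the figures above, as a sketch only. It assumes the `conv` row of the module tree (393.380us | 1.385ms | 23.805ms) aggregates the five per-call event tables. All numbers are copied by hand from those tables, and the variable names are made up for illustration.

```python
# "CPU total" column of each of the five event tables above, in microseconds.
# Row order: conv2d, convolution, _convolution, contiguous, cudnn_convolution.
cpu_total_us = [
    [81.220, 75.290, 70.100, 3.180, 55.330],
    [79.050, 73.710, 68.250, 2.850, 54.330],
    [77.880, 72.580, 67.240, 3.050, 53.110],
    [77.190, 72.070, 67.070, 2.610, 53.720],
    [78.040, 72.810, 67.610, 2.940, 53.360],
]
# "Self CPU time total" footer of each table, in microseconds.
self_cpu_us = [81.220, 79.050, 77.880, 77.190, 78.040]
# "CUDA time total" footer of each table, converted to microseconds.
cuda_total_us = [16131.0, 4643.0, 1649.0, 819.072, 564.352]

self_cpu_sum_us = sum(self_cpu_us)
# Summing every row counts nested events (conv2d > convolution > ...) repeatedly.
cpu_total_sum_us = sum(sum(rows) for rows in cpu_total_us)
cuda_sum_ms = sum(cuda_total_us) / 1000.0

print(f"Self CPU total: {self_cpu_sum_us:.3f}us")
print(f"CPU total:      {cpu_total_sum_us / 1000.0:.3f}ms")
print(f"CUDA total:     {cuda_sum_ms:.3f}ms")
```

Under that assumption the sums land on the tree row, up to rounding in the printed millisecond values. The CPU total and CUDA total columns therefore seem to add up nested events, so they overstate the wall time spent in the layer.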
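# Profile only the RPN head's conv layer, which is called several times per forward pass.
paths = [("GeneralizedRCNN", "proposal_generator", "rpn_head", "conv")]

with torchprof.Profile(predictor.model, paths=paths, use_cuda=True) as prof:
    predictor(im)
print(prof.display(show_events=True))

# print("=" * 40)

trace, event_lists_dict = prof.raw()
# Look the layer up by path instead of by position; in this run it was trace[262]:
# Trace(path=('GeneralizedRCNN', 'proposal_generator', 'rpn_head', 'conv'), leaf=True, module=Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))
conv_trace = next(t for t in trace if t.path == paths[0])

# The dict holds one event list per call of the module.
for evl_run in event_lists_dict[conv_trace.path]:
    print(evl_run)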
Module                   | Self CPU total | CPU total | CUDA total
-------------------------|----------------|-----------|-----------
GeneralizedRCNN          |                |           |           
├── backbone             |                |           |           
│├── fpn_lateral2        |                |           |           
│├── fpn_output2         |                |           |           
│├── fpn_lateral3        |                |           |           
│├── fpn_output3         |                |           |           
│├── fpn_lateral4        |                |           |           
│├── fpn_output4         |                |           |           
│├── fpn_lateral5        |                |           |           
│├── fpn_output5         |                |           |           
│├── top_block           |                |           |           
│├── bottom_up           |                |           |           
││├── stem               |                |           |           
│││├── conv1             |                |           |           
││││└── norm             |                |           |           
││├── res2               |                |           |           
│││├── 0                 |                |           |           
││││├── shortcut         |                |           |           
│││││└── norm            |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 1                 |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 2                 |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
││├── res3               |                |           |           
│││├── 0                 |                |           |           
││││├── shortcut         |                |           |           
│││││└── norm            |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 1                 |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 2                 |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 3                 |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
││├── res4               |                |           |           
│││├── 0                 |                |           |           
││││├── shortcut         |                |           |           
│││││└── norm            |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 1                 |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 2                 |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 3                 |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 4                 |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 5                 |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 6                 |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 7                 |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 8                 |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 9                 |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 10                |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 11                |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 12                |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 13                |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 14                |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 15                |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 16                |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 17                |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 18                |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 19                |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 20                |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 21                |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 22                |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
││├── res5               |                |           |           
│││├── 0                 |                |           |           
││││├── shortcut         |                |           |           
│││││└── norm            |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 1                 |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││├── conv3            |                |           |           
│││││└── norm            |                |           |           
│││├── 2                 |                |           |           
││││├── conv1            |                |           |           
│││││└── norm            |                |           |           
││││├── conv2            |                |           |           
│││││└── norm            |                |           |           
││││└── conv3            |                |           |           
││││ └── norm            |                |           |           
├── proposal_generator   |                |           |           
│├── anchor_generator    |                |           |           
││└── cell_anchors       |                |           |           
│├── rpn_head            |                |           |           
││├── conv               |                |           |           
│││├── conv2d            |        5.090us |  76.560us |  148.480us
│││├── convolution       |        4.890us |  71.470us |  143.360us
│││├── _convolution      |       10.410us |  66.580us |  140.288us
│││├── contiguous        |        2.730us |   2.730us |    2.912us
│││└── cudnn_convolution |       53.440us |  53.440us |  128.000us
││├── objectness_logits  |                |           |           
││└── anchor_deltas      |                |           |           
└── roi_heads            |                |           |           
 ├── box_pooler          |                |           |           
 │├── level_poolers      |                |           |           
 ││├── 0                 |                |           |           
 ││├── 1                 |                |           |           
 ││├── 2                 |                |           |           
 ││└── 3                 |                |           |           
 ├── box_head            |                |           |           
 │├── fc1                |                |           |           
 │└── fc2                |                |           |           
 └── box_predictor       |                |           |           
  ├── cls_score          |                |           |           
  └── bbox_pred          |                |           |           

---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
Name                   Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls  Input Shapes                         
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
conv2d                 6.40%            5.290us          100.00%          82.600us         82.600us         25.05%           4.041ms          4.041ms          1                []                                   
convolution            6.57%            5.430us          93.60%           77.310us         77.310us         25.03%           4.037ms          4.037ms          1                []                                   
_convolution           14.99%           12.380us         87.02%           71.880us         71.880us         25.00%           4.031ms          4.031ms          1                []                                   
contiguous             3.75%            3.100us          3.75%            3.100us          3.100us          0.02%            3.072us          3.072us          1                []                                   
cudnn_convolution      68.28%           56.400us         68.28%           56.400us         56.400us         24.91%           4.017ms          4.017ms          1                []                                   
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
Self CPU time total: 82.600us
CUDA time total: 16.129ms

---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
Name                   Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls  Input Shapes                         
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
conv2d                 6.65%            5.150us          100.00%          77.440us         77.440us         25.17%           1.166ms          1.166ms          1                []                                   
convolution            6.55%            5.070us          93.35%           72.290us         72.290us         25.08%           1.162ms          1.162ms          1                []                                   
_convolution           14.24%           11.030us         86.80%           67.220us         67.220us         24.97%           1.157ms          1.157ms          1                []                                   
contiguous             3.56%            2.760us          3.56%            2.760us          2.760us          0.07%            3.072us          3.072us          1                []                                   
cudnn_convolution      69.00%           53.430us         69.00%           53.430us         53.430us         24.71%           1.145ms          1.145ms          1                []                                   
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
Self CPU time total: 77.440us
CUDA time total: 4.633ms

---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
Name                   Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls  Input Shapes                         
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
conv2d                 6.93%            5.360us          100.00%          77.340us         77.340us         25.47%           414.720us        414.720us        1                []                                   
convolution            6.88%            5.320us          93.07%           71.980us         71.980us         25.22%           410.624us        410.624us        1                []                                   
_convolution           14.61%           11.300us         86.19%           66.660us         66.660us         24.96%           406.336us        406.336us        1                []                                   
contiguous             3.71%            2.870us          3.71%            2.870us          2.870us          0.19%            3.072us          3.072us          1                []                                   
cudnn_convolution      67.87%           52.490us         67.87%           52.490us         52.490us         24.15%           393.216us        393.216us        1                []                                   
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
Self CPU time total: 77.340us
CUDA time total: 1.628ms

---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
Name                   Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls  Input Shapes                         
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
conv2d                 6.76%            5.250us          100.00%          77.710us         77.710us         25.88%           211.968us        211.968us        1                []                                   
convolution            6.16%            4.790us          93.24%           72.460us         72.460us         25.38%           207.872us        207.872us        1                []                                   
_convolution           14.18%           11.020us         87.08%           67.670us         67.670us         24.88%           203.776us        203.776us        1                []                                   
contiguous             3.47%            2.700us          3.47%            2.700us          2.700us          0.36%            2.976us          2.976us          1                []                                   
cudnn_convolution      69.42%           53.950us         69.42%           53.950us         53.950us         23.49%           192.352us        192.352us        1                []                                   
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
Self CPU time total: 77.710us
CUDA time total: 818.944us

---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
Name                   Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls  Input Shapes                         
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
conv2d                 6.65%            5.090us          100.00%          76.560us         76.560us         26.37%           148.480us        148.480us        1                []                                   
convolution            6.39%            4.890us          93.35%           71.470us         71.470us         25.46%           143.360us        143.360us        1                []                                   
_convolution           13.60%           10.410us         86.96%           66.580us         66.580us         24.92%           140.288us        140.288us        1                []                                   
contiguous             3.57%            2.730us          3.57%            2.730us          2.730us          0.52%            2.912us          2.912us          1                []                                   
cudnn_convolution      69.80%           53.440us         69.80%           53.440us         53.440us         22.73%           128.000us        128.000us        1                []                                   
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------  
Self CPU time total: 76.560us
CUDA time total: 563.040us