eqy
eqy / conv.py
Created December 28, 2022 04:30
cuDNN depthwise conv example
import torch
import time
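# Let cuDNN autotune and cache the fastest algorithm for each input shape.
torch.backends.cudnn.benchmark = True
iters = 10
# Depthwise convolution: groups == in_channels == 64, so each channel gets its own 3x3 filter.
# The positional arguments are in_channels=64, out_channels=64, kernel_size=3, stride=3.
conv = torch.nn.Conv2d(64, 64, kernel_size=3, stride=3, groups=64, dtype=torch.half, device='cuda')
convb = torch.nn.Conv2d(64, 64, kernel_size=3, stride=3, groups=64, dtype=torch.bfloat16, device='cuda')
# NCHW input in fp16; cast it (e.g. data.bfloat16()) before feeding it to convb.
data = torch.randn(16, 64, 1024, 1024, dtype=torch.half, device='cuda')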
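Because the positional `3, 3` sets both the kernel size and the stride, each layer shrinks the 1024x1024 input considerably. The sketch below applies the output-size formula from the `torch.nn.Conv2d` documentation (padding 0 and dilation 1, the defaults used here) and counts multiply-accumulates for a depthwise layer, where each output element reads only k x k inputs from its own channel. It also includes a timing helper that calls a synchronize hook before reading the clock, which GPU timing needs because CUDA kernels launch asynchronously; in the real script that hook would presumably be `torch.cuda.synchronize`. The helper names are hypothetical and not part of the gist.

```python
import time
from typing import Callable


def conv2d_out_size(size: int, kernel: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    """Spatial output size of a 2-D convolution along one dimension (PyTorch's formula)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def depthwise_macs(n: int, c: int, h: int, w: int, kernel: int, stride: int) -> int:
    """Multiply-accumulates for a depthwise conv (groups == channels) with padding 0."""
    h_out = conv2d_out_size(h, kernel, stride)
    w_out = conv2d_out_size(w, kernel, stride)
    # Each output element sums kernel*kernel products from a single input channel.
    return n * c * h_out * w_out * kernel * kernel


def time_per_iter(fn: Callable[[], object], iters: int = 10, warmup: int = 3,
                  sync: Callable[[], None] = lambda: None) -> float:
    """Average wall-clock seconds per call of fn, synchronizing before each clock read."""
    for _ in range(warmup):  # warm-up calls, e.g. so cudnn.benchmark can pick an algorithm first
        fn()
    sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    sync()
    return (time.perf_counter() - start) / iters


# Shapes from the gist: 16x64x1024x1024 input, 3x3 kernel, stride 3.
print(conv2d_out_size(1024, kernel=3, stride=3))  # 341
print(depthwise_macs(16, 64, 1024, 1024, kernel=3, stride=3))  # 1071645696
```

With PyTorch on a GPU, something like `time_per_iter(lambda: conv(data), iters=iters, sync=torch.cuda.synchronize)` would time the fp16 layer; without the sync calls the timer would mostly measure kernel launch overhead.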
eqy / nsight.sh
Last active December 21, 2022 02:30 — forked from mcarilli/nsight.sh
Favorite Nsight Systems profiling commands for PyTorch scripts
# This isn't supposed to run as a bash script; I named it with ".sh" for syntax highlighting.
# https://developer.nvidia.com/nsight-systems
# https://docs.nvidia.com/nsight-systems/profiling/index.html
# My preferred nsys (command line executable used to create profiles) commands
#
# In your script, write
# torch.cuda.nvtx.range_push("region name")
# ...
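Every `range_push` needs a matching `range_pop`; otherwise a range stays open and later ranges can end up nested under it in the Nsight Systems timeline. A small context manager guarantees the pairing even if the profiled region raises. This is a hypothetical helper, not from the gist: in a real script you would pass `torch.cuda.nvtx.range_push` and `torch.cuda.nvtx.range_pop` (recent PyTorch releases also include a `torch.cuda.nvtx.range` context manager; check your version). Here the push/pop callables are injected so the sketch runs without a GPU.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def nvtx_range(name: str,
               push: Callable[[str], object],
               pop: Callable[[], object]) -> Iterator[None]:
    """Open an NVTX-style range on entry and always close it on exit, even on error."""
    push(name)
    try:
        yield
    finally:
        pop()


# Demo with a recorder instead of real NVTX calls.
events = []
with nvtx_range("forward", events.append, lambda: events.append("pop")):
    with nvtx_range("conv1", events.append, lambda: events.append("pop")):
        pass
print(events)  # ['forward', 'conv1', 'pop', 'pop']
```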
eqy / temp.sh
Last active June 4, 2019 05:16
temp_hack
#!/bin/bash
# Keep a TVM RPC tracker alive on port 9190, restarting it whenever queries cannot reach it.
export PYTHONPATH=/tvm/python:$PYTHONPATH
python3 -m tvm.exec.rpc_tracker --host 0.0.0.0 --port 9190 &
while true; do
    # A "Cannot connect to tracker" message from the query tool means the tracker is down.
    if python3 -m tvm.exec.query_rpc_tracker --host 0.0.0.0 --port 9190 2>&1 | grep -q 'Cannot connect to tracker'; then
        echo "RESTARTING @ $(date)"
        python3 -m tvm.exec.rpc_tracker --host 0.0.0.0 --port 9190 &
    else
        echo "OK..."
    fi
    sleep 10  # polling interval is an assumption; without it the loop spins continuously
done
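The same restart loop can be sketched in Python with the probe and restart steps injected as callables, which makes the polling logic testable without a running tracker. This is a hypothetical rewrite, not part of the original gist: `tracker_is_down`, `start_tracker`, `watchdog`, and the 10-second default interval are illustrative, while the module paths and flags are copied from the shell script.

```python
import subprocess
import sys
import time
from typing import Callable, Optional

TRACKER_ARGS = ["--host", "0.0.0.0", "--port", "9190"]


def tracker_is_down() -> bool:
    """Run TVM's tracker query and report True if it prints the connection error."""
    result = subprocess.run(
        [sys.executable, "-m", "tvm.exec.query_rpc_tracker", *TRACKER_ARGS],
        capture_output=True,
        text=True,
    )
    # Check stdout and stderr together, like `2>&1 | grep` in the shell version.
    return "Cannot connect to tracker" in (result.stdout + result.stderr)


def start_tracker() -> subprocess.Popen:
    """Launch the tracker in the background, like `python3 -m tvm.exec.rpc_tracker ... &`."""
    return subprocess.Popen([sys.executable, "-m", "tvm.exec.rpc_tracker", *TRACKER_ARGS])


def watchdog(
    is_down: Callable[[], bool],
    restart: Callable[[], object],
    interval: float = 10.0,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll is_down every `interval` seconds and call restart when it reports True.

    Returns the number of restarts performed; runs forever if max_checks is None.
    """
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if is_down():
            print("RESTARTING @", time.ctime())
            restart()
            restarts += 1
        else:
            print("OK...")
        sleep(interval)
    return restarts
```

In use, `start_tracker()` followed by `watchdog(tracker_is_down, start_tracker)` mirrors the script; passing a fake `sleep` and a bounded `max_checks` lets the loop run instantly in a test.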
eqy / prepare_model.py
Created April 23, 2019 00:20
prepare_model.py
import logging
import mxnet as mx
import tvm
import nnvm.frontend
import nnvm.compiler
from mxnet import gluon
from mxnet.gluon.model_zoo import vision
from tvm import relay
from tvm.contrib import ndk
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
eqy / tune_nnvm_cuda.py
Created October 17, 2018 01:51
yolo copy paste
"""
Auto-tuning a convolutional network for NVIDIA GPU
====================================================
**Author**: `Lianmin Zheng <https://github.com/merrymercy>`_
Auto-tuning for specific devices and workloads is critical for getting the
best performance. This is a tutorial on how to tune a whole convolutional
network for NVIDIA GPU.
The operator implementation for NVIDIA GPU in TVM is written in template form.
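The idea behind template-based tuning can be sketched without TVM: a template exposes tunable knobs, each combination of knob values is one configuration, and the tuner measures candidates and keeps the fastest. The toy below does an exhaustive grid search over a made-up knob space with a stand-in cost function. It is not AutoTVM's API (AutoTVM ships several tuners, including model-guided ones that avoid measuring every configuration), and the knob names `tile_x`, `tile_y` and the cost function are hypothetical.

```python
import itertools
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, int]


def config_space(knobs: Dict[str, List[int]]) -> List[Config]:
    """Enumerate every combination of knob values (the template's search space)."""
    names = list(knobs)
    return [dict(zip(names, values)) for values in itertools.product(*(knobs[n] for n in names))]


def grid_search(knobs: Dict[str, List[int]], measure: Callable[[Config], float]) -> Tuple[Optional[Config], float]:
    """Measure each configuration and return the one with the lowest cost."""
    best_cfg: Optional[Config] = None
    best_cost = float("inf")
    for cfg in config_space(knobs):
        cost = measure(cfg)  # on real hardware: compile, run, and time the kernel
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost


def fake_latency(cfg: Config) -> float:
    """Hypothetical cost: penalize tiles far from 16x8 (stands in for a real timing run)."""
    return abs(cfg["tile_x"] - 16) + abs(cfg["tile_y"] - 8) + 1.0


knobs = {"tile_x": [4, 8, 16, 32], "tile_y": [4, 8, 16]}
best, cost = grid_search(knobs, fake_latency)
print(len(config_space(knobs)), best, cost)  # 12 {'tile_x': 16, 'tile_y': 8} 1.0
```

Exhaustive search is only practical for tiny spaces like this one; real templates have far more configurations, which is why smarter tuners exist.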

Motivation

Instruction Slice Table

  • What is this used for?
  • Instructions in the IST get sent to another queue
  • Define address-generating instructions
  • 128-entry, 2-way set-associative, with a least-recently-used (LRU) replacement policy
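The last bullet can be made concrete with a small model of a 128-entry, 2-way set-associative table: 128 / 2 = 64 sets of 2 ways, where a full set evicts its least-recently-used entry. The sketch assumes the table is indexed by instruction address (PC) with set = PC mod 64 and tag = PC // 64; those indexing choices and the class and method names are illustrative, not taken from the source.

```python
from collections import OrderedDict
from typing import List


class SetAssociativeLRU:
    """Tag-only table: `entries` total slots split into sets of `ways`, LRU per set."""

    def __init__(self, entries: int = 128, ways: int = 2) -> None:
        assert entries % ways == 0
        self.ways = ways
        self.num_sets = entries // ways  # 128 / 2 = 64 sets
        # One OrderedDict per set; iteration order runs from least to most recently used.
        self.sets: List[OrderedDict] = [OrderedDict() for _ in range(self.num_sets)]

    def _locate(self, pc: int):
        """Map a PC to its set and tag (assumed indexing scheme)."""
        return self.sets[pc % self.num_sets], pc // self.num_sets

    def lookup(self, pc: int) -> bool:
        """Return True on a hit and mark the entry most recently used."""
        s, tag = self._locate(pc)
        if tag in s:
            s.move_to_end(tag)
            return True
        return False

    def insert(self, pc: int) -> None:
        """Add pc to its set, evicting the least-recently-used way if the set is full."""
        s, tag = self._locate(pc)
        if tag in s:
            s.move_to_end(tag)
            return
        if len(s) >= self.ways:
            s.popitem(last=False)  # evict the LRU entry
        s[tag] = True


ist = SetAssociativeLRU()
a, b, c = 0x400, 0x400 + 64, 0x400 + 128  # three PCs that map to the same set
ist.insert(a)
ist.insert(b)
ist.lookup(a)  # touch a, so b becomes the LRU way
ist.insert(c)  # set is full: evicts b
print(ist.lookup(a), ist.lookup(b), ist.lookup(c))  # True False True
```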