With TorchDynamo (torch.compile), a PyTorch model can be dispatched to other deep learning compilers for acceleration. Hidet is one such compiler: it accelerates your model with a range of optimizations (e.g., subgraph fusion, graph rewriting, and kernel tuning). To use Hidet, first install it via
$ pip install hidet
Then enable it with torch.compile(model, backend='hidet'), as shown in the snippet below:
import torch
import hidet
# Define the PyTorch model
model = torch.hub.load('pytorch/vision:v0.6.0', 'resnet18', pretrained=True).cuda().eval()
x = torch.rand(1, 3, 224, 224).cuda()
# Compile the model through Hidet
hidet.torch.dynamo_config.search_space(2)  # use the largest kernel tuning space (slower compilation, faster kernels)
model_opt = torch.compile(model, backend='hidet')
# Run the optimized model
y = model_opt(x)
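To compare backends yourself, a minimal timing helper along these lines can be used. This is a sketch, not part of the Hidet API: the warmup and repeat counts are arbitrary, and for CUDA models you should also call torch.cuda.synchronize() before reading the clock so that queued kernels are counted.

```python
import time

def benchmark(fn, *args, warmup=10, repeat=100):
    """Return the average latency of fn(*args) in milliseconds."""
    # Warm-up runs let torch.compile finish compilation and kernel
    # tuning first, so that cost is excluded from the measurement.
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(repeat):
        fn(*args)
    return (time.perf_counter() - start) / repeat * 1e3

# Example usage with the snippet above (hypothetical names):
# latency_ms = benchmark(model_opt, x)
```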
Here are some benchmark results (batch size = 1, NVIDIA RTX 3090, BERT sequence length = 128, float32 data type):
Learn more about Hidet and its optimization options in the tutorial and the GitHub repository. Hidet originates from our research work on simplifying the writing of tensor programs through our proposed task-mapping programming paradigm. Please check out our paper for more details.
Although Hidet Script is simpler than CUDA code, its complexity may still prevent ML practitioners from using it directly. The same argument applies to Triton.