
@yaoyaoding
Last active February 22, 2023 02:09

(Beta) Hidet: a dynamo backend focused on inference acceleration

With torch dynamo, we can dispatch a PyTorch model to other awesome deep learning frameworks/compilers for acceleration. Hidet is one such deep learning compiler that accelerates your model with a bunch of optimizations (e.g., subgraph fusion, rewriting, and kernel tuning). To use Hidet, please first install it via

$ pip install hidet

Then you can enable it via torch.compile(model, backend='hidet') as shown in the code snippet below:

import torch
import hidet 

# Define pytorch model
model = torch.hub.load('pytorch/vision:v0.6.0', 'resnet18', pretrained=True).cuda().eval()
x = torch.rand(1, 3, 224, 224).cuda()

# Compile the model through Hidet
hidet.torch.dynamo_config.search_space(2)  # tune the kernel performance
model_opt = torch.compile(model, backend='hidet')  

# Run the optimized model
y = model_opt(x)
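
To quickly sanity-check the compiled model (this is not part of the original snippet, just a common practice), you can compare its output against the eager-mode output; the code below reuses model, model_opt, and x from above:

# Compare the hidet-compiled output with the eager PyTorch output
with torch.no_grad():
    y_eager = model(x)
    y_hidet = model_opt(x)
print(torch.max(torch.abs(y_eager - y_hidet)))  # expect only a small numerical difference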

Here are some benchmark results [benchmark figure] (batch size = 1, NVIDIA RTX 3090, BERT sequence length = 128, float32 data type).
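
The exact numbers come from the setup in the caption above. If you want to run a similar comparison yourself, here is a minimal timing sketch (an illustrative helper, not the script used for the figure) that reuses model, model_opt, and x from the snippet above:

import time

def benchmark(fn, x, warmup=10, repeat=100):
    # Warm-up runs also absorb hidet's compilation and kernel tuning on the first calls
    for _ in range(warmup):
        fn(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(repeat):
        fn(x)
    torch.cuda.synchronize()
    return (time.time() - start) / repeat * 1000.0  # average milliseconds per run

print(f'eager: {benchmark(model, x):.2f} ms, hidet: {benchmark(model_opt, x):.2f} ms')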

Learn more about Hidet and its optimization options in the tutorial and GitHub repository. Hidet originates from our research work that aims to simplify writing tensor programs with our proposed task-mapping programming paradigm. Please check out our paper for more details.

@wangshangsam

wangshangsam commented Feb 21, 2023

Overall, you need to explain at a high level what Hidet is and what the best attributes of Hidet are (i.e., a not-so-competitive way of saying how Hidet is different from other DL compilers). Something like "Via the [task-mapping programming paradigm](link to the paper), Hidet enables people with elementary or intermediate CUDA expertise to effortlessly implement high-performance GPU kernels for their inference workloads."

@wangshangsam

Add some performance comparison graphs with the reduced-overhead mode.

@wangshangsam

Why do you need `import hidet`? I thought `backend='hidet'` would do that under the hood.

@wangshangsam

wangshangsam commented Feb 21, 2023

and then use it like

Then you can enable it in one line via torch.compile:

Actually, I would suggest splitting "how to enable it" and the code snippet example into two parts:

and then you can enable it in one line via `torch.compile(model, backend='hidet')`, as we show in the code snippet below:

\`\`\`
...
\`\`\`

@yaoyaoding
Author

Why do you need `import hidet`? I thought `backend='hidet'` would do that under the hood.

Because we need to use

hidet.torch.dynamo_config.search_space(2)

@yaoyaoding
Author

"Via the (task-mapping programming paradigm)[link to the paper], Hidet enables people with elementary or intermediate CUDA expertise to effortlessly implement high-performant GPU kernels for their inference workloads."

This is too technical for ML people.

@yaoyaoding
Author

yaoyaoding commented Feb 21, 2023

And though hidet script is simpler than CUDA code, its complexity will still prevent ML people from using it directly. The same argument holds for Triton.

@wangshangsam

Overall, you need to explain at a high level what Hidet is and what the best attributes of Hidet are (i.e., a not-so-competitive way of saying how Hidet is different from other DL compilers). Something like "Via the [task-mapping programming paradigm](link to the paper), Hidet enables people with elementary or intermediate CUDA expertise to effortlessly implement high-performance GPU kernels for their inference workloads."

On this note, if you could put a quick example of how a simple kernel (one that is needed by resnet18 or whatever example model you wanna use) is implemented in Hidet (preferably something a bit more complicated than element-wise kernels, but not as involved as a full-blown hmma gemm), that would be best.

@yaoyaoding
Author

yaoyaoding commented Feb 21, 2023

Overall, you need to explain at a high level what Hidet is and what the best attributes of Hidet are (i.e., a not-so-competitive way of saying how Hidet is different from other DL compilers). Something like "Via the [task-mapping programming paradigm](link to the paper), Hidet enables people with elementary or intermediate CUDA expertise to effortlessly implement high-performance GPU kernels for their inference workloads."

On this note, if you could put a quick example of how a simple kernel (one that is needed by resnet18 or whatever example model you wanna use) is implemented in Hidet (preferably something a bit more complicated than element-wise kernels, but not as involved as a full-blown hmma gemm), that would be best.

In fact, I do not want to expose too many of these technical details to the audience, as most of them do not need to know this to use Hidet. The blog section does not serve this purpose either.

Another way to use hidet in torch is to use it to add custom kernels instead of writing a PyTorch extension. We can educate people on how to write hidet script once we have more support on this part (i.e., using hidet script to implement a torch.autograd.Function) and write a PyTorch blog about it. We can do this in the future, but I do not suggest we do it now. The audience would be ML people who want to write PyTorch extensions to implement their own kernels; those people will have an interest in learning hidet script.
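
To make the idea concrete, below is a rough sketch of the torch.autograd.Function pattern being described. The GELU math is only a placeholder for kernels that would eventually be written in hidet script; nothing hidet-specific is used here.

import math
import torch

class FusedGelu(torch.autograd.Function):
    # Sketch of exposing a custom kernel as a PyTorch extension point.
    # The plain torch ops below stand in for hand-written (e.g., hidet script) kernels.

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.nn.functional.gelu(x)  # placeholder for a custom forward kernel

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Analytic gradient of exact GELU, Phi(x) + x * phi(x); placeholder for a custom backward kernel
        cdf = 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
        pdf = torch.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        return grad_output * (cdf + x * pdf)

x = torch.randn(8, requires_grad=True)
y = FusedGelu.apply(x)
y.sum().backward()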

@wangshangsam

"Via the (task-mapping programming paradigm)[link to the paper], Hidet enables people with elementary or intermediate CUDA expertise to effortlessly implement high-performant GPU kernels for their inference workloads."

This is too technical for ML people.

While I agree that it's too technical for ML people, I would argue that PyTorch has a wide audience that includes MLSys folks as well. From the perspective of establishing a community around Hidet, we need to attract their attention.

Maybe we can split the example into two sections then. Something like:

Hidet is one such deep learning compiler that accelerates your model with a bunch of optimizations (e.g., subgraph fusion, rewriting, and kernel tuning). You can leverage the existing optimization techniques in Hidet to accelerate your inference workloads on GPU. To use Hidet, please install it ...
...

Hidet's speciality is that, via the [task-mapping programming paradigm](link to the paper), Hidet enables people with elementary or intermediate CUDA expertise to effortlessly implement GPU kernels that can be automatically tuned to the best performance very quickly. Below is how to implement a XXX kernel in Hidet

@anurlybayev

Remove the word "another" from the title.

@wangshangsam

"Via the (task-mapping programming paradigm)[link to the paper], Hidet enables people with elementary or intermediate CUDA expertise to effortlessly implement high-performant GPU kernels for their inference workloads."

This is too technical for ML people.

While I agree that it's too technical for ML people, I would argue that PyTorch has a wide audience that includes MLSys folks as well. From the perspective of establishing a community around Hidet, we need to attract their attention.

Maybe we can split the example into two sections then. Something like:

Hidet is one such deep learning compiler that accelerates your model with a bunch of optimizations (e.g., subgraph fusion, rewriting, and kernel tuning). You can leverage the existing optimization techniques in Hidet to accelerate your inference workloads on GPU. To use Hidet, please install it ...
...

Hidet's speciality is that, via the [task-mapping programming paradigm](link to the paper), Hidet enables people with elementary or intermediate CUDA expertise to effortlessly implement GPU kernels that can be automatically tuned to the best performance very quickly. Below is how to implement a XXX kernel in Hidet

Or forget about the "how to implement a kernel in Hidet" example. Just say "Hidet's speciality is XXX", then say "If you want to learn more about how to implement your own kernel in Hidet, please check out XXX"

@yaoyaoding
Author

Below is how to implement a XXX kernel in Hidet

There is no such kernel that is simple yet not too simple to show the advantage of Hidet. I was always looking for such an example while writing the Hidet paper, but failed to find one.

@wangshangsam

My point is that you can't just rely on every MLSys folk reading every paper at ASPLOS to know about Hidet (maybe they will eventually get to it in a year, but by then it's already too late for the purpose of building a community); you need a medium like PyTorch's release blog to get as wide a coverage as possible.

@wangshangsam

Like, see how OpenAI is trying to get everybody (ML or otherwise) to know about Triton, even though very few people know how to write a kernel in Triton.

@yaoyaoding
Author

I am going to have dinner, will update after that. Feel free to directly edit the gist @wangshangsam @anurlybayev.

@yaoyaoding
Author

Our current length is about 3x to 4x the average length of a PyTorch 1.13 feature section.

@yaoyaoding
Author

I plan to leave it to the PyTorch people to decide whether to include the benchmark results.

@yaoyaoding
Author

yaoyaoding commented Feb 22, 2023

My point is that you can't just rely on every MLSys folk reading every paper at ASPLOS to know about Hidet (maybe they will eventually get to it in a year, but by then it's already too late for the purpose of building a community); you need a medium like PyTorch's release blog to get as wide a coverage as possible.

That's why I also presented Hidet at TVMCon and put the Hidet paper information on the landing page of the Hidet repo. Anyway, I added a sentence mentioning the task-mapping programming paradigm at the end. But we do not have any documentation about hidet script to refer to. Thus, as I said previously, we can create a new post in the future once the hidet script documentation is more complete and we have a prototype showing how to extend PyTorch with new operators using hidet script.

@yaoyaoding
Author

Let me know if you have any suggestions on the current version @wangshangsam.
