@airMeng
Created October 30, 2023 07:02
Add model support.md

1. Model weight conversion

1.1 PyTorch weight parsing

1.2 tokenizer

2. Model enablement

2.1 model loading

model class

model struct (ffn, attn, norm tensors), tensor name mapping
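As a rough illustration of the tensor name mapping step, the sketch below renames PyTorch-style checkpoint keys to a flat per-layer scheme. Both naming schemes and the `map_tensor_name` helper are hypothetical and only show the shape of the problem; the real names depend on the model being added.

```python
import re

# Hypothetical suffix mapping: PyTorch checkpoint name -> internal tensor name.
LAYER_RULES = {
    "self_attn.q_proj.weight": "attn.q.weight",
    "self_attn.k_proj.weight": "attn.k.weight",
    "self_attn.v_proj.weight": "attn.v.weight",
    "mlp.up_proj.weight": "ffn.up.weight",
    "mlp.down_proj.weight": "ffn.down.weight",
    "input_layernorm.weight": "attn_norm.weight",
}

# Hypothetical tensors that live outside the repeated layers.
TOP_LEVEL = {
    "model.embed_tokens.weight": "tok_embeddings.weight",
    "model.norm.weight": "norm.weight",
}

_LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.(.+)$")


def map_tensor_name(name):
    """Translate one checkpoint key; fail loudly on anything unmapped."""
    if name in TOP_LEVEL:
        return TOP_LEVEL[name]
    match = _LAYER_RE.match(name)
    if match and match.group(2) in LAYER_RULES:
        return f"layers.{match.group(1)}.{LAYER_RULES[match.group(2)]}"
    raise KeyError(f"no mapping for tensor {name!r}")
```

Raising on unknown names rather than skipping them makes a missing ffn, attn, or norm tensor show up at conversion time instead of at load time.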

model_context

model_load_internal (explanation of variables)

2.2 inference process

model_eval_internal (explanation of variables)

2.3 application

CMakeLists.txt

Python bindings and scripts

3. Performance optimization

3.0 Introduction to JBLAS and ISA support

https://github.com/intel-sandbox/jblas/blob/main/README.md

3.1 Int4 quantization (sym, bits, compute type, fallback)
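For orientation, here is a minimal sketch of generic symmetric 4-bit group quantization. It is not the JBLAS implementation: the function names and the `group_size` default are assumptions, and compute type and fallback are not modeled.

```python
def quantize_sym_int4(values, group_size=32):
    """Symmetric int4: one scale per group, no zero point.

    Returns (codes, scales); each code is an integer in [-8, 7].
    """
    codes, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        amax = max(abs(v) for v in group)
        # Map the largest magnitude in the group to code 7.
        scale = amax / 7.0 if amax > 0 else 1.0
        scales.append(scale)
        for v in group:
            codes.append(max(-8, min(7, round(v / scale))))
    return codes, scales


def dequantize_sym_int4(codes, scales, group_size=32):
    """Recover approximate floats from codes and per-group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]
```

With this scheme the reconstruction error of each value is bounded by half the scale of its group, so smaller groups trade extra scale storage for accuracy.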

3.2 MHA fusion

basic MHA (check jblas_mha_xxx_support)

GQA

link-to-MHA-doc
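The difference between basic MHA and GQA comes down to how query heads share key/value heads. A small sketch of that mapping follows; the parameter names `n_head` and `n_head_kv` are assumptions chosen for illustration, not names confirmed by this outline.

```python
def kv_head_for_query_head(q_head, n_head, n_head_kv):
    """Return the index of the KV head that a given query head reads.

    n_head_kv == n_head is plain MHA, n_head_kv == 1 is multi-query
    attention, and anything in between is grouped-query attention.
    """
    if n_head % n_head_kv != 0:
        raise ValueError("n_head must be a multiple of n_head_kv")
    if not 0 <= q_head < n_head:
        raise ValueError("q_head out of range")
    queries_per_kv = n_head // n_head_kv
    return q_head // queries_per_kv
```

For example, with 8 query heads and 2 KV heads, query heads 0 to 3 read KV head 0 and query heads 4 to 7 read KV head 1, so the KV cache is a quarter of the MHA size.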

3.3 FFN fusion

basic FFN

link-to-FFN-doc

3.4 Tensor Parallel

link to TP doc
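Until the TP doc is linked, the idea can be sketched as a single-process simulation of the two usual ways to shard a linear layer `y = W x` across ranks. The helper names and the `world_size` parameter are assumptions, and real tensor parallelism would replace the loops with collective communication between devices.

```python
def matvec(w, x):
    """Dense matrix-vector product; w is a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def shard_output(w, x, world_size):
    """Each rank owns a slice of output rows; results are concatenated
    (the role an all-gather plays on real hardware)."""
    if len(w) % world_size != 0:
        raise ValueError("output dim must divide evenly across ranks")
    step = len(w) // world_size
    out = []
    for rank in range(world_size):
        out.extend(matvec(w[rank * step:(rank + 1) * step], x))
    return out


def shard_input(w, x, world_size):
    """Each rank owns a slice of input columns; partial results are summed
    (the role an all-reduce plays on real hardware)."""
    if len(x) % world_size != 0:
        raise ValueError("input dim must divide evenly across ranks")
    step = len(x) // world_size
    total = [0] * len(w)
    for rank in range(world_size):
        lo, hi = rank * step, (rank + 1) * step
        partial = matvec([row[lo:hi] for row in w], x[lo:hi])
        total = [t + p for t, p in zip(total, partial)]
    return total
```

Both shardings reproduce the unsharded result; pairing an output-sharded layer with an input-sharded one is a common way to need only one reduction per block.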
