
@mzbac
Created February 25, 2024 04:19
LLaVA implementation
```text
Initial Setup:
+-------------------+   +---------------+
| Text Sequence     |   | Raw Images    |
| [T1, <IMG>, T2,   |   | [Image1,      |
|  T3, <IMG>, T4]   |   |  Image2]      |
+-------------------+   +---------------+
Step 1: Convert Text and <IMG> Tokens to Embeddings
+----------------------------------------------------------------+
| LLM Token Embedding Layer (text and <IMG> tokens)              |
|                                                                |
| [T1, <IMG>, T2, T3, <IMG>, T4]                                 |
|               |                                                |
|               V                                                |
| [T1_emb, IMG_emb, T2_emb, T3_emb, IMG_emb, T4_emb]             |
+----------------------------------------------------------------+
Step 2: Encode Each Image into Patch Features with the Vision Model
+----------------------------------------------------------------+
| Vision Model (one feature vector per patch)                    |
|                                                                |
| Image1 ---> [I1_1, I1_2, I1_3]                                 |
| Image2 ---> [I2_1, I2_2, I2_3]                                 |
+----------------------------------------------------------------+
Step 3: Project Patch Features into the LLM's Embedding Space
+----------------------------------------------------------------+
| Projector (linear layer or small MLP)                          |
|                                                                |
| [I1_1, I1_2, I1_3] ---> [I1_1_embed, I1_2_embed, I1_3_embed]   |
| [I2_1, I2_2, I2_3] ---> [I2_1_embed, I2_2_embed, I2_3_embed]   |
+----------------------------------------------------------------+
Step 4: Replace Each IMG_emb with That Image's Patch Embeddings
+----------------------------------------------------------------+
| Updated Sequence Embeddings                                    |
|                                                                |
| [T1_emb, I1_1_embed, I1_2_embed, I1_3_embed, T2_emb,           |
|  T3_emb, I2_1_embed, I2_2_embed, I2_3_embed, T4_emb]           |
+----------------------------------------------------------------+
Step 5: Feed the Updated Sequence into the LLM
+----------------------------------------------------------------+
| Large Language Model                                           |
|                                                                |
| Input: [T1_emb, I1_1_embed, I1_2_embed, I1_3_embed,            |
|         T2_emb, T3_emb, I2_1_embed, I2_2_embed, I2_3_embed,    |
|         T4_emb]                                                |
|                                                                |
|               |                                                |
|               V                                                |
| Output: next-token predictions                                 |
+----------------------------------------------------------------+
```
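Step 1 is an ordinary embedding-table lookup. The `<IMG>` placeholder gets a row like any other token, but that vector is discarded in Step 4, so its value does not matter. A minimal pure-Python sketch, assuming a hypothetical toy vocabulary, a hypothetical `IMAGE_TOKEN_ID`, and a hypothetical 2-dimensional table (a real model uses the LLM's own learned embedding table):

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical id for the <IMG> placeholder token (toy value, not LLaVA's).
IMAGE_TOKEN_ID = 99


def embed_tokens(token_ids: List[int], table: Dict[int, Vector]) -> List[Vector]:
    """Look up one embedding vector per token id, <IMG> included."""
    return [list(table[t]) for t in token_ids]  # copy each row


if __name__ == "__main__":
    # Hypothetical toy table: T1..T4 use ids 1..4.
    table = {
        1: [0.1, 0.1],
        2: [0.2, 0.2],
        3: [0.3, 0.3],
        4: [0.4, 0.4],
        IMAGE_TOKEN_ID: [0.0, 0.0],  # placeholder row, discarded in Step 4
    }
    sequence = [1, IMAGE_TOKEN_ID, 2, 3, IMAGE_TOKEN_ID, 4]
    embeds = embed_tokens(sequence, table)
    print(len(embeds))  # 6
```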
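In Step 2 the vision model turns each image into a fixed number of patch features. LLaVA uses a CLIP ViT encoder, which cuts the image into non-overlapping square patches before running a transformer over them. The sketch below shows only the patch-cutting part (no learned weights), on a hypothetical grayscale image stored as a list of rows; `patchify` is an illustrative name, not a function from any LLaVA codebase:

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values
Vector = List[float]


def patchify(image: Image, patch_size: int) -> List[Vector]:
    """Split an image into non-overlapping patch_size x patch_size patches,
    in row-major order, and flatten each patch into one vector."""
    h, w = len(image), len(image[0])
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches: List[Vector] = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patch = [
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


if __name__ == "__main__":
    # Hypothetical 4x4 image whose pixel at (r, c) has value 4*r + c.
    img = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    patches = patchify(img, 2)
    print(len(patches))  # 4
    print(patches[0])  # [0.0, 1.0, 4.0, 5.0]
```

With 336x336 inputs and 14-pixel patches this gives (336 / 14)^2 = 576 patches per image, the setting used by LLaVA-1.5.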
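Step 3 maps each patch feature from the vision model's width to the LLM's embedding width so both can live in one sequence. The original LLaVA used a single linear layer here, and LLaVA-1.5 a two-layer MLP with GELU. A pure-Python sketch of the MLP variant, with hypothetical toy weights (3-dim features in, 4-dim embeddings out) standing in for learned ones:

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # shape [out_dim][in_dim]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """y = W x + b for a single vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def gelu(x: float) -> float:
    """Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def project_patches(patches: List[Vector], w1: Matrix, b1: Vector,
                    w2: Matrix, b2: Vector) -> List[Vector]:
    """Apply linear -> GELU -> linear to every patch feature independently."""
    out: List[Vector] = []
    for p in patches:
        hidden = [gelu(h) for h in linear(p, w1, b1)]
        out.append(linear(hidden, w2, b2))
    return out


if __name__ == "__main__":
    # Hypothetical toy weights: 3-dim features -> 2 hidden -> 4-dim embeddings.
    w1 = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]
    b1 = [0.0, 0.0]
    w2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    b2 = [0.0, 0.0, 0.0, 0.0]
    patches = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
    embeds = project_patches(patches, w1, b1, w2, b2)
    print(len(embeds), len(embeds[0]))  # 2 4
```

Because the projector runs patch by patch, the patch count is unchanged: Image1 still contributes three embeddings, now in the LLM's width.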
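Step 4 is where text and images meet: each `<IMG>` position is swapped for the whole list of patch embeddings of the corresponding image, so one placeholder becomes several positions. A pure-Python sketch, assuming the i-th `<IMG>` token belongs to the i-th image; `merge_image_embeddings` and the placeholder id are hypothetical names:

```python
from typing import List, Sequence

Vector = List[float]


def merge_image_embeddings(
    token_ids: Sequence[int],
    token_embeds: Sequence[Vector],
    image_embeds: Sequence[Sequence[Vector]],
    image_token_id: int,
) -> List[Vector]:
    """Replace the i-th <IMG> embedding with all patch embeddings of image i."""
    if len(token_ids) != len(token_embeds):
        raise ValueError("need exactly one embedding per token id")
    num_placeholders = sum(1 for t in token_ids if t == image_token_id)
    if num_placeholders != len(image_embeds):
        raise ValueError(
            f"{num_placeholders} <IMG> tokens but {len(image_embeds)} images"
        )
    merged: List[Vector] = []
    images = iter(image_embeds)
    for tok, emb in zip(token_ids, token_embeds):
        if tok == image_token_id:
            merged.extend(next(images))  # one placeholder -> many patch embeddings
        else:
            merged.append(emb)  # text embeddings pass through unchanged
    return merged


if __name__ == "__main__":
    IMG = -1  # hypothetical placeholder id
    ids = [1, IMG, 2, 3, IMG, 4]  # T1, <IMG>, T2, T3, <IMG>, T4
    embeds = [[1.0], [0.0], [2.0], [3.0], [0.0], [4.0]]  # toy 1-dim embeddings
    images = [
        [[11.0], [12.0], [13.0]],  # Image1 -> I1_1, I1_2, I1_3
        [[21.0], [22.0], [23.0]],  # Image2 -> I2_1, I2_2, I2_3
    ]
    merged = merge_image_embeddings(ids, embeds, images, IMG)
    print(len(merged))  # 10
```

The up-front count check turns a placeholder/image mismatch into a clear error instead of a `StopIteration` or a silently ignored image.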
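Because of Step 4, the sequence the LLM receives in Step 5 is much longer than the tokenized prompt: each `<IMG>` contributes one position per patch instead of one. A small helper for budgeting the context window, assuming every image yields the same number of patches, square images whose side is divisible by the patch size, and no extra special tokens around images (both helper names are hypothetical):

```python
def patches_per_image(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in an image_size x image_size image."""
    if image_size % patch_size:
        raise ValueError("image_size must be divisible by patch_size")
    return (image_size // patch_size) ** 2


def llm_input_length(num_text_tokens: int, num_images: int, patches: int) -> int:
    """Sequence length after each <IMG> is replaced by `patches` embeddings.

    num_text_tokens counts only the non-<IMG> tokens.
    """
    return num_text_tokens + num_images * patches


if __name__ == "__main__":
    # The diagram: 4 text tokens, 2 images, 3 patches each.
    print(llm_input_length(4, 2, 3))  # 10
    # A LLaVA-1.5-style setting: 336x336 images, 14-pixel patches.
    p = patches_per_image(336, 14)
    print(p)  # 576
    print(llm_input_length(4, 2, p))  # 1156
```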