| Format | Meaning |
|---|---|
| W4A16 | Weights in 4-bit, Activations in 16-bit |
| W8A8 | Weights and Activations both in 8-bit |
| W4A8 | Weights in 4-bit, Activations in 8-bit |
| W8A16 | Weights in 8-bit, Activations in 16-bit |
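
The format labels above read as "weight bit width, then activation bit width". As a rough illustration, the sketch below parses such a label and quantizes a small weight list to the weight width. It assumes symmetric, per-tensor, round-to-nearest quantization, which the table does not specify; `parse_format`, `quantize_symmetric`, and `dequantize` are hypothetical helper names, not part of any library.

```python
def parse_format(name):
    """Split a label such as 'W4A16' into (weight_bits, activation_bits)."""
    weight_part, activation_part = name[1:].split("A")
    return int(weight_part), int(activation_part)


def quantize_symmetric(values, bits):
    """Symmetric round-to-nearest quantization (an assumed scheme, not from the source)."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level, then clamp to the signed range.
    quantized = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integer levels back to approximate real values."""
    return [q * scale for q in quantized]


weights = [0.12, -0.50, 0.33, 0.07]  # made-up example weights
w_bits, a_bits = parse_format("W4A16")
q_weights, scale = quantize_symmetric(weights, w_bits)
restored = dequantize(q_weights, scale)
print(w_bits, a_bits, q_weights)
```

With round-to-nearest, each restored weight differs from the original by at most half a quantization step (`scale / 2`), which is why fewer bits mean coarser weights.
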

| # | Question | Answer | Truth | Correctness | Faithfulness | Relevancy | Recall | Similarity |
|---|---|---|---|---|---|---|---|---|
| 0 | What is the name of... | Fluffy | Fluffy | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 1 | Who gave Harry Potter his... | Professor McGonagall | Professor McGonagall | 1.00 | 1.00 | 0.95 | 1.00 | 1.00 |
| 2 | Which house did the Sorting... | Slytherin | Slytherin | 1.00 | 0.00 | 0.89 | 0.00 | 1.00 |

| Aspect | Description | Key Points |
|---|---|---|
| Model and Data Parallelism | Splitting the model or data across processors for parallel work | - Choose how to split (by layers or data)<br>- Keep communication between processors low<br>- Sync tasks to handle dependencies |
| Batch Processing Techniques | Methods to handle data in batches for faster training | - Use dynamic batch sizes suited to hardware |

| Technique | Description | Advantages | Disadvantages | Best for Scenario |
|---|---|---|---|---|
| Model Parallelism | Splits model layers across different devices | Lets you train very large models across multiple GPUs | Slower due to communication between devices | When the model is too big for one device’s memory |
| Data Parallelism | Sends different data batches to different devices | Simple to use and scales well with tools like PyTorch | Syncing gradients can ca |
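
To make the data-parallel row concrete, here is a minimal single-process simulation: each "device" computes a gradient on its own shard of the batch, and the shard gradients are then averaged, which is the synchronization step the table refers to. The one-parameter model `y = w * x`, the function names, and the even split across devices are all assumptions for illustration; real frameworks perform the averaging with collective communication.

```python
def mse_gradient(w, xs, ys):
    """Gradient of mean squared error for the one-parameter model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_gradient(w, xs, ys, num_devices):
    """Simulate data parallelism: per-shard gradients, then an average."""
    shard = len(xs) // num_devices  # assumes the batch divides evenly
    grads = []
    for d in range(num_devices):
        lo, hi = d * shard, (d + 1) * shard
        grads.append(mse_gradient(w, xs[lo:hi], ys[lo:hi]))
    # Averaging the shard gradients stands in for the gradient sync step.
    return sum(grads) / num_devices


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
full = mse_gradient(0.5, xs, ys)
parallel = data_parallel_gradient(0.5, xs, ys, num_devices=2)
print(full, parallel)
```

With equally sized shards, the averaged gradient matches the full-batch gradient, so the split changes where the work happens but not the update itself.
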

| Optimizer | Advantages | Ideal for Scenario |
|---|---|---|
| Adam | Combines the strengths of AdaGrad and RMSProp with smart learning rates | Works well in most cases, especially with large and complex data |
| RMSprop | Fixes AdaGrad’s issue of shrinking learning rates | Good for online learning and changing data |
| Adagrad | Changes learning rates for each parameter, great with sparse data | Useful when data is sparse or features vary a lot |
| Nadam | Adds Nesterov momentum to Adam for faster learning | When you want faster training than Adam |
| Adadelta | Improves Adagrad by keeping learning rates from getting too small | Great for tu |
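
As an illustration of the Adam row, the sketch below applies the standard Adam update (a momentum-style first moment plus an RMSProp-style second moment, both bias-corrected) to a one-dimensional quadratic. The objective, the learning rate of `0.1`, and the name `adam_minimize` are assumptions chosen for the demo; the `beta1`, `beta2`, and `eps` values are the commonly used defaults.

```python
import math


def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a 1-D function with the Adam update rule, given its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Gradient of f(x) = (x - 3)^2, whose minimum is at x = 3.
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```

Dividing by the square root of the second moment is what gives each parameter its own effective learning rate.
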

| # | Metric | Raw Approach | Mem0 Approach | Percentage Difference (%) |
|---|---|---|---|---|
| 0 | Prompt Tokens | 7616 | 5037 | 33.86 |
| 1 | Completion Tokens | 1372 | 410 | 70.12 |
| 2 | Total Tokens | 8988 | 5447 | 39.40 |
| 3 | NaN | | | |
| 4 | Mem0: Conversational Prompt | - | 788 | NaN |
| 5 | Mem0: Conversational Completion | - | 98 | NaN |
| 6 | Mem0: Extraction Prompt | - | 1453 | NaN |
| 7 | Mem0: Extraction Completion | - | 168 | NaN |
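
The "Percentage Difference (%)" column can be reproduced from the token counts. The sketch below assumes the column is the reduction relative to the raw approach, `(raw - mem0) / raw * 100`; that formula is inferred from the numbers rather than stated in the source, and `percent_reduction` is a hypothetical helper name.

```python
def percent_reduction(raw, reduced):
    """Percentage by which `reduced` is smaller than `raw`, rounded to 2 places."""
    return round((raw - reduced) / raw * 100, 2)


# (raw, mem0) token counts copied from the table above.
usage = {
    "Prompt Tokens": (7616, 5037),
    "Completion Tokens": (1372, 410),
}
raw_total = sum(raw for raw, _ in usage.values())
mem0_total = sum(mem0 for _, mem0 in usage.values())
usage["Total Tokens"] = (raw_total, mem0_total)

report = {name: percent_reduction(raw, mem0) for name, (raw, mem0) in usage.items()}
for name, pct in report.items():
    print(f"{name}: {pct:.2f}%")
```

Under that assumption the computed values agree with the first three rows of the table.
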

| Type | Description | Example |
|---|---|---|
| Factual | Outputs are incorrect or made up | "Marie Curie discovered penicillin in 1928." (The discovery was actually made by Alexander Fleming in 1928.) |
| Temporal | Stale or outdated knowledge shown as current | "The current US president is Barack Obama." (Outdated knowledge; Obama's presidency ended in January 2017.) |
| Contextual | Adds concepts that weren’t mentioned or implied | Summarizing a project report and adding "the team planned a surprise party," even though the original report never mentioned any par |

| # | chunk | overlap | top_k | strategy | avg | faith | rel | sim | time_s |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 200 | 40 | 4 | Simple | 0.902 | 0.91 | 0.98 | 0.802 | 8.2 |
| 1 | 300 | 60 | 3 | Rewrite | 0.898 | 0.89 | 0.97 | 0.795 | 7.0 |
| 2 | 180 | 35 | 5 | Rerank | 0.896 | 0.88 | 0.96 | 0.788 | 10.5 |
| 3 | 220 | 45 | 4 | Rewrite | 0.894 | 0.90 | 0.95 | 0.785 | 9.1 |
| 4 | 180 | 25 | 3 | Simple | 0.892 | 0.87 | 0.94 | 0.779 | 7.9 |
| 5 | 280 | 50 | 3 | Rewrite | 0.890 | 0.88 | 0.95 | 0.776 | 8.6 |
| 6 | 300 | 50 | 5 | Rerank | 0.888 | 0.86 | 0.93 | 0.774 | 42.5 |
| 7 | 250 | 30 | 4 | Simple | 0.886 | 0.87 | 0.96 | 0.770 | 8.0 |
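
The `chunk` and `overlap` columns describe a sliding-window split of the source text. Here is a minimal sketch of that idea, assuming sizes are measured in characters (the table does not say whether the unit is characters or tokens); `chunk_text` is a hypothetical helper, not the function used to produce these results.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into windows of chunk_size that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


sample = "".join(str(i % 10) for i in range(500))  # 500 placeholder characters
chunks = chunk_text(sample, chunk_size=200, overlap=40)
print([len(c) for c in chunks])
```

A larger overlap preserves more context across chunk boundaries but produces more chunks to embed and retrieve, which is one reason the table varies both settings together.
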

| # | chunk_size | overlap | top_k | strategy | avg_score | faithfulness | relevancy | similarity_score | time_sec | answer (summary) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 200 | 40 | 4 | Simple RAG | 0.902 | 0.91 | 0.98 | 0.802 | 8.2 | Solar and hydropower differ... |
| 1 | 300 | 60 | 3 | Query Rewrite RAG | 0.898 | 0.89 | 0.97 | 0.795 | 7.0 | Hydropower is more reliable... |
| 2 | 180 | 35 | 5 | Rerank RAG (Simulated) | 0.896 | 0.88 | 0.96 | 0.788 | 10.5 | Hydropower provides 24/7 power... |
| 3 | 220 | 45 | 4 | Query Rewrite RAG | 0.894 | 0.90 | 0.95 | 0.785 | 9.1 | Hydropower depends less on weather... |
| 4 |