Fareed Khan FareedKhan-dev

| Format | Meaning |
|---|---|
| W4A16 | Weights in 4-bit, activations in 16-bit |
| W8A8 | Weights and activations both in 8-bit |
| W4A8 | Weights in 4-bit, activations in 8-bit |
| W8A16 | Weights in 8-bit, activations in 16-bit |

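The formats above differ only in which tensors are stored at reduced precision. The sketch below shows the weight-only case used by W4A16 and W8A16. It is a minimal illustration that assumes simple symmetric per-tensor quantization; real toolkits typically use per-channel or per-group scales. Weights become small signed integers plus one float scale, and activations stay in floating point. The function names are hypothetical.

```python
# Weight-only quantization sketch (W4A16 / W8A16 style).
# Assumption: one symmetric scale per tensor. Python floats are 64-bit, so the
# "16-bit activations" are modeled only conceptually (they are never quantized).

def quantize_symmetric(weights, bits):
    """Map floats to signed ints in [-(2**(bits-1) - 1), 2**(bits-1) - 1] plus a scale."""
    qmax = 2 ** (bits - 1) - 1                     # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

def dot_weight_only(q, scale, activations):
    # Weights are dequantized on the fly; activations keep full precision.
    return sum(w * a for w, a in zip(dequantize(q, scale), activations))

weights = [0.42, -1.3, 0.07, 0.9]
activations = [1.0, 0.5, -2.0, 0.25]

q4, s4 = quantize_symmetric(weights, bits=4)      # W4A16-style weights
q8, s8 = quantize_symmetric(weights, bits=8)      # W8A16-style weights
exact = sum(w * a for w, a in zip(weights, activations))

print(q4, dot_weight_only(q4, s4, activations), exact)
print(q8, dot_weight_only(q8, s8, activations), exact)
```

The 8-bit weights reproduce the exact dot product much more closely than the 4-bit ones. W8A8 and W4A8 would additionally quantize `activations`, which this sketch leaves out.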
FareedKhan-dev / table.md
Last active June 30, 2025 18:45
eval.md
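
| # | Question | Answer | Ground Truth | Correctness | Faithfulness | Relevancy | Recall | Similarity |
|---|---|---|---|---|---|---|---|---|
| 0 | What is the name of... | Fluffy | Fluffy | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 1 | Who gave Harry Potter his... | Professor McGonagall | Professor McGonagall | 1.00 | 1.00 | 0.95 | 1.00 | 1.00 |
| 2 | Which house did the Sorting... | Slytherin | Slytherin | 1.00 | 0.00 | 0.89 | 0.00 | 1.00 |
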
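The per-question scores can be summarized by averaging each column. This hypothetical sketch uses the three rows above. It also flags questions whose answer is correct but scores low on faithfulness, the pattern in row 2, where the answer was possibly drawn from the model's prior knowledge rather than the retrieved context. The dictionary keys mirror the column headers and are not the output format of any particular evaluation library.

```python
from statistics import mean

# Scores copied from the table above (one dict per question).
rows = [
    {"correctness": 1.00, "faithfulness": 1.00, "relevancy": 1.00, "recall": 1.00, "similarity": 1.00},
    {"correctness": 1.00, "faithfulness": 1.00, "relevancy": 0.95, "recall": 1.00, "similarity": 1.00},
    {"correctness": 1.00, "faithfulness": 0.00, "relevancy": 0.89, "recall": 0.00, "similarity": 1.00},
]

# Mean of each metric across questions, rounded to 3 decimals.
averages = {metric: round(mean(r[metric] for r in rows), 3) for metric in rows[0]}

# Correct answers that the retrieved context does not appear to support.
flagged = [i for i, r in enumerate(rows) if r["correctness"] >= 0.9 and r["faithfulness"] < 0.5]

print(averages)
print(flagged)
```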
| Aspect | Description | Key Points |
|---|---|---|
| Model and Data Parallelism | Splitting the model or the data across processors so they can work in parallel | - Choose how to split (by layers or by data)<br>- Keep communication between processors low<br>- Synchronize tasks to handle dependencies |
| Batch Processing Techniques | Methods for processing data in batches to speed up training | - Use dynamic batch sizes suited to the hardware |

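One way to read "dynamic batch sizes suited to the hardware" is token-budget batching. Instead of a fixed number of examples, each batch is filled until its padded size would exceed a budget chosen to fit device memory. This is a minimal sketch under that assumption. The `dynamic_batches` helper and the budget of 120 tokens are hypothetical.

```python
def dynamic_batches(lengths, max_tokens):
    """Group sequence indices so that len(batch) * longest_in_batch <= max_tokens.

    A single sequence longer than max_tokens still gets its own batch,
    because this sketch never splits sequences.
    """
    # Sorting by length keeps similar lengths together, which reduces padding.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, current, longest = [], [], 0
    for i in order:
        new_longest = max(longest, lengths[i])
        # Padded cost if sequence i joined the current batch.
        if current and (len(current) + 1) * new_longest > max_tokens:
            batches.append(current)
            current, new_longest = [], lengths[i]
        current.append(i)
        longest = new_longest
    if current:
        batches.append(current)
    return batches

lengths = [10, 50, 12, 48, 11, 52]   # token counts per sequence
batches = dynamic_batches(lengths, max_tokens=120)
print(batches)
```

Short sequences end up in larger batches and long ones in smaller batches, so the padded size per batch stays under the same budget.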
| Technique | Description | Advantages | Disadvantages | Best Suited For |
|---|---|---|---|---|
| Model Parallelism | Splits model layers across different devices | Lets you train models too large for a single GPU by spreading them across several | Slower due to communication between devices | When the model is too big for one device’s memory |
| Data Parallelism | Sends different data batches to different devices | Simple to use and scales well with tools like PyTorch | Syncing gradients can ca… | |

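The synchronization cost listed for data parallelism comes from averaging gradients across devices. The simulation below is a minimal sketch that assumes a one-parameter linear model and equal-sized shards. Each simulated device computes the gradient on its own shard, and the averaged result equals the full-batch gradient, which is why data-parallel training follows the same trajectory as single-device training. No real multi-GPU API is used here. In PyTorch, `DistributedDataParallel` performs the averaging step across processes.

```python
def grad_mse(w, xs, ys):
    # Gradient of mean((w*x - y)**2) with respect to w: mean(2 * (w*x - y) * x)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def data_parallel_grad(w, xs, ys, num_devices):
    if len(xs) % num_devices:
        raise ValueError("this sketch assumes equal-sized shards")
    shard = len(xs) // num_devices
    local_grads = []
    for d in range(num_devices):
        lo, hi = d * shard, (d + 1) * shard
        # Each simulated device runs forward/backward on its own shard only.
        local_grads.append(grad_mse(w, xs[lo:hi], ys[lo:hi]))
    # Synchronization step: an all-reduce that averages the local gradients.
    return sum(local_grads) / num_devices

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]          # data generated with w = 2.0
w = 0.0
for _ in range(50):
    w -= 0.01 * data_parallel_grad(w, xs, ys, num_devices=4)
print(w)
```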
| Optimizer | Advantages | Ideal For |
|---|---|---|
| Adam | Combines the strengths of Adagrad and RMSprop, adapting the learning rate for each parameter | Works well in most cases, especially with large, complex datasets |
| RMSprop | Fixes Adagrad’s shrinking learning rates by using a moving average of squared gradients | Online learning and non-stationary (changing) data |
| Adagrad | Adapts the learning rate for each parameter; well suited to sparse data | Sparse data, or features whose frequencies vary widely |
| Nadam | Adds Nesterov momentum to Adam, often converging faster | When you want faster training than Adam |
| Adadelta | Improves on Adagrad by keeping learning rates from shrinking toward zero | Great for tu… |

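As a concrete reference for the Adam row, this minimal scalar sketch applies the Adam update to the toy objective f(x) = (x - 3)². The update combines a momentum-style first moment, an RMSprop-style second moment, and bias correction, and uses the commonly cited defaults β1 = 0.9, β2 = 0.999, ε = 1e-8. The `adam_minimize` helper and the learning rate of 0.1 are illustrative assumptions, not a library function.

```python
import math

def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g          # first moment: running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g      # second moment: running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)             # bias correction for zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter scaled step
    return x

# Gradient of f(x) = (x - 3)^2 is 2 * (x - 3); the minimum is at x = 3.
x_min = adam_minimize(lambda x: 2 * (x - 3.0), x0=0.0)
print(x_min)
```

Dropping the `m` term and its bias correction gives RMSprop. Replacing the moving average in `v` with a running sum gives Adagrad-style scaling.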
| # | Metric | Raw Approach | Mem0 Approach | Reduction vs. Raw (%) |
|---|---|---|---|---|
| 0 | Prompt Tokens | 7616 | 5037 | 33.86 |
| 1 | Completion Tokens | 1372 | 410 | 70.12 |
| 2 | Total Tokens | 8988 | 5447 | 39.40 |
| 3 | | | | |
| 4 | Mem0: Conversational Prompt | - | 788 | - |
| 5 | Mem0: Conversational Completion | - | 98 | - |
| 6 | Mem0: Extraction Prompt | - | 1453 | - |
| 7 | Mem0: Extraction Completion | - | 168 | - |

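The last column of the token table is the relative saving of the Mem0 approach compared with the raw approach, (raw - Mem0) / raw * 100. This short check reproduces the three reported values from the token counts. `percent_reduction` is a hypothetical helper name.

```python
def percent_reduction(raw, optimized):
    """Relative saving of `optimized` versus `raw`, in percent, rounded to 2 decimals."""
    return round((raw - optimized) / raw * 100, 2)

# (raw, mem0) token counts taken from the table.
usage = {
    "Prompt Tokens": (7616, 5037),
    "Completion Tokens": (1372, 410),
    "Total Tokens": (8988, 5447),
}
for metric, (raw, mem0) in usage.items():
    print(f"{metric}: {percent_reduction(raw, mem0)}%")
```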
| Type | Description | Example |
|---|---|---|
| Factual | Outputs are incorrect or made up | "Marie Curie discovered penicillin in 1928." (Penicillin was actually discovered by Alexander Fleming in 1928.) |
| Temporal | Stale or outdated knowledge presented as current | "The current US president is Barack Obama." (Outdated: Obama left office in January 2017.) |
| Contextual | Adds concepts that were neither mentioned nor implied in the input | Summarizing a project report and adding "the team planned a surprise party," even though the original report never mentioned any party |

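A contextual hallucination like the surprise-party example can sometimes be caught with a crude lexical check that flags summary sentences whose content words mostly do not appear in the source. This is a hypothetical heuristic. The stopword list, the 0.5 threshold, and the sample report are illustrative assumptions. It misses paraphrases, and it cannot detect factual or temporal errors, which require external knowledge.

```python
import re

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
             "was", "were", "is", "are", "that", "this", "it", "with", "by"}

def content_words(text):
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}

def unsupported_sentences(source, summary, threshold=0.5):
    """Return summary sentences where under `threshold` of content words occur in the source."""
    source_words = content_words(source)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", summary.strip()):
        words = content_words(sentence)
        if not words:
            continue
        support = len(words & source_words) / len(words)
        if support < threshold:
            flagged.append(sentence)
    return flagged

report = ("The team finished the database migration in March. "
          "Testing revealed two minor bugs, which were fixed before release.")
summary = "The team completed the database migration in March. The team planned a surprise party."
result = unsupported_sentences(report, summary)
print(result)
```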
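| # | chunk_size | overlap | top_k | strategy | avg_score | faithfulness | relevancy | similarity_score | time_sec |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 200 | 40 | 4 | Simple | 0.902 | 0.91 | 0.98 | 0.802 | 8.2 |
| 1 | 300 | 60 | 3 | Rewrite | 0.898 | 0.89 | 0.97 | 0.795 | 7.0 |
| 2 | 180 | 35 | 5 | Rerank | 0.896 | 0.88 | 0.96 | 0.788 | 10.5 |
| 3 | 220 | 45 | 4 | Rewrite | 0.894 | 0.90 | 0.95 | 0.785 | 9.1 |
| 4 | 180 | 25 | 3 | Simple | 0.892 | 0.87 | 0.94 | 0.779 | 7.9 |
| 5 | 280 | 50 | 3 | Rewrite | 0.890 | 0.88 | 0.95 | 0.776 | 8.6 |
| 6 | 300 | 50 | 5 | Rerank | 0.888 | 0.86 | 0.93 | 0.774 | 42.5 |
| 7 | 250 | 30 | 4 | Simple | 0.886 | 0.87 | 0.96 | 0.770 | 8.0 |
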
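The `chunk_size` and `overlap` columns describe a sliding-window splitter: each window starts `chunk_size - overlap` positions after the previous one. The table does not say whether sizes are counted in characters or tokens. This minimal sketch assumes characters, and `chunk_text` is a hypothetical helper.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into windows of `chunk_size` characters sharing `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# Row 0 settings (200, 40) on a 1,000-character document: windows start every 160 characters.
print(len(chunk_text("x" * 1000, chunk_size=200, overlap=40)))
```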
| # | chunk_size | overlap | top_k | strategy | avg_score | faithfulness | relevancy | similarity_score | time_sec | answer (summary) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 200 | 40 | 4 | Simple RAG | 0.902 | 0.91 | 0.98 | 0.802 | 8.2 | Solar and hydropower differ... |
| 1 | 300 | 60 | 3 | Query Rewrite RAG | 0.898 | 0.89 | 0.97 | 0.795 | 7.0 | Hydropower is more reliable... |
| 2 | 180 | 35 | 5 | Rerank RAG (Simulated) | 0.896 | 0.88 | 0.96 | 0.788 | 10.5 | Hydropower provides 24/7 power... |
| 3 | 220 | 45 | 4 | Query Rewrite RAG | 0.894 | 0.90 | 0.95 | 0.785 | 9.1 | Hydropower depends less on weather... |

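The `top_k` column presumably sets how many retrieved chunks are passed on to the generator. This minimal sketch of that retrieval step scores chunks by bag-of-words cosine similarity and returns the `top_k` best. The cosine scoring stands in for whatever embedding model the experiments actually used. The query-rewrite and rerank strategies are not modeled, and the sample chunks are invented for illustration.

```python
import math
import re
from collections import Counter

def bow(text):
    # Bag-of-words term counts (a stand-in for a learned embedding).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve_top_k(query, chunks, top_k):
    q = bow(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bow(c)), reverse=True)
    return ranked[:top_k]

chunks = [
    "Hydropower plants generate electricity around the clock if water flow is steady.",
    "Solar panels produce power only when the sun is shining.",
    "Wind turbines need a minimum wind speed to start turning.",
    "Battery storage can smooth out gaps in solar generation.",
]
top = retrieve_top_k("Is hydropower more reliable than solar power?", chunks, top_k=2)
print(top)
```

A larger `top_k` gives the generator more context at the cost of longer prompts, which is one plausible reason the configurations in the table differ in `time_sec`.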