Fareed Khan FareedKhan-dev

## we_only.md

      
FareedKhan-dev / we_only.md (created July 3, 2025 12:40)

| Format | Meaning |
| --- | --- |
| W4A16 | Weights in 4-bit, Activations in 16-bit |
| W8A8 | Weights and Activations both in 8-bit |
| W4A8 | Weights in 4-bit, Activations in 8-bit |
| W8A16 | Weights in 8-bit, Activations in 16-bit |
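
To make the WxAy naming concrete, here is a minimal sketch of the W4A16 case: weights are rounded to signed 4-bit integers while activations stay in floating point (standing in for 16-bit). The function names and the symmetric, per-tensor scaling scheme are illustrative assumptions, not taken from any particular library.

```python
def quantize_weights(weights, bits=4):
    """Symmetric per-tensor quantization to signed `bits`-bit integers (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs else 1.0      # one scale shared by the whole tensor
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q_weights, scale):
    """Map the integers back to approximate floating-point weights."""
    return [v * scale for v in q_weights]


def dot_w4a16(q_weights, scale, activations):
    """Dot product with 4-bit integer weights and unquantized activations."""
    return scale * sum(qw * a for qw, a in zip(q_weights, activations))


weights = [0.70, -0.30, 0.12, -0.04]
activations = [1.0, 2.0, -1.0, 0.5]

q, scale = quantize_weights(weights)
approx = dot_w4a16(q, scale, activations)
exact = sum(w * a for w, a in zip(weights, activations))
print(q, round(scale, 4), round(approx, 4), round(exact, 4))
```

The gap between `approx` and `exact` is the quantization error; a W8A16 variant would call `quantize_weights(weights, bits=8)` and typically shrink that gap at the cost of more memory.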


## table.md

      
FareedKhan-dev / table.md (last active June 30, 2025 18:45; description: eval.md)

| # | Question | Answer | Truth | Corr | Faith | Rel | Recall | Sim |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | What is the name of... | Fluffy | Fluffy | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 1 | Who gave Harry Potter his... | Professor McGonagall | Professor McGonagall | 1.00 | 1.00 | 0.95 | 1.00 | 1.00 |
| 2 | Which house did the Sorting... | Slytherin | Slytherin | 1.00 | 0.00 | 0.89 | 0.00 | 1.00 |
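
Row 2 is the interesting case: the answer matches the truth (Corr 1.00) but Faith and Recall are 0.00, which suggests the answer was not supported by the retrieved context. A small sketch of how such rows could be flagged follows. Reading the column abbreviations as correctness, faithfulness, relevancy, context recall, and similarity is an assumption, and the 0.5 threshold and the function name are hypothetical.

```python
# Scores copied from the table above; key names are assumed expansions of the headers.
rows = [
    {"id": 0, "answer": "Fluffy", "corr": 1.00, "faith": 1.00, "rel": 1.00, "recall": 1.00, "sim": 1.00},
    {"id": 1, "answer": "Professor McGonagall", "corr": 1.00, "faith": 1.00, "rel": 0.95, "recall": 1.00, "sim": 1.00},
    {"id": 2, "answer": "Slytherin", "corr": 1.00, "faith": 0.00, "rel": 0.89, "recall": 0.00, "sim": 1.00},
]


def correct_but_ungrounded(rows, threshold=0.5):
    """Return ids of rows whose answer is correct but poorly supported by context."""
    return [
        r["id"]
        for r in rows
        if r["corr"] >= threshold
        and (r["faith"] < threshold or r["recall"] < threshold)
    ]


flagged = correct_but_ungrounded(rows)
print(flagged)
```

Rows flagged this way are worth inspecting by hand, since a correct answer with no supporting context may have come from the model's own knowledge rather than from retrieval.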


## comparison.md

      
FareedKhan-dev / comparison.md (created June 4, 2025 07:10)

| Aspect | Description | Key Points |
| --- | --- | --- |
| Model and Data Parallelism | Splitting the model or data across processors for parallel work | - Choose how to split (by layers or data) |
| | | - Keep communication between processors low |
| | | - Sync tasks to handle dependencies |
| Batch Processing Techniques | Methods to handle data in batches for faster training | - Use dynamic batch sizes suited to hardware |


## comparison.md

      
FareedKhan-dev / comparison.md (created June 4, 2025 06:28)

| Technique | Description | Advantages | Disadvantages | Best for Scenario |
| --- | --- | --- | --- | --- |
| Model Parallelism | Splits model layers across different devices | Lets you train very large models across multiple GPUs | Slower due to communication between devices | When the model is too big for one device’s memory |
| Data Parallelism | Sends different data batches to different devices | Simple to use and scales well with tools like PyTorch | Syncing gradients can ca | |
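
The data-parallel row can be illustrated without any framework. The sketch below simulates the idea in plain Python: each "device" computes a gradient on its own shard of the batch, and the gradients are then averaged, which is the synchronization step the table refers to. The toy model `y = w * x`, the function names, and the equal-sized shards are all assumptions made for illustration; real frameworks perform the averaging with a collective operation across actual devices.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error with respect to w for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_grad(w, batch, num_devices):
    """Split the batch into equal shards, compute local gradients, then average them.

    Assumes len(batch) is divisible by num_devices.
    """
    shard_size = len(batch) // num_devices
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_devices)]
    local_grads = [grad_mse(w, shard) for shard in shards]   # one gradient per simulated device
    return sum(local_grads) / num_devices                     # gradient synchronization


batch = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w = 1.5

full_batch_grad = grad_mse(w, batch)
parallel_grad = data_parallel_grad(w, batch, num_devices=2)
print(full_batch_grad, parallel_grad)
```

With equal-sized shards the averaged gradient matches the full-batch gradient, so the extra cost of data parallelism is the communication needed for the averaging step rather than any change in the result.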


## optim.md

      
FareedKhan-dev / optim.md (created June 4, 2025 05:47)

| Optimizer | Advantages | Ideal for Scenario |
| --- | --- | --- |
| Adam | Combines the strengths of AdaGrad and RMSProp with smart learning rates | Works well in most cases, especially with large and complex data |
| RMSprop | Fixes AdaGrad’s issue of shrinking learning rates | Good for online learning and changing data |
| Adagrad | Changes learning rates for each parameter, great with sparse data | Useful when data is sparse or features vary a lot |
| Nadam | Adds Nesterov momentum to Adam for faster learning | When you want faster training than Adam |
| Adadelta | Improves Adagrad by keeping learning rates from getting too small | Great for tu |
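
As a rough illustration of the Adam row, the sketch below implements a single-parameter Adam update in plain Python: a running mean of gradients (momentum) combined with a running mean of squared gradients (the RMSProp-style per-parameter scaling), each with bias correction. The default hyperparameters are the commonly cited ones, and the toy objective `f(x) = (x - 3)^2` and the learning rate of 0.05 are assumptions chosen only to make the example run.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad            # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad     # second moment: running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                  # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy objective (assumed for illustration): f(x) = (x - 3)^2, with gradient 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
first_x, _, _ = adam_step(x, 2 * (x - 3), m, v, t=1, lr=0.05)

for t in range(1, 2001):
    grad = 2 * (x - 3)
    x, m, v = adam_step(x, grad, m, v, t, lr=0.05)

print(first_x, x)
```

Note that the first step moves by roughly `lr` regardless of the gradient's magnitude, because the bias-corrected ratio `m_hat / sqrt(v_hat)` is close to the sign of the gradient at `t = 1`.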


## small_com.md

      
FareedKhan-dev / small_com.md (created May 18, 2025 18:31; description: s)

| # | Metric | Raw Approach | Mem0 Approach | Percentage Difference (%) |
| --- | --- | --- | --- | --- |
| 0 | Prompt Tokens | 7616 | 5037 | 33.86 |
| 1 | Completion Tokens | 1372 | 410 | 70.12 |
| 2 | Total Tokens | 8988 | 5447 | 39.40 |
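
The Percentage Difference column appears to be the reduction relative to the raw approach, that is `(raw - mem0) / raw * 100`. That reading is inferred from the numbers rather than stated in the table, and the helper name below is hypothetical. A short check:

```python
def percent_reduction(raw, reduced):
    """Reduction of `reduced` relative to `raw`, as a percentage rounded to 2 places."""
    return round((raw - reduced) / raw * 100, 2)


# (raw approach, Mem0 approach) token counts copied from the table above
usage = {
    "Prompt Tokens": (7616, 5037),
    "Completion Tokens": (1372, 410),
    "Total Tokens": (8988, 5447),
}

reductions = {metric: percent_reduction(raw, mem0) for metric, (raw, mem0) in usage.items()}
for metric, pct in reductions.items():
    print(f"{metric}: {pct:.2f}%")
```

The computed values agree with the table, which supports reading the column as savings relative to the raw approach rather than a symmetric percentage difference.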


## comparison.md

      
FareedKhan-dev / comparison.md (last active May 18, 2025 13:13)

| # | Metric | Raw Approach | Mem0 Approach | Percentage Difference (%) |
| --- | --- | --- | --- | --- |
| 0 | Prompt Tokens | 7616 | 5037 | 33.86 |
| 1 | Completion Tokens | 1372 | 410 | 70.12 |
| 2 | Total Tokens | 8988 | 5447 | 39.40 |
| 3 | | | | NaN |
| 4 | Mem0: Conversational Prompt | - | 788 | NaN |
| 5 | Mem0: Conversational Completion | - | 98 | NaN |
| 6 | Mem0: Extraction Prompt | - | 1453 | NaN |
| 7 | Mem0: Extraction Completion | - | 168 | NaN |


## hallucination_types.md

      
FareedKhan-dev / hallucination_types.md (created May 2, 2025 12:24)

| Type | Description | Example |
| --- | --- | --- |
| Factual | Outputs are incorrect or made up | "Marie Curie discovered penicillin in 1928." (The discovery was actually made by Alexander Fleming in 1928.) |
| Temporal | Stale or outdated knowledge shown as current | "The current US president is Barack Obama." (Outdated knowledge; Obama left office in January 2017.) |
| Contextual | Adds concepts that weren’t mentioned or implied | Summarizing a project report and adding "the team planned a surprise party," even though the original report never mentioned any par |


## rag_analysis.md

      
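FareedKhan-dev / rag_analysis.md (created April 26, 2025 17:50)

| # | chunk | overlap | top_k | strategy | avg | faith | rel | sim | time_s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 200 | 40 | 4 | Simple | 0.902 | 0.91 | 0.98 | 0.802 | 8.2 |
| 1 | 300 | 60 | 3 | Rewrite | 0.898 | 0.89 | 0.97 | 0.795 | 7.0 |
| 2 | 180 | 35 | 5 | Rerank | 0.896 | 0.88 | 0.96 | 0.788 | 10.5 |
| 3 | 220 | 45 | 4 | Rewrite | 0.894 | 0.90 | 0.95 | 0.785 | 9.1 |
| 4 | 180 | 25 | 3 | Simple | 0.892 | 0.87 | 0.94 | 0.779 | 7.9 |
| 5 | 280 | 50 | 3 | Rewrite | 0.890 | 0.88 | 0.95 | 0.776 | 8.6 |
| 6 | 300 | 50 | 5 | Rerank | 0.888 | 0.86 | 0.93 | 0.774 | 42.5 |
| 7 | 250 | 30 | 4 | Simple | 0.886 | 0.87 | 0.96 | 0.770 | 8.0 |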
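
The average scores in this table differ by less than 0.02 across configurations, while run time varies far more. One way to read such a table is sketched below: keep the configurations whose `avg` is within a small tolerance of the best, pick the fastest of those, and separately flag any run whose time is far above the median. The 0.005 tolerance, the 3x-median cutoff, and the function names are arbitrary assumptions for illustration, not part of the original analysis.

```python
from statistics import median

# (id, strategy, avg, time_s) copied from the table above
configs = [
    (0, "Simple", 0.902, 8.2),
    (1, "Rewrite", 0.898, 7.0),
    (2, "Rerank", 0.896, 10.5),
    (3, "Rewrite", 0.894, 9.1),
    (4, "Simple", 0.892, 7.9),
    (5, "Rewrite", 0.890, 8.6),
    (6, "Rerank", 0.888, 42.5),
    (7, "Simple", 0.886, 8.0),
]


def fastest_near_best(configs, tolerance=0.005):
    """Among configs within `tolerance` of the best avg score, return the fastest one."""
    best = max(c[2] for c in configs)
    near_best = [c for c in configs if best - c[2] <= tolerance]
    return min(near_best, key=lambda c: c[3])


def slow_outliers(configs, factor=3.0):
    """Return ids of configs whose run time exceeds `factor` times the median time."""
    cutoff = factor * median(c[3] for c in configs)
    return [c[0] for c in configs if c[3] > cutoff]


choice = fastest_near_best(configs)
slow = slow_outliers(configs)
print(choice, slow)
```

Under these assumed thresholds the sketch prefers configuration 1 over configuration 0, trading 0.004 in average score for a shorter run, and it singles out configuration 6 as a timing outlier worth re-running before drawing conclusions about reranking.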


## analysis.md

      
FareedKhan-dev / analysis.md (created April 26, 2025 17:48)

| # | chunk_size | overlap | top_k | strategy | avg_score | faithfulness | relevancy | similarity_score | time_sec | answer (summary) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 200 | 40 | 4 | Simple RAG | 0.902 | 0.91 | 0.98 | 0.802 | 8.2 | Solar and hydropower differ... |
| 1 | 300 | 60 | 3 | Query Rewrite RAG | 0.898 | 0.89 | 0.97 | 0.795 | 7.0 | Hydropower is more reliable... |
| 2 | 180 | 35 | 5 | Rerank RAG (Simulated) | 0.896 | 0.88 | 0.96 | 0.788 | 10.5 | Hydropower provides 24/7 power... |
| 3 | 220 | 45 | 4 | Query Rewrite RAG | 0.894 | 0.90 | 0.95 | 0.785 | 9.1 | Hydropower depends less on weather... |
| 4 | | | | | | | | | | |