Biswajit Bisu7

| Step | IR Concept | Purpose |
|------|-------------|----------|
| 1 | Tokenization, Stop-word Removal, Stemming | Clean and normalize input text from different apps |
| 2 | Inverted Index (Conceptual) | Organize tokens and map which app/doc contains which term |
| 3 | TF–IDF / BM25 Representation | Quantify term importance across apps |
| 4 | Cosine Similarity / Semantic Similarity | Measure textual or contextual overlap between app data |
| 5 | Precision / Recall / F-Measure | Evaluate detection performance and reliability |
| 6 | Relevance Feedback | Improve detection accuracy based on user confirmation |
| 7 | Sentence-BERT (Advanced IR Model) | Capture deep semantic similarity beyond keyword matching |
| 8 | Clustering (Optional) | Group similar apps or behaviors for analysis |
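
Steps 1 through 5 of the table can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the pipeline's actual implementation: the stop-word list, the crude suffix-stripping `stem` function (a stand-in for a real stemmer such as Porter), the smoothed IDF formula, and the three sample app descriptions are all hypothetical. Steps 6 to 8 (relevance feedback, Sentence-BERT, clustering) are left out because they need user input or external models.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "your", "with"}


def stem(token):
    # Crude suffix stripping; a stand-in for a real stemmer such as Porter.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    # Step 1: tokenize, drop stop words, stem.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def build_inverted_index(docs):
    # Step 2: map each term to the set of apps/docs that contain it.
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for term in tokens:
            index[term].add(doc_id)
    return index


def tfidf_vectors(docs, index):
    # Step 3: TF-IDF with a smoothed IDF (one common variant, assumed here).
    n_docs = len(docs)
    vectors = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        vectors[doc_id] = {
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + len(index[term]))) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    # Step 4: cosine similarity between two sparse term-weight vectors.
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def precision_recall_f1(predicted, actual):
    # Step 5: evaluate a set of detections against a labeled ground-truth set.
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical sample data: short text snippets from three apps.
raw = {
    "app_a": "Sharing your contacts and location with advertisers",
    "app_b": "Shares location and contacts with advertising partners",
    "app_c": "Offline calculator for unit conversions",
}
docs = {app: preprocess(text) for app, text in raw.items()}
index = build_inverted_index(docs)
vectors = tfidf_vectors(docs, index)

print(sorted(index["location"]))
print(cosine(vectors["app_a"], vectors["app_b"]))
print(cosine(vectors["app_a"], vectors["app_c"]))
```

In this toy data, `app_a` and `app_b` overlap only on the terms `contact` and `location`, because the crude stemmer maps "Sharing" to `shar` but "Shares" to `share`. That kind of keyword-level miss is what the semantic similarity step in row 7 is meant to address.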