| Step | IR Concept | Purpose |
|------|------------|---------|
| 1 | Tokenization, Stop-word Removal, Stemming | Clean and normalize input text from different apps |
| 2 | Inverted Index (Conceptual) | Organize tokens and map which app/doc contains which term |
| 3 | TF–IDF / BM25 Representation | Quantify term importance across apps |
| 4 | Cosine Similarity / Semantic Similarity | Measure textual or contextual overlap between app data |
| 5 | Precision / Recall / F-Measure | Evaluate detection performance and reliability |
| 6 | Relevance Feedback | Improve detection accuracy based on user confirmation |
| 7 | Sentence-BERT (Advanced IR Model) | Capture deep semantic similarity beyond keyword matching |
| 8 | Clustering (Optional) | Group similar apps or behaviors for analysis |
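Step 1 (tokenization, stop-word removal, stemming) could look roughly like the sketch below. The stop-word list and the suffix rules are hypothetical placeholders: the stemmer is a crude suffix stripper, not the Porter stemmer, so real output will differ from a library stemmer.

```python
import re

# Hypothetical minimal stop-word list; a real pipeline would use a fuller list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for", "on", "with"}

# Crude suffix stripper (NOT the Porter stemmer); suffixes are tried in order.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Lowercase, split into alphanumeric tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("The app is tracking locations and reading contacts"))
# ['app', 'track', 'location', 'read', 'contact']
```

Applying the same `preprocess` to text from every app keeps the vocabulary consistent, which the later indexing and similarity steps rely on.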
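For step 2, a conceptual inverted index is just a mapping from each term to the set of apps whose text contains it. The sketch below assumes already-preprocessed token lists keyed by hypothetical app IDs (`app_a`, `app_b`, `app_c`) and adds a simple Boolean AND lookup over the posting sets.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map every term to the set of app/document IDs whose tokens contain it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for term in tokens:
            index[term].add(doc_id)
    return dict(index)


def docs_containing_all(index: dict[str, set[str]], terms: list[str]) -> set[str]:
    """Boolean AND query: intersect the posting sets of all query terms."""
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical, already-preprocessed token lists for three apps.
docs = {
    "app_a": ["track", "location", "read", "contact"],
    "app_b": ["read", "contact", "upload"],
    "app_c": ["play", "music"],
}
index = build_inverted_index(docs)
print(sorted(index["contact"]))                                # ['app_a', 'app_b']
print(sorted(docs_containing_all(index, ["read", "upload"])))  # ['app_b']
```

Using sets for postings keeps the sketch short; a production index would usually store sorted posting lists with term frequencies so scoring (step 3) can reuse them.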
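Steps 3 and 4 (TF-IDF weighting followed by cosine similarity) can be combined in a small sketch. It assumes the plain weighting `count * ln(N / df)` over hypothetical token lists; other variants exist, for example scikit-learn's `TfidfVectorizer` smooths idf by default.

```python
import math
from collections import Counter


def tfidf_vectors(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Sparse TF-IDF vectors: raw term count times idf = ln(N / df)."""
    n = len(docs)
    # Document frequency: number of docs containing each term at least once.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    return {
        doc_id: {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
        for doc_id, tokens in docs.items()
    }


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, already-preprocessed token lists for three apps.
docs = {
    "app_a": ["track", "location", "read", "contact"],
    "app_b": ["read", "contact", "upload"],
    "app_c": ["play", "music"],
}
vecs = tfidf_vectors(docs)
print(round(cosine(vecs["app_a"], vecs["app_b"]), 2))  # 0.16
print(cosine(vecs["app_a"], vecs["app_c"]))            # 0.0
```

With this unsmoothed idf, a term that appears in every app gets weight 0 and contributes nothing to similarity, which is usually the intent when such terms carry no distinguishing signal.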
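The BM25 option in step 3 scores each app against a query of terms instead of building a symmetric vector. This is a sketch of Okapi BM25 with the common defaults `k1=1.2`, `b=0.75`; it assumes the non-negative idf variant `ln((N - df + 0.5) / (df + 0.5) + 1)`, and the query and app data are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: dict[str, list[str]],
                k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    """Okapi BM25 score of every document for a bag-of-terms query."""
    n = len(docs)
    avgdl = sum(len(tokens) for tokens in docs.values()) / n
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        dl = len(tokens)
        score = 0.0
        for q in query:
            f = tf[q]  # Counter returns 0 for absent terms
            if f == 0:
                continue
            idf = math.log((n - df[q] + 0.5) / (df[q] + 0.5) + 1)
            # Term-frequency saturation (k1) with document-length normalization (b).
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
        scores[doc_id] = score
    return scores


# Hypothetical, already-preprocessed token lists for three apps.
docs = {
    "app_a": ["track", "location", "read", "contact"],
    "app_b": ["read", "contact", "upload"],
    "app_c": ["play", "music"],
}
scores = bm25_scores(["read", "contact"], docs)
print(sorted(scores, key=scores.get, reverse=True))  # ['app_b', 'app_a', 'app_c']
```

Both matching apps contain the query terms once, but `app_b` ranks higher because it is shorter relative to the average length, which is the effect the `b` parameter controls.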
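Step 5 evaluates detections against a labeled ground truth. A minimal sketch, assuming detections and ground-truth positives are both given as sets of hypothetical app IDs; `beta` generalizes F1 to the F-beta measure.

```python
def precision_recall_f(predicted: set[str], actual: set[str],
                       beta: float = 1.0) -> tuple[float, float, float]:
    """Return (precision, recall, F-beta) for a set of detections."""
    tp = len(predicted & actual)  # true positives: detected and truly positive
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical detections vs. hypothetical ground truth.
predicted = {"app_a", "app_b", "app_d"}
actual = {"app_a", "app_b", "app_c", "app_e"}
print(tuple(round(x, 3) for x in precision_recall_f(predicted, actual)))
# (0.667, 0.5, 0.571)
```

Choosing `beta > 1` weights recall more heavily, which may suit settings where a missed detection is costlier than a false alarm.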
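The table does not name a specific relevance-feedback method for step 6; the sketch below swaps in the classic Rocchio update as one possibility. It assumes user-confirmed detections act as relevant examples and rejected ones as non-relevant, uses the commonly cited weights `alpha=1.0`, `beta=0.75`, `gamma=0.15`, and drops terms whose weight falls to zero or below. All vectors are hypothetical.

```python
def rocchio(query: dict[str, float],
            relevant: list[dict[str, float]],
            non_relevant: list[dict[str, float]],
            alpha: float = 1.0, beta: float = 0.75,
            gamma: float = 0.15) -> dict[str, float]:
    """Rocchio update: move the query toward relevant and away from non-relevant vectors."""
    updated = {t: alpha * w for t, w in query.items()}
    for vectors, coeff in ((relevant, beta), (-gamma and non_relevant, -gamma)):
        if not vectors:
            continue
        for vec in vectors:
            for t, w in vec.items():
                # Add the centroid contribution (coeff / |set|) of this vector.
                updated[t] = updated.get(t, 0.0) + coeff * w / len(vectors)
    # Keep only positive weights, as is common in practice.
    return {t: w for t, w in updated.items() if w > 0}


# Hypothetical query plus one user-confirmed and one user-rejected app vector.
query = {"contact": 1.0}
confirmed = [{"contact": 1.0, "upload": 1.0}]
rejected = [{"music": 1.0, "contact": 0.5}]
print({t: round(w, 3) for t, w in rocchio(query, confirmed, rejected).items()})
# {'contact': 1.675, 'upload': 0.75}
```

The small `gamma` means rejections only gently push the query away, so a single mistaken rejection does not erase useful terms.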
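For the optional step 8, one simple option is single-linkage clustering cut at a cosine-similarity threshold. This is equivalent to taking connected components of the graph whose edges join apps with similarity at or above the threshold. The threshold of `0.5` and the sparse vectors are hypothetical; k-means or hierarchical clustering from a library are common alternatives.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def threshold_clusters(vectors: dict[str, dict[str, float]],
                       threshold: float = 0.5) -> list[list[str]]:
    """Single-linkage clusters: union any pair with cosine >= threshold."""
    ids = list(vectors)
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)
    groups: dict[str, list[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical sparse term vectors for four apps.
vectors = {
    "app_a": {"contact": 1.0, "upload": 1.0},
    "app_b": {"contact": 1.0, "upload": 0.8},
    "app_c": {"music": 1.0, "play": 1.0},
    "app_d": {"music": 1.0},
}
print(threshold_clusters(vectors))  # [['app_a', 'app_b'], ['app_c', 'app_d']]
```

Single linkage is easy to reason about but can chain loosely related apps together through intermediate ones; raising the threshold or switching to average linkage reduces that effect.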