@dreamerns
Last active July 7, 2020 22:27
TF-IDF in Java
javac TFIDF.java
java TFIDF dog cat
// search returns array of matching docs ranked by tf-idf score
// aka term frequency * inverse document frequency
//
// tf = # of occurrences of term in document / # of words in document
// idf = log ( # of documents / # of documents with term )
// tf-idf = tf * idf
// multi-term tf-idf = sum of tf-idf scores (per document)
import java.util.Arrays;
import java.util.List;

public class TFIDF {

    public List<String> search(List<String> docs, String[] terms) {
        return docs;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "the quick brown fox jumps over the lazy dog",
            "i like to eat beans for dinner",
            "dogs are a man's best friend. we like dogs",
            "cats are the biggest trolls",
            "the dog and the cat don't get along",
            "do cats like hamburgers? let's test and find out",
            "the rabbit likes a carrot in her stew");
        TFIDF tfidf = new TFIDF();
        List<String> result = tfidf.search(docs, args);
        result.forEach(System.out::println);
    }
}
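The `search` stub above returns every document unranked. Below is one possible way to fill it in, following the formulas in the header comment. It is a sketch under assumptions that the gist does not state: the class name `TfIdfSketch` and the helper `tokenize` are hypothetical, words are lowercased and split on anything that is not a letter, digit, or apostrophe, matching is exact (so `dog` does not match `dogs`), the logarithm is the natural log, and a document counts as matching if it contains at least one search term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class TfIdfSketch {

    // Assumed tokenizer: lowercase, then split on runs of characters
    // that are not letters, digits, or apostrophes.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns the documents that contain at least one term,
    // ordered by summed tf-idf score, highest first.
    static List<String> search(List<String> docs, String[] terms) {
        List<List<String>> tokenized = new ArrayList<>();
        for (String doc : docs) {
            tokenized.add(tokenize(doc));
        }

        double[] scores = new double[docs.size()];
        boolean[] matched = new boolean[docs.size()];

        for (String rawTerm : terms) {
            String term = rawTerm.toLowerCase(Locale.ROOT);

            // df = number of documents containing the term
            int df = 0;
            for (List<String> tokens : tokenized) {
                if (tokens.contains(term)) {
                    df++;
                }
            }
            if (df == 0) {
                continue; // term appears nowhere, so it contributes nothing
            }
            double idf = Math.log((double) docs.size() / df);

            for (int i = 0; i < docs.size(); i++) {
                List<String> tokens = tokenized.get(i);
                int count = Collections.frequency(tokens, term);
                if (count > 0) {
                    matched[i] = true;
                    // tf = occurrences of term / words in document
                    scores[i] += ((double) count / tokens.size()) * idf;
                }
            }
        }

        // Collect matching document indices and sort by score, descending.
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (matched[i]) {
                hits.add(i);
            }
        }
        hits.sort((a, b) -> Double.compare(scores[b], scores[a]));

        List<String> result = new ArrayList<>();
        for (int i : hits) {
            result.add(docs.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "the quick brown fox jumps over the lazy dog",
            "i like to eat beans for dinner",
            "dogs are a man's best friend. we like dogs",
            "cats are the biggest trolls",
            "the dog and the cat don't get along",
            "do cats like hamburgers? let's test and find out",
            "the rabbit likes a carrot in her stew");
        search(docs, args).forEach(System.out::println);
    }
}
```

With these assumptions, `java TfIdfSketch dog cat` prints "the dog and the cat don't get along" first, because that document matches both terms, followed by "the quick brown fox jumps over the lazy dog". A different tokenizer, stemming, or a smoothed idf would change both the set of matches and their order.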