@jhamberg
Last active March 11, 2019 14:52
Calculate the Jaccard Distance between two strings
import java.util.HashMap;

public class JaccardDistance {

    // Jaccard distance between the character-bigram multisets of a and b:
    // |symmetric difference| / |union|, from 0.0 (identical) to 1.0 (disjoint).
    public static double jaccard(String a, String b) {
        HashMap<String, Integer> map = new HashMap<>();

        // For each bigram in the first string, add 1
        for (int i = 0; i < a.length() - 1; i++) {
            String bigram = "" + a.charAt(i) + a.charAt(i + 1);
            map.put(bigram, map.getOrDefault(bigram, 0) + 1);
        }

        // For each bigram in the second string, subtract 1
        for (int i = 0; i < b.length() - 1; i++) {
            String bigram = "" + b.charAt(i) + b.charAt(i + 1);
            map.put(bigram, map.getOrDefault(bigram, 0) - 1);
        }

        // Sum the count differences: bigrams that occur in only one of the strings
        // (this is the symmetric difference, not the intersection)
        int difference = 0;
        int missingFromA = 0;
        for (int value : map.values()) {
            difference += Math.abs(value);
            // Negative values are bigrams missing from A
            if (value < 0) {
                missingFromA += Math.abs(value);
            }
        }

        // Union size: all bigrams in A plus those that occur only in B.
        // Math.max keeps the count at 0 for strings shorter than two characters.
        int bigramsInA = Math.max(0, a.length() - 1);
        int union = bigramsInA + missingFromA;

        // Neither string has a bigram: treat them as identical
        if (union == 0) {
            return 0.0;
        }
        return difference / (double) union;
    }

    public static void main(String[] args) {
        System.out.println(jaccard(
                "The quick brown fox jumps over the lazy dog",
                "The quick brown fox jumps over the hazy dog"));
        // Result: 0.09090909090909091
    }
}
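
The code above computes the distance directly as the symmetric difference over the union. As a cross-check, the same value can be obtained from the textbook form, 1 - |A ∩ B| / |A ∪ B|, by taking the per-bigram minimum and maximum of the two counts. The sketch below is not part of the original gist: the class name `JaccardCheck` and its helper `bigramCounts` are made up for illustration, and returning 0.0 when neither string contains a bigram is an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical cross-check class, not part of the original gist.
class JaccardCheck {

    // Count how often each character bigram occurs in s.
    static Map<String, Integer> bigramCounts(String s) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < s.length() - 1; i++) {
            counts.merge(s.substring(i, i + 2), 1, Integer::sum);
        }
        return counts;
    }

    // Multiset Jaccard distance: 1 - sum(min counts) / sum(max counts).
    static double distance(String a, String b) {
        Map<String, Integer> countsA = bigramCounts(a);
        Map<String, Integer> countsB = bigramCounts(b);

        // Every bigram that occurs in either string.
        Set<String> keys = new HashSet<>(countsA.keySet());
        keys.addAll(countsB.keySet());

        int intersection = 0;
        int union = 0;
        for (String key : keys) {
            int x = countsA.getOrDefault(key, 0);
            int y = countsB.getOrDefault(key, 0);
            intersection += Math.min(x, y);
            union += Math.max(x, y);
        }

        // Assumption: two inputs without any bigrams count as identical.
        if (union == 0) {
            return 0.0;
        }
        return 1.0 - intersection / (double) union;
    }

    public static void main(String[] args) {
        System.out.println(distance(
                "The quick brown fox jumps over the lazy dog",
                "The quick brown fox jumps over the hazy dog"));
    }
}
```

For the two example sentences, 40 of the bigrams are shared and the union holds 44, so this form agrees with the gist's result up to floating-point rounding. The single-map approach in the gist needs only one pass over the map; the two-map version here is easier to read against the definition.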