@joshbode
Created August 26, 2012 08:46
Find number of matching words in phrases.
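# match on intersecting word set
matching_words = function(x, y) {
  # split the query phrase and each candidate phrase into words
  x = unlist(strsplit(x, ' ', fixed=TRUE))
  y = strsplit(y, ' ', fixed=TRUE)

  # get intersections: number of distinct words each candidate shares with x
  results = sapply(y, function(b) { length(intersect(x, b)) })

  # break ties in intersection counts based on simplicity (length) of original phrase
  results_freq = table(results)
  # compare the counts numerically rather than as strings
  results_freq = results_freq[as.integer(names(results_freq)) > 0]
  y_length = sapply(y, length)
  for (f in names(results_freq)) {
    mask = which(results == as.integer(f))
    # note: this overwrites the raw count with a score in (0, 1];
    # within a tie group, shorter phrases score closer to 1
    results[mask] = 1.0 - (rank(y_length[mask]) - 1.0) / results_freq[[f]]
  }
  return(results)
}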
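
One thing to be aware of: the tie-break step above replaces each non-zero count with a score in (0, 1], so the best phrase in every tie group ends up at 1.0 regardless of how many words it matched. For example, with the query 'red apple tart', both 'red apple' (two matches) and 'apple' (one match) appear to score 1.0. If the match count is meant to stay the primary key, a minimal sketch of a variant could look like the following. The name `matching_words_ranked` and the scoring formula are assumptions for illustration, not part of the original gist.

```r
# Hypothetical variant of matching_words(): keeps the match count as the
# leading part of the score and uses phrase length only to order ties.
matching_words_ranked = function(x, y) {
  x = unlist(strsplit(x, ' ', fixed=TRUE))
  y = strsplit(y, ' ', fixed=TRUE)
  # number of distinct words each candidate phrase shares with x
  counts = sapply(y, function(b) { length(intersect(x, b)) })
  y_length = sapply(y, length)
  scores = as.numeric(counts)
  for (f in unique(counts[counts > 0])) {
    mask = which(counts == f)
    # the shortest phrase in a tie group keeps the full count; longer ones
    # lose a fraction that is always smaller than one
    scores[mask] = f - (rank(y_length[mask]) - 1.0) / length(mask)
  }
  return(scores)
}

phrases = c('red apple', 'big red apple pie', 'green pear', 'apple')
scores = matching_words_ranked('red apple tart', phrases)
print(scores)
# [1] 2.0 1.5 0.0 1.0
```

Because the penalty is always below one, a phrase with more matching words still outranks any phrase with fewer, and `order(scores, decreasing=TRUE)` gives a ranking by count first and brevity second.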