@goldingn
Created October 29, 2014 17:47
No-frills fuzzy matching for character vectors in R
fuzzyMatch <- function (a, b) {
  # no-frills fuzzy matching of strings between character vectors
  # `a` and `b` (essentially a wrapper around a stringdist function).
  # The function returns a two-column matrix: `match` gives, for each
  # element of `a`, the index of its closest element in `b` (as `match`
  # would return), and `distance` gives that Jaccard distance, so you
  # can check how well it did on the hardest words.
  # Warning - this uses all of your cores.

  # make sure the stringdist package is available
  if (!requireNamespace("stringdist", quietly = TRUE))
    stop("fuzzyMatch needs the 'stringdist' package")

  # calculate a Jaccard dissimilarity matrix (rows: `a`, columns: `b`);
  # stringdist takes the thread count via `nthread` (`ncores` is deprecated)
  distance <- stringdist::stringdistmatrix(a,
                                           b,
                                           method = 'jaccard',
                                           nthread = parallel::detectCores())

  # find the closest match for each element of `a` (first one on ties)
  match <- apply(distance, 1, which.min)

  # find how far away these were
  dists <- apply(distance, 1, min)

  # return these as a two-column matrix
  return (cbind(match = match,
                distance = dists))
}
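The Jaccard step is easiest to see on a small example. Below is a minimal base-R sketch (no stringdist needed) that mirrors `fuzzyMatch`. `jaccard_q1` and `fuzzy_match_base` are hypothetical helpers, not part of stringdist. The sketch assumes the Jaccard method uses stringdist's default `q = 1`, so each string is treated as its set of unique characters. It illustrates the idea and is not guaranteed to reproduce stringdist's output exactly.

```r
# Hypothetical base-R sketch, assuming stringdist's default q = 1 for the
# Jaccard method: each string is reduced to its set of unique characters.
# Assumes non-empty strings (two empty strings would give 0/0 = NaN).
jaccard_q1 <- function(x, y) {
  sx <- unique(strsplit(x, "")[[1]])  # unique characters of x
  sy <- unique(strsplit(y, "")[[1]])  # unique characters of y
  1 - length(intersect(sx, sy)) / length(union(sx, sy))
}

fuzzy_match_base <- function(a, b) {
  # rows index `a`, columns index `b`, as in stringdistmatrix(a, b)
  distance <- outer(a, b, Vectorize(jaccard_q1))
  match <- apply(distance, 1, which.min)  # closest element of `b`
  dists <- apply(distance, 1, min)        # and how far away it is
  cbind(match = match, distance = dists)
}

a <- c("colour", "organise", "centre")
b <- c("center", "color", "organize")
res <- fuzzy_match_base(a, b)
res
# match: 2 3 1; distance: 0.2, 0.2222222, 0

# inspect the hardest words first (largest distance)
hardest <- order(res[, "distance"], decreasing = TRUE)
a[hardest]
```

Because `q = 1` ignores character order, "centre" and "center" come out identical (distance 0), as would any pair of anagrams. If that matters, the `q` argument of `stringdistmatrix` can be raised so that longer q-grams are compared.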
@mister-frostee

Should be nthread and not ncores
