Skip to content

Instantly share code, notes, and snippets.

@tejpratap46
Created August 26, 2016 06:44
Show Gist options
  • Save tejpratap46/4f9d746554133dfe591f17484dc1487f to your computer and use it in GitHub Desktop.
Save tejpratap46/4f9d746554133dfe591f17484dc1487f to your computer and use it in GitHub Desktop.
Fuzzy search algorithm

What you need is (as i know) not supported by Parse Engine. You need a Phonetic search algorithm, but you have to retrieve all records from your PFObject to perform a Phonetic match on all the names (or fetch all starting with "D" and then perform the match on those only).

The Phonetic search algorithm is called SoundEx and various implementations have been done by developers all over the world, taking into account i.e. UK abbreviations like "St. = Saint" etc.

The SoundEx algorithm is very intelligent and is working as follows:

  1. Retain the first letter of the name and drop all other occurrences of a, e, i, o, u, y, h, w.
  2. Replace consonants with digits as follows (after the first letter):
  3. b, f, p, v => 1
  4. c, g, j, k, q, s, x, z => 2
  5. d, t => 3
  6. l => 4
  7. m, n => 5
  8. r => 6
  9. If two or more letters with the same number are adjacent in the original name (before step 1), only retain the first letter; also two letters with the same number separated by 'h' or 'w' are coded as a single number, whereas such letters separated by a vowel are coded twice. This rule also applies to the first letter.
  10. Iterate the previous step until you have one letter and three numbers. If you have too few letters in your word that you can't assign three numbers, append with zeros until there are three numbers. If you have more than 3 letters, just retain the first 3 numbers. Using this algorithm, both "Robert" and "Rupert" return the same string "R163" while "Rubin" yields "R150". "Ashcraft" and "Ashcroft" both yield "A261" and not "A226" (the chars 's' and 'c' in the name would receive a single number of 2 and not 22 since an 'h' lies in between them). "Tymczak" yields "T522" not "T520" (the chars 'z' and 'k' in the name are coded as 2 twice since a vowel lies in between them). "Pfister" yields "P236" not "P123" (the first two letters have the same number and are coded once as 'P').

Search on Google for "soundex source" and you will find an algorithm you can implement in your code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment