Skip to content

Instantly share code, notes, and snippets.

@tjdaley
Last active March 4, 2017 05:24
Show Gist options
  • Save tjdaley/454d7e8f519afd271e4cdf260bc18e0f to your computer and use it in GitHub Desktop.
Save tjdaley/454d7e8f519afd271e4cdf260bc18e0f to your computer and use it in GitHub Desktop.
Javascript Soundex Implementation
/**
* Created by Tom on 7/4/2016.
* Revised for ES6 3/3/2017
*
* Algorithm, including test cases, taken from https://en.wikipedia.org/wiki/Soundex on 4 Jul 2016.
*/
function testSoundex()
{
const tests = [
{'input':'robert','output':'R163'},
{'input':'rupert','output':'R163'},
{'input':'rubin','output':'R150'},
{'input':'Ashcraft','output':'A261'},
{'input':'Ashcroft','output':'A261'},
{'input':'Tymczak','output':'T522'},
{'input':'Pfister','output':'P236'},
{'input':'robert','output':'R163'},
{'input':'robert','output':'R163'}
];
let s;
let r;
for (let i = 0; i < tests.length; i++)
{
s = soundex(tests[i].input);
r = (s==tests[i].output) ? ' PASSED' : ' **FAILED: ' + s;
console.log(tests[i].input + r);
}
}
function soundex(stringToEncode)
{
const arrayToEncode = stringToEncode.toLowerCase().split('');
const codes = {
a: '', e: '', i: '', o: '', u: '', y: '',
b: 1, f: 1, p: 1, v: 1,
c: 2, g: 2, j: 2, k: 2, q: 2, s: 2, x: 2, z: 2,
d: 3, t: 3,
l: 4,
m: 5, n: 5,
r: 6
};
//1. Save first letter and remove all occurrences of 'h' and 'w'
const firstLetter = arrayToEncode[0];
const encodedArray = arrayToEncode.filter(function(v)
{
return (v !== 'h' && v !== 'w');
})
//2. Replace all consonants with digits
.map(function(v) {return codes[v];})
//3. Replace all adjacent same digits with one digit
.filter(function(v, i, a)
{
if (i === 0)
return true;
else
return v !== a[i - 1];
})
//4. Remove all occurrences of a,e,i,o,u,y except first letter
.filter(function(v, i)
{
return ((v !== '') || (i === 0));
});
//5. If first symbol is a digit, replace with firstLetter from step 1
encodedArray[0] = firstLetter;
//6. Append 3 zeros. Remove all but first letter and 3 digits after it
return (encodedArray.join('') + '000').slice(0, 4).toUpperCase();
}
@tjdaley
Copy link
Author

tjdaley commented Jul 4, 2016

Soundex

This is the Soundex algorithm documented on Wikipedia. It is the simplest algorithm for phonetic searching. As you can see, it is super easy to implement. What you can't see is that it is not very robust, particularly for words that are not proper English names. For example, it does not handle multi-byte strings or Eastern European names.

A Better Phonetic Search

There are algorithms that might be more suitable to your purpose. One that I am studying now is Beider-Morse Phonetic Matching. It is much more complicated than Soundex, but the authors claim amazing accuracy. Metaphone3 is also supposed to be a more robust encoding that, like Beider-Morse, dramatically reduces the myriad false positives that Soundex can generate.

Implementations

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment