You are given an array of strings (words, phrases, etc.). Write a function that finds all the anagrams within that array. The output should be an array where each element is itself an array of anagrams.
const words = ['cat', 'act', 'ignore', 'a phrase', 'tape', 'pate', 'e hpsara', 'tac'];
listAnagrams(words);
// [['cat', 'act', 'tac'], ['a phrase', 'e hpsara'], ['tape', 'pate']]
Notice that 'ignore' does not appear in the output: it has no anagram partner in the input, so it is filtered out.
One effective approach is:
- Sort the characters of each string
- Build a hash table mapping each sorted string to the array of words that produce it
- Go through the values of that hash table and add to the output array those that contain at least two elements
function listAnagrams(wordsArr) {
  const wordsTable = {};
  wordsArr.forEach(word => { // time: O(N) iterations
    // in order to sort a string we must convert it into an array of
    // its characters
    const wordKey = word.split('').sort().join(''); // time: O(M * log(M))
    // if this sorted key already exists, push the word into the array
    // with its sibling anagrams
    if (wordsTable[wordKey]) { // time: O(1)
      wordsTable[wordKey].push(word);
    }
    // or if we have not yet visited any anagrams of this word, create
    // a new array for it
    else wordsTable[wordKey] = [word];
  });
  const output = [];
  Object.keys(wordsTable).forEach(wordKey => {
    const anagrams = wordsTable[wordKey];
    // only include groups with more than one anagram
    if (anagrams.length > 1) {
      output.push(anagrams);
    }
  });
  return output;
}
"Minified" solution:
const anagramDetector = strArr => Object.values(
  strArr.reduce((hash, word) => {
    const wordSort = word.split('').sort().join('');
    if (hash[wordSort]) hash[wordSort].push(word);
    else hash[wordSort] = [word];
    return hash;
  }, {})
).filter(anagrams => anagrams.length > 1);
If there are N words and M is the maximum length of a word, the upper bound for time complexity is O(N * M * log(M)). Why? A comparison-based .sort() over M characters takes O(M * log(M)) in the worst case; modern V8 (Chrome or Node) uses TimSort, which guarantees that bound (older V8 versions used a quicksort, whose worst case is actually O(M²)). Sorting N words therefore takes O(N * M * log(M)) time. This always dominates the pass over the hash table (or the filter call in the minified solution), so we don't need to consider that second part when thinking about the upper bound.