@psychicbologna
Last active July 20, 2019 12:39
The function getTokens is called with rawString as its argument, representing the text. It returns a sorted array of the individual words in that block of text, which makes the words easier to count afterwards. The body is a single returned chain of methods. First, .toLowerCase() converts the string to lower case so that capitalized words aren't counted separately from the same words in lower case. Next, .split() is called with a regex (regular expression) that tells it to split the string into an array of smaller strings wherever there are spaces or punctuation. Then .filter(Boolean) removes falsy items from the array, so no empty strings make it through. Finally, a basic .sort() puts the array in alphabetical order, and the result is returned.
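As a rough sketch of the chain described above: the exact regex is not given in this write-up, so the delimiter set below (whitespace plus a handful of punctuation marks) is an assumption for illustration only.

```javascript
// Sketch of getTokens. The character class in the regex is an assumed
// delimiter set (whitespace and common punctuation), not necessarily
// the one used in the original code.
function getTokens(rawString) {
  return rawString
    .toLowerCase()            // "The" and "the" become the same token
    .split(/[\s,!.";:?-]+/)   // break on runs of spaces or punctuation
    .filter(Boolean)          // drop empty strings left at the edges
    .sort();                  // default sort: alphabetical for lowercase words
}

console.log(getTokens("The cat. The dog!"));
// [ 'cat', 'dog', 'the', 'the' ]
```

Without .filter(Boolean), the trailing "!" in the sample would leave an empty string at the end of the split result.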
This function is called inside the next function, mostFrequentWord(text), where the bulk of the work is done. The text is passed in as an argument. A variable, words, is declared and assigned the result of getTokens(text), the array of words described above. wordFrequencies is also declared with let as an object; it will keep track of each word and its frequency as key/value pairs, with the word as the key and the count as the value.
A for loop then steps through all the words, using words.length as its limit. Inside the loop is an if condition that checks whether each word in the words array already exists as a key in wordFrequencies, incrementing its count if it does and adding it if it doesn't. It uses bracket notation with the loop counter as the index to pull each item from the words array, and tests it with the 'in' operator, which checks whether a property name (key) exists on an object. If the word is already there, the function increments the numeric value stored under that key in wordFrequencies. Otherwise, in the else branch, it adds the word as a new key with an initial value of 1.
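The counting step can be sketched on its own as follows. countWords is a hypothetical helper name used only for this illustration; in the walkthrough the loop sits directly inside mostFrequentWord.

```javascript
// Sketch of the counting loop. countWords is a hypothetical wrapper
// so the loop can be run in isolation.
function countWords(words) {
  const wordFrequencies = {};
  for (let i = 0; i < words.length; i++) {
    if (words[i] in wordFrequencies) {
      // Word already seen: bump its counter.
      wordFrequencies[words[i]]++;
    } else {
      // First sighting: create the key with a count of 1.
      wordFrequencies[words[i]] = 1;
    }
  }
  return wordFrequencies;
}

console.log(countWords(["cat", "dog", "the", "the"]));
// { cat: 1, dog: 1, the: 2 }
```

One caveat worth knowing: 'in' also reports inherited property names, so on a plain {} object a word such as "constructor" would be treated as already present. For ordinary text this rarely matters.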
We have now stored the associated information together in one object: wordFrequencies is a plain object that maps every distinct word to its frequency, with no duplicate keys and, thanks to the filtering in getTokens, no empty strings. All that's left is to pick out the most frequent word by going over the words one more time in another loop.
Two new variables are declared: currentMaxKey uses Object.keys() on wordFrequencies and takes the first key with index [0], while currentMaxCount reads the count stored under that key in the object. These give the loop a baseline to compare against, so they don't have to be initialized inside it.
The new loop is a for...in loop, which iterates over the enumerable property names (keys) of an object. Since the keys of this object are just the cleaned-up list of distinct words, we don't have to specify how many times the loop should run. 'word' is the loop variable and takes each key of wordFrequencies in turn. The condition is then checked: if wordFrequencies[word] (the count for the word, not the word itself) is larger than the currentMaxCount established earlier, currentMaxKey is set to the word and currentMaxCount is set to wordFrequencies[word], the count stored under that word as its key.
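The baseline variables and the for...in scan might look roughly like this. pickMostFrequent is a hypothetical helper name, introduced here only so the scan can be run against a ready-made frequency object.

```javascript
// Sketch of the max-finding step. pickMostFrequent is a hypothetical
// wrapper; it expects an object mapping words to counts.
function pickMostFrequent(wordFrequencies) {
  // Baseline: the first key and its count.
  let currentMaxKey = Object.keys(wordFrequencies)[0];
  let currentMaxCount = wordFrequencies[currentMaxKey];

  for (const word in wordFrequencies) {
    // wordFrequencies[word] is the count, word is the key.
    if (wordFrequencies[word] > currentMaxCount) {
      currentMaxKey = word;
      currentMaxCount = wordFrequencies[word];
    }
  }
  return currentMaxKey;
}

console.log(pickMostFrequent({ cat: 1, dog: 2, the: 3 }));
// the
```

With an empty object there is no first key, so this sketch returns undefined.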
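Once the loop is done, all we need to do is return currentMaxKey at the end of the function to get the most frequent word. Because the comparison is a strict greater-than, a tie goes to whichever of the tied words the loop reached first.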