@asw001
Created May 3, 2017 03:03
Thinkful word frequency script analysis
// the Boolean filter removes any items that are considered falsy: NaN, false, 0, undefined, null, and the empty string
// this takes the 'unmassaged' data in the string -- this may be a paragraph, page, etc. --
// and 'tokenizes' the string, splitting on a single white space and common punctuation
// the function also removes variation in the string tokens, i.e. words, by converting them all to lowercase
// a sorted array is returned; at this point we can assume the array still contains duplicate values
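// a minimal sketch of the tokenizing step described above; the name getTokens and the exact
// punctuation set in the regular expression are assumptions, not taken from the analyzed script

```javascript
// Hypothetical helper: the name and the punctuation set are assumptions.
function getTokens(rawString) {
  return rawString
    .toLowerCase()         // remove case variation between tokens
    .split(/[ ,!.";:-]/)   // split on a single space or punctuation character
    .filter(Boolean)       // drop falsy items, here the empty strings left by the split
    .sort();               // default sort, so duplicates end up next to each other
}

console.log(getTokens('The cat. The hat!')); // [ 'cat', 'hat', 'the', 'the' ]
```

// splitting on one character at a time leaves empty strings wherever two separators are adjacent
// (e.g. ". "), which is why the Boolean filter is needed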
// get the array of tokens (the words array)
// wordFrequencies is an object (associative array)
// in the for loop, we iterate over the words array
// if a word in the words array is already a key in the wordFrequencies object, increment the value of
// wordFrequencies[<word-as-key>] by 1
// if not, the value for the key (word) is set to 1
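// a sketch of the counting loop described above; the function name countWords is hypothetical,
// and Object.create(null) is a swapped-in choice so that the `in` check cannot match inherited
// keys such as "constructor"

```javascript
// Hypothetical helper: counts how often each word occurs in an array of words.
function countWords(words) {
  // Assumption: a prototype-less object, so `in` only sees keys we added ourselves.
  const wordFrequencies = Object.create(null);
  for (let i = 0; i < words.length; i++) {
    const word = words[i];
    if (word in wordFrequencies) {
      wordFrequencies[word] += 1; // seen before: increment the count
    } else {
      wordFrequencies[word] = 1;  // first occurrence: start at 1
    }
  }
  return wordFrequencies;
}

const counts = countWords(['cat', 'hat', 'the', 'the']);
console.log(counts.the); // 2
```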
// outside of the for-block, after the word counts have fully accumulated for the processed rawString,
// the variable currentMaxKey is set to the first key in the wordFrequencies
// object. currentMaxCount is set to the value of key currentMaxKey in wordFrequencies
// the object is then iterated over, and each iterated value is compared to currentMaxCount;
// if the iterated value is greater than currentMaxCount, it becomes the new currentMaxCount
// and currentMaxKey is set to the word for the current iteration
// after the object is iterated over, currentMaxKey, which now holds the most frequent word, is returned.
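// a sketch of the max-finding pass described above; the function name mostFrequentKey is
// hypothetical, and the variable names follow the comments

```javascript
// Hypothetical helper: returns the key with the highest count in a frequency object.
function mostFrequentKey(wordFrequencies) {
  // Start with the first key as the provisional maximum.
  let currentMaxKey = Object.keys(wordFrequencies)[0];
  let currentMaxCount = wordFrequencies[currentMaxKey];
  for (const word in wordFrequencies) {
    // Strictly greater: on a tie, the key seen first is kept.
    if (wordFrequencies[word] > currentMaxCount) {
      currentMaxCount = wordFrequencies[word];
      currentMaxKey = word;
    }
  }
  return currentMaxKey;
}

console.log(mostFrequentKey({ cat: 1, hat: 1, the: 2 })); // the
```

// for an empty object there is no first key, so this sketch returns undefined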