@zashishz
Created March 31, 2017 10:37
* Tokenisation - Break text down into individual tokens, usually word by word (including or excluding punctuation).
* Stop Word Removal - Remove common words such as "had", "a", "is" and "while", which provide structure but carry little meaning.
* N-Grams - Groups of N words occurring together, e.g. "New York" is a bigram.
* Word Sense Disambiguation - Determine the meaning of a word based on the context in which it occurs.
* Part-of-Speech Tagging - Tag each word with its part of speech (noun, adverb, etc.).
* Stemming - Reduce words that differ only in their endings to a common stem, e.g. "close" and "closer".
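
Several of the steps above can be sketched with the Python standard library alone. This is a minimal illustration, not a production pipeline: the stop-word list, the function names (`tokenise`, `remove_stop_words`, `ngrams`, `naive_stem`) and the suffix rules are all assumptions made for this example. Word sense disambiguation and part-of-speech tagging are left out because they normally need a trained model or a dedicated NLP library.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "had", "while", "in", "of", "to"}


def tokenise(text, keep_punctuation=False):
    """Split text into lowercase tokens, optionally keeping punctuation marks."""
    pattern = r"\w+|[^\w\s]" if keep_punctuation else r"\w+"
    return re.findall(pattern, text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def naive_stem(word):
    """Strip one common suffix; a crude stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "er", "es", "s", "e"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


tokens = tokenise("New York is a city.")
print(tokens)                     # ['new', 'york', 'is', 'a', 'city']
print(remove_stop_words(tokens))  # ['new', 'york', 'city']
print(ngrams(tokens, 2)[0])       # ('new', 'york')
print(naive_stem("close"), naive_stem("closer"))  # clos clos
```

The suffix stripping here is deliberately simplistic and will over- or under-stem many words; an established stemmer or lemmatiser is the better choice for real text.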