@vkocaman
Created September 27, 2019 21:05
List of annotators offered by Spark NLP
| Annotator | Description | Version | Annotator Approach | Annotator Model |
|---|---|---|---|---|
| Tokenizer* | Identifies tokens with tokenization open standards | Opensource | - | + |
| Normalizer* | Removes all dirty characters from text | Opensource | - | + |
| Stemmer* | Returns hard-stems out of words with the objective of retrieving the meaningful part of the word | Opensource | + | - |
| Lemmatizer* | Retrieves lemmas out of words with the objective of returning a base dictionary word | Opensource | - | + |
| RegexMatcher* | Uses a reference file to match a set of regular expressions and put them inside a provided key | Opensource | + | + |
| TextMatcher* | Matches entire phrases (by token) provided in a file against a Document | Opensource | + | + |
| Chunker* | Matches a pattern of part-of-speech tags in order to return meaningful phrases from a document | Opensource | + | - |
| DateMatcher* | Reads different forms of date and time expressions and converts them to a provided date format | Opensource | + | - |
| SentenceDetector* | Finds sentence bounds in raw text. Applies rules from Pragmatic Segmenter | Opensource | + | - |
| DeepSentenceDetector* | Finds sentence bounds in raw text. Applies a Named Entity Recognition DL model | Opensource | + | - |
| POSTagger | Sets a Part-Of-Speech tag to each word within a sentence | Opensource | + | + |
| ViveknSentimentDetector | Scores a sentence for a sentiment | Opensource | + | + |
| SentimentDetector* | Scores a sentence for a sentiment | Opensource | + | + |
| WordEmbeddings* | Word embeddings lookup annotator that maps tokens to vectors | Opensource | + | + |
| BertEmbeddings* | BERT embeddings that map tokens to vectors in a bidirectional way | Opensource | + | - |
| NerCrf | Named Entity Recognition annotator that allows a generic model to be trained using a CRF machine learning algorithm | Opensource | + | + |
| NerDL | Named Entity Recognition annotator that trains a generic NER model based on neural networks, using a Char CNNs - BiLSTM - CRF architecture that achieves state-of-the-art results on most datasets | Opensource | + | + |
| NorvigSweeting | Retrieves tokens and corrects them automatically if they are not found in an English dictionary | Opensource | + | + |
| SymmetricDelete | Spell checker inspired by the Symmetric Delete algorithm | Opensource | + | + |
| ContextSpellChecker | Uses TensorFlow to do context-based spell checking | Opensource | + | + |
| DependencyParser | Unlabeled parser that finds a grammatical relation between two words in a sentence | Opensource | + | + |
| TypedDependencyParser | Labeled parser that finds a grammatical relation between two words in a sentence | Opensource | + | + |
| AssertionLogReg | Classifies each clinically relevant named entity into its assertion type: “present”, “absent”, “hypothetical”, etc. | Licensed | + | + |
| AssertionDL | Classifies each clinically relevant named entity into its assertion type: “present”, “absent”, “hypothetical”, etc. | Licensed | + | + |
| EntityResolver | Assigns an ICD10 (International Classification of Diseases version 10) code to chunks identified as “PROBLEMS” by the NER Clinical Model | Licensed | + | + |
| DeIdentification | Identifies potential pieces of content with personal information about patients and removes them by replacing them with semantic tags | Licensed | + | + |
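
The "Annotator Approach" and "Annotator Model" columns can be read as the usual trainable-versus-fitted split: an Approach is fitted on data and yields a Model, and a Model only transforms input. The sketch below illustrates that split in plain Python with a toy spell checker loosely modeled on the Symmetric Delete idea. It does not use Spark NLP; the class names `ToySymmetricDeleteApproach` and `ToySymmetricDeleteModel`, the helper `deletes`, and the word list are all hypothetical, and it only considers single-character deletions with an alphabetical tie-break, whereas a real implementation would rank candidates by edit distance and word frequency.

```python
from collections import defaultdict


def deletes(word):
    """Return the word itself plus every string made by deleting one character."""
    return {word} | {word[:i] + word[i + 1:] for i in range(len(word))}


class ToySymmetricDeleteModel:
    """Hypothetical 'Model': holds a fitted index and only transforms tokens."""

    def __init__(self, index, vocabulary):
        self.index = index
        self.vocabulary = vocabulary

    def transform(self, tokens):
        corrected = []
        for token in tokens:
            if token in self.vocabulary:
                corrected.append(token)
                continue
            # Collect dictionary words that share a delete variant with the token.
            candidates = set()
            for variant in deletes(token):
                candidates |= self.index.get(variant, set())
            # Toy tie-break: alphabetical order; unknown tokens pass through unchanged.
            corrected.append(min(candidates) if candidates else token)
        return corrected


class ToySymmetricDeleteApproach:
    """Hypothetical 'Approach': a trainable step whose fit() returns a Model."""

    def fit(self, words):
        index = defaultdict(set)
        for word in words:
            # Index every delete variant so lookups need no insertions or swaps.
            for variant in deletes(word):
                index[variant].add(word)
        return ToySymmetricDeleteModel(dict(index), set(words))


# Fit once on a (made-up) dictionary, then reuse the fitted model on new tokens.
model = ToySymmetricDeleteApproach().fit(["token", "sentence", "entity"])
result = model.transform(["tokn", "sentense", "entity", "xyz"])
print(result)
```

Under this reading, a row marked `+` in both columns offers a trainable version as well as a ready-to-use fitted one, while a row with `+` in only one column offers just that form.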