Veysel Kocaman vkocaman

vkocaman / pretrained pipelines.csv
Created September 27, 2019 21:29
list of pretrained pipelines
Pipelines,Name,Language
Explain Document ML,explain_document_ml,English
Explain Document DL,explain_document_dl,English
Explain Document DL Win,explain_document_dl_noncontrib,English
Explain Document DL Fast,explain_document_dl_fast,English
Explain Document DL Fast Win,explain_document_dl_fast_noncontrib,English
Recognize Entities DL,recognize_entities_dl,English
Recognize Entities DL Win,recognize_entities_dl_noncontrib,English
OntoNotes Entities Small,onto_recognize_entities_sm,English
OntoNotes Entities Large,onto_recognize_entities_lg,English
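The names in the second column are the identifiers used to fetch a pipeline. A minimal sketch of looking one up and loading it is shown here; the helper names `pipeline_index` and `load_pipeline` are hypothetical, the mapping from "English" to the language code `en` is an assumption, and only a few rows of the table are repeated for brevity. The load step relies on Spark NLP's `PretrainedPipeline` class and needs an active Spark session.

```python
import csv
import io

# A few rows copied from the pretrained pipelines table.
PIPELINES_CSV = """\
Pipelines,Name,Language
Explain Document ML,explain_document_ml,English
Explain Document DL,explain_document_dl,English
Recognize Entities DL,recognize_entities_dl,English
OntoNotes Entities Small,onto_recognize_entities_sm,English
"""

# Assumption: the "English" label corresponds to the language code "en".
LANG_CODES = {"English": "en"}


def pipeline_index(csv_text):
    """Map each display name to a (pipeline name, language code) pair."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["Pipelines"]: (row["Name"], LANG_CODES[row["Language"]])
        for row in reader
    }


def load_pipeline(display_name, csv_text=PIPELINES_CSV):
    """Download and return the pipeline listed under display_name.

    Spark NLP is imported lazily so the lookup above works without it.
    An active Spark session (for example from sparknlp.start()) is
    assumed to exist before this is called.
    """
    from sparknlp.pretrained import PretrainedPipeline

    name, lang = pipeline_index(csv_text)[display_name]
    return PretrainedPipeline(name, lang=lang)
```

With Spark NLP installed, something like `load_pipeline("Explain Document ML").annotate("Some text")` would then run the pipeline on a string; the first call downloads the pipeline, so it needs network access.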
vkocaman / pretrained.csv
Created September 27, 2019 21:19
Pretrained Models
Model,Name,Version,Language
LemmatizerModel (Lemmatizer),lemma_antbnc,Open Source,English
PerceptronModel (POS),pos_anc,Open Source,English
NerCRFModel (NER with GloVe),ner_crf,Open Source,English
NerDLModel (NER with GloVe),ner_dl,Open Source,English
NerDLModel (NER with GloVe),ner_dl_contrib,Open Source,English
NerDLModel (NER with BERT),ner_dl_bert_base_cased,Open Source,English
NerDLModel (OntoNotes with GloVe 100d),onto_100,Open Source,English
NerDLModel (OntoNotes with GloVe 300d),onto_300,Open Source,English
WordEmbeddings (GloVe),glove_100d,Open Source,English
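Each entry in the Model column combines an annotator class with a short description in parentheses. The sketch below separates the two and groups the pretrained names by class, which makes it easier to see which names belong to which annotator. The helper `names_by_class` and the constant names are hypothetical, and only some rows are repeated. In Spark NLP such a name is typically passed to the class's `pretrained()` method, for example `NerDLModel.pretrained("ner_dl", lang="en")`, where `en` is again an assumed language code.

```python
import re
from collections import defaultdict

# (Model cell, Name cell) pairs copied from the pretrained models table.
MODEL_ROWS = [
    ("LemmatizerModel (Lemmatizer)", "lemma_antbnc"),
    ("PerceptronModel (POS)", "pos_anc"),
    ("NerCRFModel (NER with GloVe)", "ner_crf"),
    ("NerDLModel (NER with GloVe)", "ner_dl"),
    ("NerDLModel (NER with BERT)", "ner_dl_bert_base_cased"),
]

# "ClassName (free-text description)"
MODEL_CELL = re.compile(r"^(?P<cls>\w+)\s*\((?P<desc>.+)\)$")


def names_by_class(rows):
    """Group pretrained model names by the annotator class in the Model cell."""
    grouped = defaultdict(list)
    for cell, name in rows:
        match = MODEL_CELL.match(cell)
        if match is None:
            raise ValueError(f"unexpected Model cell: {cell!r}")
        grouped[match.group("cls")].append(name)
    return dict(grouped)
```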
vkocaman / annotators.csv
Created September 27, 2019 21:05
list of annotators offered by Spark NLP
Annotator,Description,Version,Annotator Approach,Annotator Model
Tokenizer*,Identifies tokens with tokenization open standards,Opensource,-,+
Normalizer*,Removes all dirty characters from text,Opensource,-,+
Stemmer*,Returns hard-stems out of words with the objective of retrieving the meaningful part of the word,Opensource,+,-
Lemmatizer*,Retrieves lemmas out of words with the objective of returning a base dictionary word,Opensource,-,+
RegexMatcher*,Uses a reference file to match a set of regular expressions and put them inside a provided key.,Opensource,+,+
TextMatcher*,Annotator to match entire phrases (by token) provided in a file against a Document,Opensource,+,+
Chunker*,Matches a pattern of part-of-speech tags in order to return meaningful phrases from document,Opensource,+,-
DateMatcher*,Reads from different forms of date and time expressions and converts them to a provided date format,Opensource,+,-
SentenceDetector*,Finds sentence bounds in raw text. Applies rules from Pragmatic Segmenter,Opensou