| entity_type | score | text |
|---|---|---|
| ORG | 0.99932663279724 | London Metal Exchange |
| ORG | 0.99820256321533 | LME |
| Ravenpack_entity | entity_type | text |
|---|---|---|
| Euro Area | ORGA | euro zone |
| U.S. Dollar | CURR | $ |
| U.S. Dollar | CURR | $ |
| London, England | PLCE | London |
| Aluminium | CMDT | Aluminium |
| Aluminium | CMDT | aluminium |
| Entity | type |
|---|---|
| Ping | 'B_COMP' |
| An | 'I_COMP' |
| for | 'O' |
| instance | 'O' |
| ... | ... |
| in | 'O' |
| Healthcare | 'B_SECT' |
| Insurance | 'I_SECT' |
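To make the B_/I_/O scheme concrete, here is a minimal sketch of how word-level tags like the ones in this table collapse back into entity spans. The function name `group_bio_tags` is hypothetical, and the example uses only the rows visible in the table (the elided middle is skipped). It assumes tags are either `'O'` or a `B`/`I` prefix, an underscore, and an entity type.

```python
from typing import List, Tuple


def group_bio_tags(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse word-level B_/I_/O tags into (entity_type, text) spans."""
    spans: List[Tuple[str, List[str]]] = []
    prev_type = None  # entity type of the previous word, None after an 'O'
    for token, tag in zip(tokens, tags):
        if tag == 'O':
            prev_type = None
            continue
        prefix, _, entity_type = tag.partition('_')  # 'B_COMP' -> ('B', '_', 'COMP')
        if prefix == 'I' and entity_type == prev_type:
            spans[-1][1].append(token)  # an I_ tag continues the open span
        else:
            spans.append((entity_type, [token]))  # a B_ tag (or a stray I_) opens a new span
        prev_type = entity_type
    return [(etype, ' '.join(words)) for etype, words in spans]


# Only the rows visible in the table above; the elided middle is skipped
tokens = ['Ping', 'An', 'for', 'instance', 'in', 'Healthcare', 'Insurance']
tags = ['B_COMP', 'I_COMP', 'O', 'O', 'O', 'B_SECT', 'I_SECT']
print(group_bio_tags(tokens, tags))
# [('COMP', 'Ping An'), ('SECT', 'Healthcare Insurance')]
```

The B_ prefix is what lets two adjacent entities of the same type stay separate. Without it, "Ping An" followed directly by another company name would merge into one span.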
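```python
from transformers import DistilBertTokenizerFast
from utils.ner_functions import tokenize_and_align_labels  # local helper module, not part of transformers

# A fast tokenizer exposes word_ids(), which is what lets labels be aligned to sub-tokens
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-cased')
```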
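```python
import tensorflow as tf

# Tokenization and label-alignment options
padding = 'max_length'   # pad every sequence to the same fixed length
label_all_tokens = True  # label every sub-token of a word, not just the first one

# Map each of our 27 classes (tag strings such as 'B_COMP' or 'O') to a numeric id.
# Assumes label_list holds the 27 tag strings; id_to_label is the reverse lookup.
label_to_id = {label: i for i, label in enumerate(label_list)}
id_to_label = {i: label for label, i in label_to_id.items()}

train_encodings = tokenize_and_align_labels(train_texts, train_tags)
val_encodings = tokenize_and_align_labels(val_texts, val_tags)
```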
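The body of `tokenize_and_align_labels` is not shown. As a hedged sketch of the step such a helper usually performs (modeled on the Hugging Face token-classification example, not necessarily the author's code), the core logic takes the `word_ids()` a fast tokenizer reports for each sub-token and expands one label per word into one label id per sub-token. The function name `align_labels_with_word_ids` is hypothetical, and the `-100` ignore value is an assumption borrowed from those examples. The `word_ids` list below is hand-written for illustration rather than produced by the real tokenizer.

```python
from typing import Dict, List, Optional

IGNORE_INDEX = -100  # assumed "ignore" label id, as in the Hugging Face examples


def align_labels_with_word_ids(
    word_ids: List[Optional[int]],
    word_labels: List[str],
    label_to_id: Dict[str, int],
    label_all_tokens: bool = True,
) -> List[int]:
    """Expand one label per word into one label id per sub-token."""
    label_ids = []
    previous_word_idx = None
    for word_idx in word_ids:
        if word_idx is None:
            # Special tokens ([CLS], [SEP]) and padding belong to no word
            label_ids.append(IGNORE_INDEX)
        elif word_idx != previous_word_idx:
            # First sub-token of a word carries that word's label
            label_ids.append(label_to_id[word_labels[word_idx]])
        else:
            # Later sub-tokens of the same word: repeat the label or ignore them
            label_ids.append(
                label_to_id[word_labels[word_idx]] if label_all_tokens else IGNORE_INDEX
            )
        previous_word_idx = word_idx
    return label_ids


label_to_id = {'O': 0, 'B_COMP': 1, 'B_PRDT': 2}
# Hand-written word_ids for "[CLS] Avon sold co ##smetics [SEP]"
word_ids = [None, 0, 1, 2, 2, None]
labels = ['B_COMP', 'O', 'B_PRDT']
print(align_labels_with_word_ids(word_ids, labels, label_to_id))         # [-100, 1, 0, 2, 2, -100]
print(align_labels_with_word_ids(word_ids, labels, label_to_id, False))  # [-100, 1, 0, 2, -100, -100]
```

This is where `label_all_tokens` matters. With `True`, "##smetics" is trained with the same label as "co". With `False`, it is excluded from the loss.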
```python
from transformers import TFDistilBertForTokenClassification

# DistilBERT body plus a token-classification head with one output per tag
model = TFDistilBertForTokenClassification.from_pretrained('distilbert-base-cased', num_labels=len(label_list))

# Set to True to freeze the pretrained DistilBERT layers and train only the classification head
freeze_language_model = False
if freeze_language_model:
    model.distilbert.trainable = False

model.summary()
```
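```python
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)

# model.compute_loss is the model's built-in token-classification loss in the transformers
# version used here; later releases renamed it hf_compute_loss
model.compile(optimizer=optimizer, loss=model.compute_loss, metrics=['accuracy'])

# The datasets are batched explicitly, so Keras's batch_size argument is not needed.
# Assumes val_dataset is built unbatched, like train_dataset.
model.fit(train_dataset.shuffle(1000).batch(16),
          epochs=3,
          validation_data=val_dataset.batch(16))
```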
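One caveat on `metrics=['accuracy']`: assuming the alignment helper marks special tokens and padding with `-100` (as the Hugging Face examples do), Keras's built-in accuracy has no notion of that ignore label. Those positions are therefore likely to be scored too, which matters a lot with `padding='max_length'`. A minimal way to check token-level accuracy on real tokens only is sketched below. The function name `masked_token_accuracy` is hypothetical, and it works on plain Python lists of label ids and argmax predictions.

```python
from typing import Sequence


def masked_token_accuracy(label_ids: Sequence[int], predicted_ids: Sequence[int],
                          ignore_index: int = -100) -> float:
    """Accuracy over real tokens only, skipping positions labeled ignore_index."""
    correct = total = 0
    for label, pred in zip(label_ids, predicted_ids):
        if label == ignore_index:
            continue  # special token or padding: not a real prediction target
        total += 1
        correct += int(label == pred)
    return correct / total if total else 0.0


labels = [-100, 1, 0, 2, 2, -100, -100, -100]  # trailing positions are padding
preds = [0, 1, 0, 2, 1, 0, 0, 0]               # argmax of the model's logits
print(masked_token_accuracy(labels, preds))    # 0.75
```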
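```python
from transformers import pipeline

# grouped_entities=True merges adjacent tokens that share a predicted label
# (newer transformers releases replace this flag with aggregation_strategy)
nlp = pipeline("ner", model=model, tokenizer=tokenizer, grouped_entities=True)

# The pipeline reports generic names such as 'LABEL_3'; map them back to our tag strings
label_dict = dict(zip(['LABEL_{}'.format(i) for i in id_to_label.keys()], id_to_label.values()))

def detect_entities(text, ner_pipeline, label_dict={}, remove_labels=True, is_grouped_pipeline=False):
    # Grouped pipelines return the label under 'entity_group', ungrouped ones under 'entity'
    if is_grouped_pipeline:
        entity_key = 'entity_group'
    else:
        entity_key = 'entity'
    # ... (rest of the function is not shown in the original excerpt)
```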
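Since the rest of `detect_entities` is not shown, here is a hedged sketch of the relabeling that `label_dict` is built for: swapping the pipeline's generic `LABEL_i` names for tag strings and optionally dropping a tag such as `'O'`. The function `relabel_entities` and its `drop` parameter are hypothetical and are not claimed to be what `detect_entities` does. The `'entity_group'`, `'score'` and `'word'` keys follow the grouped output of the transformers NER pipeline.

```python
from typing import Dict, Iterable, List


def relabel_entities(entities: List[dict], label_dict: Dict[str, str],
                     entity_key: str = 'entity_group',
                     drop: Iterable[str] = ('O',)) -> List[dict]:
    """Swap 'LABEL_i' names for tag strings and drop unwanted tags such as 'O'."""
    drop = set(drop)
    relabeled = []
    for ent in entities:
        ent = dict(ent)  # copy so the pipeline's output is left untouched
        ent[entity_key] = label_dict.get(ent[entity_key], ent[entity_key])
        if ent[entity_key] not in drop:
            relabeled.append(ent)
    return relabeled


# label_dict built exactly as in the pipeline code, from a toy id_to_label mapping
id_to_label = {0: 'O', 1: 'B_COMP', 2: 'I_COMP'}
label_dict = dict(zip(['LABEL_{}'.format(i) for i in id_to_label.keys()], id_to_label.values()))

raw = [{'entity_group': 'LABEL_1', 'score': 0.99, 'word': 'Avon'},
       {'entity_group': 'LABEL_0', 'score': 0.99, 'word': 'said'}]
print(relabel_entities(raw, label_dict))
# [{'entity_group': 'B_COMP', 'score': 0.99, 'word': 'Avon'}]
```

Passing `drop=()` keeps the `'O'` groups, which is what the output tables below show.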
| entity_group | score | text |
|---|---|---|
| B_COMP | 0.993910849094391 | Avon |
| O | 0.999974624978172 | said second - quarter profit plunged 70 % as the world’s largest direct - seller of |
| B_PRDT | 0.999925196170807 | co |
| I_PRDT | 0.969580918550491 | ##smetics |
| O | 0.999910519673274 | sold fewer items and continued to lose sales representatives in key markets. |
| entity_group | score | text |
|---|---|---|
| B_NATL | 0.974511623382568 | French |
| B_SECT | 0.999704003334045 | healthcare |
| O | 0.999978184700012 | company |
| B_COMP | 0.999961078166962 | San |
| I_COMP | 0.998980890620838 | ##ofi - Aventis ( SNY ) |
| O | 0.999974077398127 | will report its third - quarter earnings on October 31. |
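In both outputs, the B_ and I_ pieces of one entity come back as separate groups ("co" + "##smetics", "San" + "##ofi - Aventis ( SNY )"). This is presumably because the pipeline's grouping looks for the hyphenated B-/I- convention, while these tags use an underscore. A hedged post-processing sketch follows: it joins each B_X group with the I_X groups that follow it, glues WordPiece `##` continuations on without a space, and keeps the lowest score in the merged group. The function `merge_bio_groups` is hypothetical. It assumes the pipeline's `'entity_group'`, `'score'` and `'word'` keys (the tables above appear to show `'word'` under a `text` column) and drops the `'O'` groups.

```python
from typing import Dict, List


def merge_bio_groups(groups: List[Dict]) -> List[Dict]:
    """Join each B_X group with the I_X groups that follow it and drop 'O' groups."""
    merged: List[Dict] = []
    prev_type = None  # entity type of the previous group, None after an 'O'
    for group in groups:
        label, word = group['entity_group'], group['word']
        if label == 'O':
            prev_type = None
            continue
        prefix, _, entity_type = label.partition('_')
        if prefix == 'I' and entity_type == prev_type:
            last = merged[-1]
            # '##' marks a WordPiece continuation: glue it on without a space
            last['word'] += word[2:] if word.startswith('##') else ' ' + word
            last['score'] = min(last['score'], group['score'])  # keep the weakest score
        else:
            merged.append({'entity_group': entity_type, 'score': group['score'], 'word': word})
        prev_type = entity_type
    return merged


# The Sanofi example from the table above, written as pipeline-style dicts
groups = [
    {'entity_group': 'B_NATL', 'score': 0.974511623382568, 'word': 'French'},
    {'entity_group': 'B_SECT', 'score': 0.999704003334045, 'word': 'healthcare'},
    {'entity_group': 'O', 'score': 0.999978184700012, 'word': 'company'},
    {'entity_group': 'B_COMP', 'score': 0.999961078166962, 'word': 'San'},
    {'entity_group': 'I_COMP', 'score': 0.998980890620838, 'word': '##ofi - Aventis ( SNY )'},
    {'entity_group': 'O', 'score': 0.999974077398127,
     'word': 'will report its third - quarter earnings on October 31.'},
]
merged = merge_bio_groups(groups)
print([(g['entity_group'], g['word']) for g in merged])
# [('NATL', 'French'), ('SECT', 'healthcare'), ('COMP', 'Sanofi - Aventis ( SNY )')]
```

Taking the minimum score is one conservative choice. Averaging the scores of the merged pieces would be an equally reasonable alternative.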