entity_type,score,text
ORG,0.99932663279724,London Metal Exchange
ORG,0.99820256321533,LME
@franespiga
franespiga / table_2.csv
Created February 25, 2021 11:22
table_md_ner_2
Ravenpack_entity,entity_type,text
Euro Area,ORGA,euro zone
U.S. Dollar,CURR,$
U.S. Dollar,CURR,$
"London, England",PLCE,London
Aluminium,CMDT,Aluminium
Aluminium,CMDT,aluminium
franespiga / table_3.csv
Created February 25, 2021 11:24
table_ner_2
Entity,type
Ping,'B_COMP'
An,'I_COMP'
for,'O'
instance,'O'
...,...
in,'O'
Healthcare,'B_SECT'
Insurance,'I_SECT'
franespiga / import.py
Created February 25, 2021 11:26
ner_1
import tensorflow as tf
from transformers import DistilBertTokenizerFast, TFDistilBertForTokenClassification

from utils.ner_functions import tokenize_and_align_labels

tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-cased')

padding = 'max_length'
label_all_tokens = True
# label_to_id: a dictionary to convert from our 27 classes to numeric classes (expected to be defined elsewhere)

train_encodings = tokenize_and_align_labels(train_texts, train_tags)
val_encodings = tokenize_and_align_labels(val_texts, val_tags)

# label_list (the tag names) is expected to be defined elsewhere
model = TFDistilBertForTokenClassification.from_pretrained('distilbert-base-cased', num_labels=len(label_list))

# Optionally freeze the DistilBERT encoder so that only the classification head is trained
freeze_language_model = False
if freeze_language_model:
    model.distilbert.trainable = False
model.summary()
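The helper `tokenize_and_align_labels` is imported from `utils.ner_functions`, and its body is not shown in this gist. The block below is a minimal sketch of the label-alignment step such a helper typically performs, not the author's implementation. It assumes word-level tags and a `word_ids` list like the one fast tokenizers expose (`None` for special or padding tokens). The names `build_label_maps` and `align_labels` are hypothetical, and `-100` is assumed as the "ignore this position" label id.

```python
from typing import Dict, List, Optional, Tuple

IGNORE_INDEX = -100  # assumption: label id used for positions the loss should skip


def build_label_maps(tag_sequences: List[List[str]]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Build label_to_id and id_to_label from every tag that occurs in the data."""
    labels = sorted({tag for tags in tag_sequences for tag in tags})
    label_to_id = {label: i for i, label in enumerate(labels)}
    id_to_label = {i: label for label, i in label_to_id.items()}
    return label_to_id, id_to_label


def align_labels(
    word_ids: List[Optional[int]],
    word_tags: List[str],
    label_to_id: Dict[str, int],
    label_all_tokens: bool = True,
) -> List[int]:
    """Expand word-level tags to one label id per sub-token."""
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special or padding token: no label
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:
            # First sub-token of a word always gets the word's label
            aligned.append(label_to_id[word_tags[word_id]])
        else:
            # Later sub-tokens get the label only if label_all_tokens is set
            aligned.append(label_to_id[word_tags[word_id]] if label_all_tokens else IGNORE_INDEX)
        previous = word_id
    return aligned


# Hypothetical example: three words, the second one split into two sub-tokens
tags = [["B_COMP", "I_COMP", "O"]]
label_to_id, id_to_label = build_label_maps(tags)
word_ids = [None, 0, 1, 1, 2, None]
aligned = align_labels(word_ids, tags[0], label_to_id)
print(aligned)  # [-100, 0, 1, 1, 2, -100]
```

With `label_all_tokens=False`, the second sub-token of word 1 would receive `-100` instead, so only the first piece of each word contributes to the loss.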
franespiga / fit.py
Created February 25, 2021 11:30
ner_4
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=model.compute_loss, metrics=['accuracy'])

# The dataset is batched with .batch(16), so batch_size is not also passed to fit()
model.fit(train_dataset.shuffle(1000).batch(16),
          epochs=3,
          validation_data=val_dataset.batch(16))  # assumes val_dataset is unbatched, like train_dataset

from transformers import pipeline

nlp = pipeline("ner", model=model, tokenizer=tokenizer, grouped_entities=True)

# Map the generic 'LABEL_<i>' names back to the original tag names
label_dict = {'LABEL_{}'.format(i): label for i, label in id_to_label.items()}

def detect_entities(text, ner_pipeline, label_dict = {}, remove_labels = True, is_grouped_pipeline = False):
    if is_grouped_pipeline:
        entity_key = 'entity_group'
    else:
        ...  # remainder of the function is not shown
franespiga / results_1.csv
Last active February 25, 2021 11:32
ner_results_1
entity_group,score,text
B_COMP,0.993910849094391,Avon
O,0.999974624978172,said second - quarter profit plunged 70 % as the world’s largest direct - seller of
B_PRDT,0.999925196170807,co
I_PRDT,0.969580918550491,##smetics
O,0.999910519673274,sold fewer items and continued to lose sales representatives in key markets.
franespiga / ner_results_2.csv
Created February 25, 2021 11:34
ner_results_2.csv
entity_group,score,text
B_NATL,0.974511623382568,French
B_SECT,0.999704003334045,healthcare
O,0.999978184700012,company
B_COMP,0.999961078166962,San
I_COMP,0.998980890620838,##ofi - Aventis ( SNY )
O,0.999974077398127,will report its third - quarter earnings on October 31.
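In the result tables, `B_` and `I_` groups arrive as separate rows and the text still carries WordPiece continuation markers (`##`), so the company name is split into `San` and `##ofi - Aventis ( SNY )`. The block below is a minimal post-processing sketch, not part of the gists. It assumes each row is a dict with the `entity_group`, `score`, and `text` fields shown in the tables. `merge_groups` is a hypothetical helper, and keeping the lowest score of the merged rows is an arbitrary choice.

```python
from typing import Dict, List


def strip_wordpiece(text: str) -> str:
    """Drop a leading '##' continuation marker, if present."""
    return text[2:] if text.startswith("##") else text


def merge_groups(rows: List[Dict]) -> List[Dict]:
    """Join each B_<TYPE> row with the I_<TYPE> rows that directly follow it."""
    merged: List[Dict] = []
    open_type = None  # entity type of the span currently being extended
    for row in rows:
        prefix, _, entity_type = row["entity_group"].partition("_")
        text = row["text"]
        if not entity_type:
            # "O" rows carry no entity and close any open span
            open_type = None
            continue
        if prefix == "I" and open_type == entity_type:
            last = merged[-1]
            # '##' pieces attach directly; other pieces are separated by a space
            last["text"] += text[2:] if text.startswith("##") else " " + text
            last["score"] = min(last["score"], row["score"])
        else:
            merged.append({
                "entity_type": entity_type,
                "text": strip_wordpiece(text),
                "score": row["score"],
            })
            open_type = entity_type
    return merged


# Rows copied from ner_results_2.csv
rows = [
    {"entity_group": "B_NATL", "score": 0.974511623382568, "text": "French"},
    {"entity_group": "B_SECT", "score": 0.999704003334045, "text": "healthcare"},
    {"entity_group": "O", "score": 0.999978184700012, "text": "company"},
    {"entity_group": "B_COMP", "score": 0.999961078166962, "text": "San"},
    {"entity_group": "I_COMP", "score": 0.998980890620838, "text": "##ofi - Aventis ( SNY )"},
    {"entity_group": "O", "score": 0.999974077398127,
     "text": "will report its third - quarter earnings on October 31."},
]
entities = merge_groups(rows)
print([(e["entity_type"], e["text"]) for e in entities])
# [('NATL', 'French'), ('SECT', 'healthcare'), ('COMP', 'Sanofi - Aventis ( SNY )')]
```

Applied to the rows of `results_1.csv`, the same function would join `co` and `##smetics` into a single `PRDT` entity, `cosmetics`.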