
aramakus / table.csv
model,f1 score (Real),f1 score (Fake),precision (Real),precision (Fake),recall (Real),recall (Fake)
ROBERTAClassifier,0.9906,0.9904,0.9844,0.9968,0.9968,0.9840
RobertaForSequenceClassification,0.9821,0.9813,0.9678,0.9967,0.9968,0.9665
BERTClassifier,0.9734,0.9726,0.9658,0.9805,0.9811,0.9649
BertForSequenceClassification,0.9689,0.9675,0.9541,0.9835,0.9842,0.9521
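As a quick consistency check on the table, each F1 is the harmonic mean of the corresponding precision and recall; a minimal sketch, using the values from the ROBERTAClassifier row:

# F1 as the harmonic mean of precision and recall.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.9844, 0.9968), 4))  # 0.9906 -- the f1 score (Real) in the table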
# Model with classifier layers on top of RoBERTa.
import torch
from transformers import RobertaModel

class ROBERTAClassifier(torch.nn.Module):
    def __init__(self, dropout_rate=0.3):
        super(ROBERTAClassifier, self).__init__()
        self.roberta = RobertaModel.from_pretrained('roberta-base')
        self.d1 = torch.nn.Dropout(dropout_rate)
        self.l1 = torch.nn.Linear(768, 64)
        self.bn1 = torch.nn.LayerNorm(64)
        self.d2 = torch.nn.Dropout(dropout_rate)
        self.l2 = torch.nn.Linear(64, 2)  # assumed head (gist truncated): Real / Fake

    def forward(self, input_ids, attention_mask):
        # Assumed forward pass; transformers 3.x returns (hidden_states, pooled).
        _, x = self.roberta(input_ids=input_ids, attention_mask=attention_mask)
        x = self.d2(torch.tanh(self.bn1(self.l1(self.d1(x)))))
        return self.l2(x)
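A quick smoke test of the class above; the dummy input and shapes are illustrative only, and the tuple-style tokenizer output assumes transformers 3.x:

# Illustrative smoke test for ROBERTAClassifier.
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = ROBERTAClassifier(dropout_rate=0.3)
batch = tokenizer(["a sample headline"], return_tensors="pt", padding=True)
logits = model(input_ids=batch["input_ids"],
               attention_mask=batch["attention_mask"])
print(logits.shape)  # torch.Size([1, 2])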
aramakus / mask.py
# Zero out attention on PAD positions so the model ignores padding.
for (source, target), _ in train_iter:
    mask = (source != PAD_INDEX).type(torch.uint8)
    y_pred = model(input_ids=source,
                   attention_mask=mask)
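For context, a minimal training step built around this masking pattern; the cross-entropy loss and Adam optimizer here are assumptions, not the gist's own choices:

# Training-step sketch (loss and optimizer are assumed, not from the gist).
import torch

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

for (source, target), _ in train_iter:
    mask = (source != PAD_INDEX).type(torch.uint8)
    y_pred = model(input_ids=source, attention_mask=mask)
    loss = criterion(y_pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()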
Classification Report:
              precision    recall  f1-score   support

           1     0.9844    0.9968    0.9906       634
           0     0.9968    0.9840    0.9904       626

    accuracy                         0.9905      1260
   macro avg     0.9906    0.9904    0.9905      1260
weighted avg     0.9906    0.9905    0.9905      1260
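This layout matches scikit-learn's classification_report; a minimal sketch of how such a report is produced (the labels below are dummy data, not the gist's predictions):

from sklearn.metrics import classification_report

# Dummy labels for illustration; in the gists these come from the test loop.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 0]
print(classification_report(y_true, y_pred, digits=4))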
# Set tokenizer hyperparameters.
MAX_SEQ_LEN = 256
BATCH_SIZE = 16
PAD_INDEX = tokenizer.convert_tokens_to_ids(tokenizer.pad_token)
UNK_INDEX = tokenizer.convert_tokens_to_ids(tokenizer.unk_token)

# Define columns to read (legacy torchtext Field API).
from torchtext.data import Field

label_field = Field(sequential=False, use_vocab=False, batch_first=True)
text_field = Field(use_vocab=False,
                   # The gist breaks off here; the remaining arguments are an
                   # assumed completion for a pre-tokenized RoBERTa pipeline.
                   tokenize=tokenizer.encode,
                   include_lengths=False,
                   batch_first=True,
                   fix_length=MAX_SEQ_LEN,
                   pad_token=PAD_INDEX,
                   unk_token=UNK_INDEX)
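With the fields defined, the usual next step in the legacy torchtext pipeline is to read the data and build iterators; the file name and column order below are hypothetical, and the field order determines the unpacking seen in the mask.py loop above:

from torchtext.data import TabularDataset, BucketIterator

# Hypothetical CSV with (text, label) columns matching the fields above.
fields = [('text', text_field), ('label', label_field)]
train_set = TabularDataset(path='train.csv', format='csv',
                           skip_header=True, fields=fields)
train_iter = BucketIterator(train_set, batch_size=BATCH_SIZE,
                            sort_key=lambda x: len(x.text))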
aramakus / tokenizer.py
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
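This tokenizer drives everything above: encode() produces the input ids, and its pad and unk tokens supply PAD_INDEX and UNK_INDEX. A small illustrative check:

# Illustrative check of the special-token ids used for PAD_INDEX / UNK_INDEX.
ids = tokenizer.encode("Some breaking news headline")
print(ids)  # starts with 0 (<s>) and ends with 2 (</s>) for roberta-base
print(tokenizer.convert_tokens_to_ids(tokenizer.pad_token))  # 1 for roberta-base
print(tokenizer.convert_tokens_to_ids(tokenizer.unk_token))  # 3 for roberta-base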