| Model | F1 (Real) | F1 (Fake) | Precision (Real) | Precision (Fake) | Recall (Real) | Recall (Fake) |
|---|---|---|---|---|---|---|
| ROBERTAClassifier | 0.9906 | 0.9904 | 0.9844 | 0.9968 | 0.9968 | 0.9840 |
| RobertaForSequenceClassification | 0.9821 | 0.9813 | 0.9678 | 0.9967 | 0.9968 | 0.9665 |
| BERTClassifier | 0.9734 | 0.9726 | 0.9658 | 0.9805 | 0.9811 | 0.9649 |
| BertForSequenceClassification | 0.9689 | 0.9675 | 0.9541 | 0.9835 | 0.9842 | 0.9521 |
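Since the table reports precision, recall, and F1 per class, the F1 entries can be sanity-checked as the harmonic mean of the corresponding precision and recall. A quick sketch using the (Real) column of the top row:

```python
def f1(precision, recall):
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# ROBERTAClassifier, Real class: precision 0.9844, recall 0.9968.
print(round(f1(0.9844, 0.9968), 4))  # → 0.9906
```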
```python
import torch
from transformers import RobertaModel

# Model with classifier layers on top of RoBERTa
class ROBERTAClassifier(torch.nn.Module):
    def __init__(self, dropout_rate=0.3):
        super(ROBERTAClassifier, self).__init__()
        self.roberta = RobertaModel.from_pretrained('roberta-base')
        self.d1 = torch.nn.Dropout(dropout_rate)
        self.l1 = torch.nn.Linear(768, 64)
        self.bn1 = torch.nn.LayerNorm(64)
        self.d2 = torch.nn.Dropout(dropout_rate)
        self.l2 = torch.nn.Linear(64, 2)  # output head assumed: 2 classes (real/fake)

    def forward(self, input_ids, attention_mask):
        # Pooled output; tuple return assumes return_dict=False (older transformers).
        _, x = self.roberta(input_ids=input_ids, attention_mask=attention_mask)
        return self.l2(self.d2(self.bn1(self.l1(self.d1(x)))))
```
```python
# One training step per batch: build an attention mask that ignores
# padding positions, then run the model on the batch.
for (source, target), _ in train_iter:
    mask = (source != PAD_INDEX).type(torch.uint8)
    y_pred = model(input_ids=source,
                   attention_mask=mask)
```
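The mask line above marks every non-padding position with 1 and every padding position with 0, so attention skips the padding. A minimal sketch, assuming roberta-base's pad id of 1 and illustrative token ids:

```python
import torch

PAD_INDEX = 1  # pad token id for roberta-base (assumption for this sketch)
source = torch.tensor([[0, 9226, 340, 2, 1, 1]])  # one right-padded sequence
mask = (source != PAD_INDEX).type(torch.uint8)
print(mask.tolist())  # → [[1, 1, 1, 1, 0, 0]]
```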
```
Classification Report:
              precision    recall  f1-score   support

           1     0.9844    0.9968    0.9906       634
           0     0.9968    0.9840    0.9904       626

    accuracy                         0.9905      1260
   macro avg     0.9906    0.9904    0.9905      1260
weighted avg     0.9906    0.9905    0.9905      1260
```
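The averaging rows follow directly from the per-class numbers: macro avg is the unweighted mean over classes, while weighted avg weights each class by its support. A quick check of the f1-score column:

```python
# Per-class f1 and support from the report above.
f1_c1, sup_c1 = 0.9906, 634
f1_c0, sup_c0 = 0.9904, 626

macro_f1 = (f1_c1 + f1_c0) / 2
weighted_f1 = (f1_c1 * sup_c1 + f1_c0 * sup_c0) / (sup_c1 + sup_c0)
print(round(macro_f1, 4), round(weighted_f1, 4))  # → 0.9905 0.9905
```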
```python
# Set tokenizer hyperparameters.
MAX_SEQ_LEN = 256
BATCH_SIZE = 16
PAD_INDEX = tokenizer.convert_tokens_to_ids(tokenizer.pad_token)
UNK_INDEX = tokenizer.convert_tokens_to_ids(tokenizer.unk_token)

# Define columns to read (torchtext legacy Field API).
label_field = Field(sequential=False, use_vocab=False, batch_first=True)
text_field = Field(use_vocab=False,
                   tokenize=tokenizer.encode,   # remaining kwargs assumed: tokenize with
                   fix_length=MAX_SEQ_LEN,      # the HuggingFace tokenizer and pad or
                   pad_token=PAD_INDEX,         # truncate to MAX_SEQ_LEN using the ids
                   unk_token=UNK_INDEX,         # defined above
                   batch_first=True)
```
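With `fix_length` set, every example is padded or truncated to exactly `MAX_SEQ_LEN` token ids using `PAD_INDEX`. The effect can be sketched in plain Python with a hypothetical `pad_to_fixed_length` helper and illustrative token ids:

```python
PAD_INDEX = 1  # roberta-base pad id (assumption for this sketch)

def pad_to_fixed_length(ids, max_len, pad=PAD_INDEX):
    # Truncate to max_len, then right-pad with the pad id as needed.
    return ids[:max_len] + [pad] * (max_len - len(ids[:max_len]))

print(pad_to_fixed_length([0, 9226, 340, 2], max_len=6))  # → [0, 9226, 340, 2, 1, 1]
print(pad_to_fixed_length(list(range(10)), max_len=6))    # → [0, 1, 2, 3, 4, 5]
```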
```python
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
```