Demo notebook of using pre-trained Language Models on NUS HPC
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# HPC-AI: Pre-trained Language Models\n",
"\n",
"*Research Computing, NUS IT*\n",
"\n",
"This notebook demonstrates how to use pre-trained language models on the NUS HPC-AI clusters.\n",
"\n",
"## 1. Text Vectorization Models\n",
"\n",
"Read more about word embeddings [here](https://www.analyticsvidhya.com/blog/2021/06/part-7-step-by-step-guide-to-master-nlp-word-embedding/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.1 GloVe\n",
"\n",
"GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.\n",
"\n",
"Pre-trained GloVe versions available on HPC:\n",
"\n",
"1. 6B: 6B tokens, 400K vocab, uncased, 50d (d = dimension)/100d/200d/300d vectors (glove.6B.50d.txt, glove.6B.100d.txt, ...)\n",
"\n",
"2. 42B: 42B tokens, 1.9M vocab, uncased, 300d vectors (glove.42B.300d.txt)\n",
"\n",
"3. 840B: 840B tokens, 2.2M vocab, cased, 300d vectors (glove.840B.300d.txt)\n",
"\n",
"Directory: `/app1/common/pre-trained/nlp/glove`\n",
"\n",
"Demo of using pre-trained GloVe:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# All versions are under the same directory; 6B.100d is used as an example\n",
"path_to_glove_file = \"/app1/common/pre-trained/nlp/glove/glove.6B.100d.txt\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1.1.1 TensorFlow\n",
"\n",
"There are two ways to use pre-trained GloVe in TensorFlow. The popular way these days is the Keras embedding layer, which we demonstrate here. You can also use `tf.nn.embedding_lookup`, the old-fashioned TensorFlow way; a short sketch of that approach follows the Keras demo below. For more detail, see [this post](https://www.damienpontifex.com/posts/using-pre-trained-glove-embeddings-in-tensorflow/).\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from tensorflow import keras\n",
"from tensorflow.keras.layers import Embedding"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Vocabulary size: 400000\n"
]
}
],
"source": [
"# Load the embeddings into a dict mapping word -> vector\n",
"embeddings_index = {}\n",
"with open(path_to_glove_file) as f:\n",
"    for line in f:\n",
"        word, coefs = line.split(maxsplit=1)\n",
"        coefs = np.fromstring(coefs, \"f\", sep=\" \")\n",
"        embeddings_index[word] = coefs\n",
"\n",
"print(\"Vocabulary size: {}\".format(len(embeddings_index)))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dimension: 100\n"
]
}
],
"source": [
"print(\"Dimension: {}\".format(len(embeddings_index['the'])))"
]
},
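{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, related words should sit closer together in the vector space than unrelated ones. Below is a minimal sketch using the `embeddings_index` dict built above; the word choices are arbitrary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: cosine similarity between two GloVe vectors\n",
"def cosine_similarity(u, v):\n",
"    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))\n",
"\n",
"# A related pair should score noticeably higher than an unrelated pair\n",
"print(cosine_similarity(embeddings_index['king'], embeddings_index['queen']))\n",
"print(cosine_similarity(embeddings_index['king'], embeddings_index['potato']))"
]
},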
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"# Build the embedding matrix from the loaded vectors, then create the Keras embedding layer\n",
"# input_dim: vocabulary size\n",
"# output_dim: embedding vector size\n",
"# In practice you would build embedding_matrix aligned to your own tokenizer's vocabulary;\n",
"# here we simply stack the full GloVe vocabulary.\n",
"embedding_matrix = np.stack(list(embeddings_index.values()))\n",
"input_dim, output_dim = embedding_matrix.shape\n",
"\n",
"embedding_layer = Embedding(\n",
"    input_dim,\n",
"    output_dim,\n",
"    embeddings_initializer=keras.initializers.Constant(embedding_matrix),\n",
"    # \"weights=[embedding_matrix]\" is deprecated in Keras\n",
"    trainable=False,\n",
")"
]
},
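{
"cell_type": "markdown",
"metadata": {},
"source": [
"For completeness, here is a minimal sketch of the older `tf.nn.embedding_lookup` route mentioned above, reusing the `embedding_matrix` just built. The token ids are hypothetical placeholders; in a real pipeline they would come from your tokenizer."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: direct vector lookup with tf.nn.embedding_lookup\n",
"import tensorflow as tf\n",
"\n",
"embedding_table = tf.constant(embedding_matrix, dtype=tf.float32)\n",
"token_ids = tf.constant([[2, 15, 7]])  # hypothetical ids for one short sentence\n",
"vectors = tf.nn.embedding_lookup(embedding_table, token_ids)\n",
"print(vectors.shape)  # (1, 3, 100)"
]
},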
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1.1.2 PyTorch"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import torch"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"# Load the embedding matrix into two parallel lists\n",
"# vocab: a list of words, used later in the text to token ids conversion step\n",
"# embeddings: a list of vectors, used to initialise the embedding layer\n",
"\n",
"vocab, embeddings = [], []\n",
"with open(path_to_glove_file, 'rt') as fi:\n",
"    full_content = fi.read().strip().split('\\n')\n",
"for line in full_content:\n",
"    parts = line.split(' ')\n",
"    vocab.append(parts[0])\n",
"    embeddings.append([float(val) for val in parts[1:]])"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"# Convert to numpy arrays\n",
"vocab_npa = np.array(vocab)\n",
"embs_npa = np.array(embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['<pad>' '<unk>' 'the' ',' '.' 'of' 'to' 'and' 'in' 'a']\n",
"(400002, 100)\n"
]
}
],
"source": [
"# Insert '<pad>' and '<unk>' tokens at the start of vocab_npa\n",
"vocab_npa = np.insert(vocab_npa, 0, '<pad>')\n",
"vocab_npa = np.insert(vocab_npa, 1, '<unk>')\n",
"print(vocab_npa[:10])\n",
"\n",
"pad_emb_npa = np.zeros((1, embs_npa.shape[1]))  # all-zero embedding for '<pad>'\n",
"unk_emb_npa = np.mean(embs_npa, axis=0, keepdims=True)  # mean embedding for '<unk>'\n",
"\n",
"# Stack the pad and unk embeddings on top of embs_npa\n",
"embs_npa = np.vstack((pad_emb_npa, unk_emb_npa, embs_npa))\n",
"print(embs_npa.shape)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([400002, 100])\n"
]
}
],
"source": [
"# Create the embedding layer in PyTorch\n",
"my_embedding_layer = torch.nn.Embedding.from_pretrained(torch.from_numpy(embs_npa).float())\n",
"\n",
"assert my_embedding_layer.weight.shape == embs_npa.shape\n",
"print(my_embedding_layer.weight.shape)"
]
},
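{
"cell_type": "markdown",
"metadata": {},
"source": [
"With the vocabulary and embedding layer in place, a lookup is straightforward. A minimal sketch: build a word-to-index dict from `vocab_npa`, map unknown words to `<unk>`, and feed the ids through the layer. The example words are arbitrary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: words -> token ids -> embedding vectors\n",
"word2idx = {word: idx for idx, word in enumerate(vocab_npa)}\n",
"words = ['the', 'quick', 'xyzzy']  # 'xyzzy' falls back to '<unk>'\n",
"ids = torch.LongTensor([[word2idx.get(w, word2idx['<unk>']) for w in words]])\n",
"vectors = my_embedding_layer(ids)\n",
"print(vectors.shape)  # torch.Size([1, 3, 100])"
]
},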
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.2 Word2Vec\n",
"\n",
"Word2Vec is a family of model architectures and optimizations for learning word embeddings from large datasets. Embeddings learned with Word2Vec have proven successful on a variety of downstream natural language processing tasks.\n",
"\n",
"Available pre-trained Word2Vec versions on HPC:\n",
"\n",
"1. Google News: 100B tokens, 3M vocab, cased, 300d vectors (GoogleNews-vectors-negative300.bin)\n",
"\n",
"Directory: `/app1/common/pre-trained/nlp/word2vec`\n",
"\n",
"Demo of using Word2Vec with [Gensim](https://www.shanelynn.ie/word-embeddings-in-python-with-spacy-and-gensim/) (works for both TensorFlow and PyTorch):"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from gensim.models import KeyedVectors"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"path_to_word2vec_file = \"/app1/common/pre-trained/nlp/word2vec/GoogleNews-vectors-negative300.bin\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Load vectors directly from the file\n",
"model = KeyedVectors.load_word2vec_format(path_to_word2vec_file, binary=True)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# Access the vector for a specific word with a keyed lookup\n",
"vector = model['easy']"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dimension is (300,)\n"
]
}
],
"source": [
"print(\"Dimension is {}\".format(vector.shape))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"[('great', 0.7291510105133057),\n",
" ('bad', 0.7190051078796387),\n",
" ('terrific', 0.6889115571975708),\n",
" ('decent', 0.6837348341941833),\n",
" ('nice', 0.6836092472076416),\n",
" ('excellent', 0.644292950630188),\n",
" ('fantastic', 0.6407778263092041),\n",
" ('better', 0.6120728850364685),\n",
" ('solid', 0.5806034803390503),\n",
" ('lousy', 0.576420247554779)]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.most_similar(\"good\")"
]
},
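{
"cell_type": "markdown",
"metadata": {},
"source": [
"Gensim's `KeyedVectors` also supports vector arithmetic via keyword arguments to `most_similar`. A minimal sketch of the classic king - man + woman analogy, which commonly returns 'queen' as the top hit:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: word analogy via vector arithmetic\n",
"model.most_similar(positive=['king', 'woman'], negative=['man'], topn=3)"
]
},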
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Transformer Models\n",
"\n",
"### 2.1 BERT\n",
"\n",
"**BERT** (Bidirectional Encoder Representations from Transformers) is a neural network architecture from Google researchers that has transformed the state of the art for NLP tasks such as text classification, translation, summarization, and question answering.\n",
"\n",
"Location on HPC: `/app1/common/pre-trained/nlp/bert-base-uncased`\n",
"\n",
"We will use [Transformers](https://huggingface.co/docs/transformers/index) to load pre-trained BERT. For more pre-trained language models, see the [Hugging Face model hub](https://huggingface.co/models)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"bert_path = \"/app1/common/pre-trained/nlp/bert-base-uncased\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2.1.1 TensorFlow"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import tensorflow as tf\n",
"\n",
"from transformers import BertTokenizer, TFBertForSequenceClassification\n",
"from transformers import InputExample, InputFeatures\n",
"\n",
"# Ignore TensorFlow warnings\n",
"tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2022-01-12 17:17:33.302551: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA\n",
"To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
"2022-01-12 17:17:47.318838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30996 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:1c:00.0, compute capability: 7.0\n",
"All model checkpoint layers were used when initializing TFBertForSequenceClassification.\n",
"\n",
"Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at /app1/common/pre-trained/nlp/bert-base-uncased and are newly initialized: ['classifier']\n",
"You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
]
}
],
"source": [
"# Load the model from the local files, taking text classification as an example\n",
"model = TFBertForSequenceClassification.from_pretrained(bert_path)\n",
"tokenizer = BertTokenizer.from_pretrained(bert_path)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model: \"tf_bert_for_sequence_classification\"\n",
"_________________________________________________________________\n",
"Layer (type)                 Output Shape              Param #   \n",
"=================================================================\n",
"bert (TFBertMainLayer)       multiple                  109482240 \n",
"_________________________________________________________________\n",
"dropout_37 (Dropout)         multiple                  0         \n",
"_________________________________________________________________\n",
"classifier (Dense)           multiple                  1538      \n",
"=================================================================\n",
"Total params: 109,483,778\n",
"Trainable params: 109,483,778\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
]
}
],
"source": [
"model.summary()"
]
},
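{
"cell_type": "markdown",
"metadata": {},
"source": [
"To exercise the loaded model end to end, tokenize a sentence and run a forward pass. A minimal sketch; the sentence is arbitrary, and since the classifier head is newly initialized, the logits are meaningless until you fine-tune."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: tokenize and run one forward pass through the untrained classifier\n",
"inputs = tokenizer(\"Pre-trained models save a lot of compute.\", return_tensors=\"tf\")\n",
"outputs = model(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'])\n",
"print(outputs.logits.shape)  # (1, 2): default two-label head"
]
},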
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2.1.2 PyTorch"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from torch import nn\n",
"from transformers import BertModel, BertTokenizer"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"class BertClassifier(nn.Module):\n",
"\n",
"    def __init__(self, dropout=0.5):\n",
"        super(BertClassifier, self).__init__()\n",
"\n",
"        self.bert = BertModel.from_pretrained(bert_path)\n",
"        self.dropout = nn.Dropout(dropout)\n",
"        self.linear = nn.Linear(768, 5)  # 5 output classes\n",
"        self.relu = nn.ReLU()\n",
"\n",
"    def forward(self, input_id, mask):\n",
"        _, pooled_output = self.bert(input_ids=input_id, attention_mask=mask, return_dict=False)\n",
"        dropout_output = self.dropout(pooled_output)\n",
"        linear_output = self.linear(dropout_output)\n",
"        final_layer = self.relu(linear_output)\n",
"\n",
"        return final_layer"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Some weights of the model checkpoint at /hpctmp/haokuang/NLP_models/bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.decoder.weight']\n",
"- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n"
]
}
],
"source": [
"model = BertClassifier()"
]
},
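{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal forward-pass sketch for the classifier above. The tokenizer call mirrors a typical fine-tuning pipeline; the sentence is arbitrary, and since the head is untrained the logits carry no signal yet."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: tokenize one sentence and run it through the untrained classifier\n",
"tokenizer = BertTokenizer.from_pretrained(bert_path)\n",
"batch = tokenizer(\"Pre-trained models save a lot of compute.\",\n",
"                  padding='max_length', max_length=32,\n",
"                  truncation=True, return_tensors='pt')\n",
"logits = model(batch['input_ids'], batch['attention_mask'])\n",
"print(logits.shape)  # torch.Size([1, 5])"
]
},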
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"BertClassifier(\n",
"  (bert): BertModel(\n",
"    (embeddings): BertEmbeddings(\n",
"      (word_embeddings): Embedding(30522, 768, padding_idx=0)\n",
"      (position_embeddings): Embedding(512, 768)\n",
"      (token_type_embeddings): Embedding(2, 768)\n",
"      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"      (dropout): Dropout(p=0.1, inplace=False)\n",
"    )\n",
"    (encoder): BertEncoder(\n",
"      (layer): ModuleList(\n",
"        (0): BertLayer(\n",
"          (attention): BertAttention(\n",
"            (self): BertSelfAttention(\n",
"              (query): Linear(in_features=768, out_features=768, bias=True)\n",
"              (key): Linear(in_features=768, out_features=768, bias=True)\n",
"              (value): Linear(in_features=768, out_features=768, bias=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"            (output): BertSelfOutput(\n",
"              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
"              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"          )\n",
"          (intermediate): BertIntermediate(\n",
"            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
"          )\n",
"          (output): BertOutput(\n",
"            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
"            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"            (dropout): Dropout(p=0.1, inplace=False)\n",
"          )\n",
"        )\n",
"        (1): BertLayer(\n",
"          (attention): BertAttention(\n",
"            (self): BertSelfAttention(\n",
"              (query): Linear(in_features=768, out_features=768, bias=True)\n",
"              (key): Linear(in_features=768, out_features=768, bias=True)\n",
"              (value): Linear(in_features=768, out_features=768, bias=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"            (output): BertSelfOutput(\n",
"              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
"              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"          )\n",
"          (intermediate): BertIntermediate(\n",
"            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
"          )\n",
"          (output): BertOutput(\n",
"            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
"            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"            (dropout): Dropout(p=0.1, inplace=False)\n",
"          )\n",
"        )\n",
"        (2): BertLayer(\n",
"          (attention): BertAttention(\n",
"            (self): BertSelfAttention(\n",
"              (query): Linear(in_features=768, out_features=768, bias=True)\n",
"              (key): Linear(in_features=768, out_features=768, bias=True)\n",
"              (value): Linear(in_features=768, out_features=768, bias=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"            (output): BertSelfOutput(\n",
"              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
"              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"          )\n",
"          (intermediate): BertIntermediate(\n",
"            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
"          )\n",
"          (output): BertOutput(\n",
"            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
"            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"            (dropout): Dropout(p=0.1, inplace=False)\n",
"          )\n",
"        )\n",
"        (3): BertLayer(\n",
"          (attention): BertAttention(\n",
"            (self): BertSelfAttention(\n",
"              (query): Linear(in_features=768, out_features=768, bias=True)\n",
"              (key): Linear(in_features=768, out_features=768, bias=True)\n",
"              (value): Linear(in_features=768, out_features=768, bias=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"            (output): BertSelfOutput(\n",
"              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
"              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"          )\n",
"          (intermediate): BertIntermediate(\n",
"            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
"          )\n",
"          (output): BertOutput(\n",
"            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
"            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"            (dropout): Dropout(p=0.1, inplace=False)\n",
"          )\n",
"        )\n",
"        (4): BertLayer(\n",
"          (attention): BertAttention(\n",
"            (self): BertSelfAttention(\n",
"              (query): Linear(in_features=768, out_features=768, bias=True)\n",
"              (key): Linear(in_features=768, out_features=768, bias=True)\n",
"              (value): Linear(in_features=768, out_features=768, bias=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"            (output): BertSelfOutput(\n",
"              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
"              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"          )\n",
"          (intermediate): BertIntermediate(\n",
"            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
"          )\n",
"          (output): BertOutput(\n",
"            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
"            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"            (dropout): Dropout(p=0.1, inplace=False)\n",
"          )\n",
"        )\n",
"        (5): BertLayer(\n",
"          (attention): BertAttention(\n",
"            (self): BertSelfAttention(\n",
"              (query): Linear(in_features=768, out_features=768, bias=True)\n",
"              (key): Linear(in_features=768, out_features=768, bias=True)\n",
"              (value): Linear(in_features=768, out_features=768, bias=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"            (output): BertSelfOutput(\n",
"              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
"              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"          )\n",
"          (intermediate): BertIntermediate(\n",
"            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
"          )\n",
"          (output): BertOutput(\n",
"            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
"            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"            (dropout): Dropout(p=0.1, inplace=False)\n",
"          )\n",
"        )\n",
"        (6): BertLayer(\n",
"          (attention): BertAttention(\n",
"            (self): BertSelfAttention(\n",
"              (query): Linear(in_features=768, out_features=768, bias=True)\n",
"              (key): Linear(in_features=768, out_features=768, bias=True)\n",
"              (value): Linear(in_features=768, out_features=768, bias=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"            (output): BertSelfOutput(\n",
"              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
"              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"          )\n",
"          (intermediate): BertIntermediate(\n",
"            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
"          )\n",
"          (output): BertOutput(\n",
"            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
"            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"            (dropout): Dropout(p=0.1, inplace=False)\n",
"          )\n",
"        )\n",
"        (7): BertLayer(\n",
"          (attention): BertAttention(\n",
"            (self): BertSelfAttention(\n",
"              (query): Linear(in_features=768, out_features=768, bias=True)\n",
"              (key): Linear(in_features=768, out_features=768, bias=True)\n",
"              (value): Linear(in_features=768, out_features=768, bias=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"            (output): BertSelfOutput(\n",
"              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
"              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"          )\n",
"          (intermediate): BertIntermediate(\n",
"            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
"          )\n",
"          (output): BertOutput(\n",
"            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
"            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"            (dropout): Dropout(p=0.1, inplace=False)\n",
"          )\n",
"        )\n",
"        (8): BertLayer(\n",
"          (attention): BertAttention(\n",
"            (self): BertSelfAttention(\n",
"              (query): Linear(in_features=768, out_features=768, bias=True)\n",
"              (key): Linear(in_features=768, out_features=768, bias=True)\n",
"              (value): Linear(in_features=768, out_features=768, bias=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"            (output): BertSelfOutput(\n",
"              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
"              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"          )\n",
"          (intermediate): BertIntermediate(\n",
"            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
"          )\n",
"          (output): BertOutput(\n",
"            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
"            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"            (dropout): Dropout(p=0.1, inplace=False)\n",
"          )\n",
"        )\n",
"        (9): BertLayer(\n",
"          (attention): BertAttention(\n",
"            (self): BertSelfAttention(\n",
"              (query): Linear(in_features=768, out_features=768, bias=True)\n",
"              (key): Linear(in_features=768, out_features=768, bias=True)\n",
"              (value): Linear(in_features=768, out_features=768, bias=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"            (output): BertSelfOutput(\n",
"              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
"              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"          )\n",
"          (intermediate): BertIntermediate(\n",
"            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
"          )\n",
"          (output): BertOutput(\n",
"            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
"            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"            (dropout): Dropout(p=0.1, inplace=False)\n",
"          )\n",
"        )\n",
"        (10): BertLayer(\n",
"          (attention): BertAttention(\n",
"            (self): BertSelfAttention(\n",
"              (query): Linear(in_features=768, out_features=768, bias=True)\n",
"              (key): Linear(in_features=768, out_features=768, bias=True)\n",
"              (value): Linear(in_features=768, out_features=768, bias=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"            (output): BertSelfOutput(\n",
"              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
"              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"          )\n",
"          (intermediate): BertIntermediate(\n",
"            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
"          )\n",
"          (output): BertOutput(\n",
"            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
"            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"            (dropout): Dropout(p=0.1, inplace=False)\n",
"          )\n",
"        )\n",
"        (11): BertLayer(\n",
"          (attention): BertAttention(\n",
"            (self): BertSelfAttention(\n",
"              (query): Linear(in_features=768, out_features=768, bias=True)\n",
"              (key): Linear(in_features=768, out_features=768, bias=True)\n",
"              (value): Linear(in_features=768, out_features=768, bias=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"            (output): BertSelfOutput(\n",
"              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
"              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"              (dropout): Dropout(p=0.1, inplace=False)\n",
"            )\n",
"          )\n",
"          (intermediate): BertIntermediate(\n",
"            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
"          )\n",
"          (output): BertOutput(\n",
"            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
"            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
"            (dropout): Dropout(p=0.1, inplace=False)\n",
"          )\n",
"        )\n",
"      )\n",
"    )\n",
"    (pooler): BertPooler(\n",
"      (dense): Linear(in_features=768, out_features=768, bias=True)\n",
"      (activation): Tanh()\n",
"    )\n",
"  )\n",
"  (dropout): Dropout(p=0.5, inplace=False)\n",
"  (linear): Linear(in_features=768, out_features=5, bias=True)\n",
"  (relu): ReLU()\n",
")\n"
]
}
],
"source": [
"print(model)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
} |