@KuangHao95
Created January 24, 2022 09:02
Demo notebook of using pre-trained Language Models on NUS HPC
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# HPC-AI: Pre-trained Language Models\n",
"\n",
"*Research Computing, NUS IT*\n",
"\n",
"This notebook demonstartes how to use pre-trained language models on NUS HPC-AI clusters. \n",
"\n",
"## 1. Text Vectorization models\n",
"\n",
"Read more about word embeddings [here](https://www.analyticsvidhya.com/blog/2021/06/part-7-step-by-step-guide-to-master-nlp-word-embedding/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.1 Glove\n",
"\n",
"GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.\n",
"\n",
"Available pre-trained Glove versions on HPC:\n",
"\n",
"1. 6B: 6B tokens, 400K vocab, uncased, 50d (dimension)/100d/200d/300d vectors (glove.6B.50d.txt, glove.6B.100d.txt...)\n",
"\n",
"2. 42B: 42B tokens, 1.9M vocab, uncased, 300d vectors (glove.42B.300d.txt)\n",
"\n",
"3. 840B: 840B tokens, 2.2M vocab, cased, 300d vectors (glove.840B.300d.txt) \n",
"\n",
"Directory: `/app1/common/pre-trained/nlp/glove`\n",
"\n",
"Demo of using pre-trained Glove:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# All under the same directory, using 6B.100d as an example\n",
"path_to_glove_file = \"/app1/common/pre-trained/nlp/glove/glove.6B.100d.txt\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1.1.1 Tensorflow\n",
"\n",
"There're two ways to use pre-trained Glove in Tensorflow. The popular way these days is to make use of Keras embedding layer, which we will demonstrate. But you can also use `tensorflow.nn.embedding_lookup`, the old-fashioned Tensorflow way. If you are interested in this, please check it out [here](https://www.damienpontifex.com/posts/using-pre-trained-glove-embeddings-in-tensorflow/).\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from tensorflow import keras\n",
"from tensorflow.keras.layers import Embedding"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Vocabulary size: 400000\n"
]
}
],
"source": [
"# Load the embedding matrix in dict\n",
"embeddings_index = {}\n",
"with open(path_to_glove_file) as f:\n",
" for line in f:\n",
" word, coefs = line.split(maxsplit=1)\n",
" coefs = np.fromstring(coefs, \"f\", sep=\" \")\n",
" embeddings_index[word] = coefs\n",
"\n",
"print(\"Vocabulary size: {}\".format(len(embeddings_index)))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dimension: 100\n"
]
}
],
"source": [
"print(\"Dimension: {}\".format(len(embeddings_index['the'])))"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"# Create the embedding layer in Keras\n",
"# input_dim: vocabulary size\n",
"# output_dim: output vector size\n",
"embedding_layer = Embedding(\n",
" input_dim,\n",
" output_dim,\n",
" embeddings_initializer=keras.initializers.Constant(embedding_matrix),\n",
" # \"weights=[embedding_matrix],\" is deprecated in Keras\n",
" trainable=False,\n",
")"
]
},
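{
"cell_type": "markdown",
"metadata": {},
"source": [
"As mentioned above, `tf.nn.embedding_lookup` is the lower-level alternative to the Keras `Embedding` layer. A minimal sketch, assuming the `embedding_matrix` built in the previous cell and a few hypothetical token ids:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch of the lookup-based approach (assumes embedding_matrix from above)\n",
"import tensorflow as tf\n",
"\n",
"params = tf.constant(embedding_matrix, dtype=tf.float32)  # (vocab_size, embedding_dim)\n",
"token_ids = tf.constant([0, 1, 2])  # hypothetical token ids\n",
"vectors = tf.nn.embedding_lookup(params, token_ids)\n",
"print(vectors.shape)  # (3, 100) for glove.6B.100d"
]
},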
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1.1.2 For Pytorch"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import torch"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"# Load the embedding matrix in lists\n",
"# vocab: a list of words, which will be useful in the text to token ids conversion step\n",
"# embeddings: a list of embeddings, this will be used to initialise the embeddings layer\n",
"\n",
"vocab,embeddings = [],[]\n",
"with open(path_to_glove_file,'rt') as fi:\n",
" full_content = fi.read().strip().split('\\n')\n",
"for i in range(len(full_content)):\n",
" i_word = full_content[i].split(' ')[0]\n",
" i_embeddings = [float(val) for val in full_content[i].split(' ')[1:]]\n",
" vocab.append(i_word)\n",
" embeddings.append(i_embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"# Convert to numpy arrays\n",
"vocab_npa = np.array(vocab)\n",
"embs_npa = np.array(embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['<pad>' '<unk>' 'the' ',' '.' 'of' 'to' 'and' 'in' 'a']\n",
"(400002, 100)\n"
]
}
],
"source": [
"# Insert '<pad>' and '<unk>' tokens at start of vocab_npa.\n",
"vocab_npa = np.insert(vocab_npa, 0, '<pad>')\n",
"vocab_npa = np.insert(vocab_npa, 1, '<unk>')\n",
"print(vocab_npa[:10])\n",
"\n",
"pad_emb_npa = np.zeros((1,embs_npa.shape[1])) # embedding for '<pad>' token.\n",
"unk_emb_npa = np.mean(embs_npa,axis=0,keepdims=True) # embedding for '<unk>' token.\n",
"\n",
"# Insert embeddings for pad and unk tokens at top of embs_npa.\n",
"embs_npa = np.vstack((pad_emb_npa,unk_emb_npa,embs_npa))\n",
"print(embs_npa.shape)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([400002, 100])\n"
]
}
],
"source": [
"# Create the embedding layer in Pytorch\n",
"my_embedding_layer = torch.nn.Embedding.from_pretrained(torch.from_numpy(embs_npa).float())\n",
"\n",
"assert my_embedding_layer.weight.shape == embs_npa.shape\n",
"print(my_embedding_layer.weight.shape)"
]
},
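{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the text-to-token-ids conversion step mentioned above, assuming simple whitespace tokenization (unknown words fall back to the `<unk>` index):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: convert a sentence to token ids and look up their GloVe vectors\n",
"# (assumes whitespace tokenization; unknown words map to the '<unk>' index 1)\n",
"word2idx = {word: idx for idx, word in enumerate(vocab_npa)}\n",
"\n",
"sentence = \"the quick brown fox\"\n",
"token_ids = torch.LongTensor([word2idx.get(w, 1) for w in sentence.split()])\n",
"\n",
"vectors = my_embedding_layer(token_ids)\n",
"print(vectors.shape)  # torch.Size([4, 100])"
]
},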
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.2. Word2Vec\n",
"\n",
"Word2Vec is a family of model architectures and optimizations that can be used to learn word embeddings from large datasets. Embeddings learned through Word2Vec have proven to be successful on a variety of downstream natural language processing tasks.\n",
"\n",
"Available pre-trained Word2Vec versions on HPC:\n",
"\n",
"1. Google News: 100B tokens, 3M vocab, uncased, 300d vectors (GoogleNews-vectors-negative300.bin)\n",
"\n",
"Directory: `/app1/common/pre-trained/nlp/word2vec`\n",
"\n",
"Demo of using Word2Vec with [Gensim](https://www.shanelynn.ie/word-embeddings-in-python-with-spacy-and-gensim/) (works for both TensorFlow and PyTorch):"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from gensim.models import KeyedVectors"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"path_to_word2vec_file = \"/app1/common/pre-trained/nlp/word2vec/GoogleNews-vectors-negative300.bin\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Load vectors directly from the file\n",
"model = KeyedVectors.load_word2vec_format(path_to_word2vec_file, binary=True)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# Access vectors for specific words with a keyed lookup:\n",
"vector = model['easy']"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dimension is (300,)\n"
]
}
],
"source": [
"print(\"Dimension is {}\".format(vector.shape))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"[('great', 0.7291510105133057),\n",
" ('bad', 0.7190051078796387),\n",
" ('terrific', 0.6889115571975708),\n",
" ('decent', 0.6837348341941833),\n",
" ('nice', 0.6836092472076416),\n",
" ('excellent', 0.644292950630188),\n",
" ('fantastic', 0.6407778263092041),\n",
" ('better', 0.6120728850364685),\n",
" ('solid', 0.5806034803390503),\n",
" ('lousy', 0.576420247554779)]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.most_similar(\"good\")"
]
},
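{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Gensim `KeyedVectors` object exposes the full embedding matrix as a NumPy array, so it can be plugged into either framework just like GloVe above. A minimal PyTorch sketch, assuming Gensim 4.x (where `model.vectors` and `model.key_to_index` are available):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: reuse the word2vec matrix as a frozen PyTorch embedding layer\n",
"# (assumes gensim 4.x: KeyedVectors exposes .vectors and .key_to_index)\n",
"import torch\n",
"\n",
"w2v_embedding_layer = torch.nn.Embedding.from_pretrained(\n",
"    torch.from_numpy(model.vectors).float(), freeze=True\n",
")\n",
"easy_id = model.key_to_index['easy']\n",
"print(w2v_embedding_layer(torch.tensor([easy_id])).shape)  # torch.Size([1, 300])"
]
},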
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Transformer model\n",
"\n",
"### 2.1. BERT\n",
"\n",
"**BERT** (Bidirectional Encoder Representations from Transformers) is a neural network architecture designed by Google researchers that has totally transformed what’s state-of-the-art for NLP tasks, like text classification, translation, summarization, and question answering.\n",
"\n",
"Location on HPC: `/app1/common/pre-trained/nlp/bert-base-uncased`\n",
"\n",
"We will use [Transformers](https://huggingface.co/docs/transformers/index) to load pre-trained BERT. For model pre-trained language models, please check it out [here](https://huggingface.co/models)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"bert_path = \"/app1/common/pre-trained/nlp/bert-base-uncased\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2.1.1 Tensorflow"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import tensorflow as tf\n",
"\n",
"from transformers import BertTokenizer, TFBertForSequenceClassification\n",
"from transformers import InputExample, InputFeatures\n",
"\n",
"# ignore tensorflow warnings\n",
"tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2022-01-12 17:17:33.302551: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA\n",
"To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
"2022-01-12 17:17:47.318838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30996 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:1c:00.0, compute capability: 7.0\n",
"All model checkpoint layers were used when initializing TFBertForSequenceClassification.\n",
"\n",
"Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at /app1/common/pre-trained/nlp/bert-base-uncased and are newly initialized: ['classifier']\n",
"You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
]
}
],
"source": [
"# Load model from local file. Take text classification as an example\n",
"model = TFBertForSequenceClassification.from_pretrained(bert_path)\n",
"tokenizer = BertTokenizer.from_pretrained(bert_path)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model: \"tf_bert_for_sequence_classification\"\n",
"_________________________________________________________________\n",
"Layer (type) Output Shape Param # \n",
"=================================================================\n",
"bert (TFBertMainLayer) multiple 109482240 \n",
"_________________________________________________________________\n",
"dropout_37 (Dropout) multiple 0 \n",
"_________________________________________________________________\n",
"classifier (Dense) multiple 1538 \n",
"=================================================================\n",
"Total params: 109,483,778\n",
"Trainable params: 109,483,778\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
]
}
],
"source": [
"model.summary()"
]
},
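{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of tokenizing a sentence and running a forward pass; the classification head is still randomly initialised, so the logits are not meaningful until the model is fine-tuned:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: tokenize a sample sentence and run a forward pass\n",
"inputs = tokenizer(\"Using pre-trained BERT on HPC is easy.\", return_tensors=\"tf\")\n",
"outputs = model(inputs)\n",
"print(outputs.logits.shape)  # (1, 2) by default (num_labels=2)"
]
},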
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2.1.2 Pytorch"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from torch import nn\n",
"from transformers import BertModel\n",
"\n",
"from transformers import BertTokenizer, TFBertForSequenceClassification\n",
"from transformers import InputExample, InputFeatures"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"class BertClassifier(nn.Module):\n",
"\n",
" def __init__(self, dropout=0.5):\n",
"\n",
" super(BertClassifier, self).__init__()\n",
"\n",
" self.bert = BertModel.from_pretrained(bert_path)\n",
" self.dropout = nn.Dropout(dropout)\n",
" self.linear = nn.Linear(768, 5)\n",
" self.relu = nn.ReLU()\n",
"\n",
" def forward(self, input_id, mask):\n",
"\n",
" _, pooled_output = self.bert(input_ids= input_id, attention_mask=mask,return_dict=False)\n",
" dropout_output = self.dropout(pooled_output)\n",
" linear_output = self.linear(dropout_output)\n",
" final_layer = self.relu(linear_output)\n",
"\n",
" return final_layer"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Some weights of the model checkpoint at /hpctmp/haokuang/NLP_models/bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.decoder.weight']\n",
"- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n"
]
}
],
"source": [
"model = BertClassifier()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"BertClassifier(\n",
" (bert): BertModel(\n",
" (embeddings): BertEmbeddings(\n",
" (word_embeddings): Embedding(30522, 768, padding_idx=0)\n",
" (position_embeddings): Embedding(512, 768)\n",
" (token_type_embeddings): Embedding(2, 768)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (encoder): BertEncoder(\n",
" (layer): ModuleList(\n",
" (0): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (1): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (2): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (3): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (4): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (5): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (6): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (7): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (8): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (9): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (10): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (11): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" )\n",
" )\n",
" (pooler): BertPooler(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (activation): Tanh()\n",
" )\n",
" )\n",
" (dropout): Dropout(p=0.5, inplace=False)\n",
" (linear): Linear(in_features=768, out_features=5, bias=True)\n",
" (relu): ReLU()\n",
")\n"
]
}
],
"source": [
"print(model)"
]
},
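{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of feeding a tokenized sentence through the custom classifier, assuming the `BertTokenizer` is loaded from the same local directory; the linear head is randomly initialised, so the outputs are not meaningful until the model is fine-tuned:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: tokenize a sample sentence and run it through BertClassifier\n",
"import torch\n",
"\n",
"tokenizer = BertTokenizer.from_pretrained(bert_path)\n",
"encoded = tokenizer(\"Using pre-trained BERT on HPC is easy.\",\n",
"                    padding='max_length', max_length=32,\n",
"                    truncation=True, return_tensors=\"pt\")\n",
"\n",
"with torch.no_grad():\n",
"    output = model(encoded['input_ids'], encoded['attention_mask'])\n",
"print(output.shape)  # torch.Size([1, 5])"
]
},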
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}