
@marcusmw
Created January 13, 2021 00:21
TA_reviews.model_def.python.3.ipynb - Advanced Data Science Capstone
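The encoding step uses Keras' one_hot, which hashes each word with Python's built-in hash. Because string hashes are salted per process, a saved model can receive different indexes for the same words in a new session.

A minimal sketch of a deterministic alternative follows. It assumes MD5 hashing and simple whitespace tokenization are acceptable. stable_word_index and encode_review are hypothetical helpers, not Keras APIs. The sketch keeps the same [1, n - 1] index range, with 0 left free for padding.

```python
import hashlib
from typing import List


def stable_word_index(word: str, n: int) -> int:
    """Map a word to an integer in [1, n - 1] using MD5, which is identical in every session.

    Index 0 stays free for padding. Hypothetical helper for illustration, not a Keras API.
    """
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    return int(digest, 16) % (n - 1) + 1


def encode_review(text: str, n: int = 71537) -> List[int]:
    # Assumption: lower-casing plus whitespace splitting stands in for Keras' tokenizer filters.
    return [stable_word_index(word, n) for word in text.lower().split()]


if __name__ == "__main__":
    first = encode_review("great location friendly staff")
    second = encode_review("great location friendly staff")
    print(first == second)                     # True, in this and every later session
    print(all(1 <= i < 71537 for i in first))  # True
```

Keras' own hashing_trick accepts hash_function='md5' for the same purpose. Fitting a vocabulary (for example with Keras' Tokenizer) is the other common option, and it also avoids hash collisions.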
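The padding step cuts every review to sent_length tokens, chosen as the mean plus one standard deviation of review length, with padding='pre'. The plain-Python sketch below shows both pieces under stated assumptions:

- It uses the population standard deviation; the notebook does not say which was used.
- length_cutoff and pad_pre are hypothetical helpers.
- pad_pre imitates the pad_sequences defaults of zero padding and truncating='pre', so long reviews lose their opening tokens.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def length_cutoff(sequences: Sequence[Sequence[int]]) -> int:
    """Mean plus one population standard deviation of the sequence lengths, rounded."""
    lengths = [len(s) for s in sequences]
    return round(mean(lengths) + pstdev(lengths))


def pad_pre(sequences: Sequence[Sequence[int]], maxlen: int, value: int = 0) -> List[List[int]]:
    """Left-pad short sequences with value and keep only the last maxlen tokens of long ones."""
    if maxlen <= 0:
        raise ValueError("maxlen must be positive")
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]                            # truncating='pre': drop leading tokens
        padded.append([value] * (maxlen - len(kept)) + kept)  # padding='pre': zeros go in front
    return padded


if __name__ == "__main__":
    docs = [[5, 6], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6]]
    print(length_cutoff(docs))  # lengths 2, 4, 6: 4 + 1.63 rounds to 6
    print(pad_pre(docs, 4))     # [[0, 0, 5, 6], [1, 2, 3, 4], [3, 4, 5, 6]]
```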
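SMOTE, configured in the notebook with k_neighbors=1, builds each synthetic sample in three steps:

1. Pick a minority sample x.
2. Find its nearest other minority sample x_nn.
3. Interpolate x_new = x + λ(x_nn − x), with λ drawn uniformly from [0, 1).

The sketch below is a simplified pure-Python version of that rule, not imbalanced-learn's implementation. smote_k1 is a hypothetical name, and truncating to integers is an assumption made because token indexes must be whole numbers. It makes one consequence easy to inspect: applied to padded word indexes, interpolated values are generally indexes of unrelated words.

```python
import random
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_k1(minority: Sequence[Sequence[int]], n_new: int, seed: int = 1) -> List[List[int]]:
    """Simplified SMOTE with one nearest neighbour (k_neighbors=1).

    Each synthetic row is x + lam * (nn - x), where nn is the nearest other
    minority row and lam is uniform in [0, 1). Values are truncated to int
    because token indexes must be integers.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # nearest other minority row (the single neighbour used when k_neighbors=1)
        nn = min((m for j, m in enumerate(minority) if j != i),
                 key=lambda m: squared_distance(x, m))
        lam = rng.random()
        synthetic.append([int(a + lam * (b - a)) for a, b in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    padded_minority = [[0, 0, 1200, 87], [0, 0, 5400, 90], [0, 310, 45, 12]]
    for row in smote_k1(padded_minority, 3):
        print(row)
```

For instance, with λ = 0.5 the midpoint of word indexes 1200 and 5400 is 3300, an index with no relation to either word. Random oversampling, by contrast, only duplicates real reviews.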
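Keras' fit takes the validation_split fraction from the last samples of the arrays it receives, before shuffling. The resampled arrays produced in the notebook by SMOTE, ADASYN and RandomOverSampler hold the original training rows first, followed by the new minority-class rows. The validation split therefore sees only the appended rows.

A plain-Python sketch follows; tail_validation_split and shuffle_together are hypothetical helpers, and Keras itself is not imported. It shows which rows a tail split selects, and one way to avoid the problem: shuffle the resampled arrays with a fixed seed before passing them to fit. A stricter option, not shown, is to carve the validation set out of X_train before resampling, so that it contains only real reviews.

```python
import random
from typing import List, Sequence, Tuple


def tail_validation_split(n_samples: int, validation_split: float) -> Tuple[range, range]:
    """Training and validation index ranges when validation data is the last fraction of rows."""
    split_at = int(n_samples * (1.0 - validation_split))
    return range(0, split_at), range(split_at, n_samples)


def shuffle_together(X: Sequence, y: Sequence, seed: int = 42) -> Tuple[List, List]:
    """Shuffle features and labels with the same fixed-seed permutation."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    return [X[i] for i in order], [y[i] for i in order]


if __name__ == "__main__":
    # 8 original training rows followed by 4 synthetic minority rows (label 0)
    y = [1, 1, 0, 1, 1, 0, 1, 1] + [0, 0, 0, 0]
    X = [[i] for i in range(len(y))]
    _, val_idx = tail_validation_split(len(y), 0.25)
    print([y[i] for i in val_idx])      # [0, 0, 0]: only synthetic rows are validated
    X_shuf, y_shuf = shuffle_together(X, y)
    print(sorted(y_shuf) == sorted(y))  # True: same labels in a new order
```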
{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"TA_reviews.model_definition","provenance":[],"collapsed_sections":[],"mount_file_id":"1yLDPwEo_yxRrlFfaPjWx8Lojz0muvsT6","authorship_tag":"ABX9TyM378HQtEDYckzyF2cTFd0Q"},"kernelspec":{"display_name":"Python 3","name":"python3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"HafTEzqiERsw"},"source":["In this notebook we will define our model. In the previous stage we preprocessed the data so that all of the string processing has already been performed. We will compare a machine learning model with a deep learning model, iterate over different class imbalance solutions to select one, and then refine whichever model prevails with a grid search.\r\n","\r\n","A scikit-learn pipeline cannot be used here because train_test_split is a function rather than a transformer with a fit_transform method. The data are therefore prepared in individual steps:\r\n","\r\n","1. Import the data and assign it to variables (i.e. labels and data)\r\n","2. Build the corpus\r\n","3. Encode the words with Keras' one_hot\r\n","4. Pad the sequences\r\n","5. Split into train and test sets\r\n","6. Iterate over class imbalance methods (None, Random OverSampling, ADASYN and SMOTE) to determine which gives the best performance\r\n","7. 
Fit to model "]},{"cell_type":"code","metadata":{"id":"s5GiCbUfMYLt"},"source":["import numpy as np\r\n","import pandas as pd\r\n","df = pd.read_csv('/content/drive/MyDrive/Colab_Notebooks/TripAdvisor/TA_reviews_FeatEng.csv')\r\n","\r\n","# https://github.com/IBM/skillsnetwork/tree/master/coursera_capstone/guidelines\r\n","\r\n"," "],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"KG1eBiuGaWX0"},"source":["labels = df['Rating']\r\n","data = df['clean_lemma']"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"6GaN9NgGmTS9"},"source":["Separate the labels (Rating) from the review text (clean_lemma)."]},{"cell_type":"code","metadata":{"id":"b0c0YuUIF2EU"},"source":["# One review string per element, in row order\r\n","corpus = data.tolist()\r\n","# print(corpus[0:5]) "],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"EQvKU_E1mY8X"},"source":["Build a corpus of all reviews."]},{"cell_type":"code","metadata":{"id":"NQLlbIcMgcXh"},"source":["from tensorflow.keras.preprocessing.text import one_hot\r\n","onehot_repr = [one_hot(words, 71537) for words in corpus]\r\n","print(onehot_repr[0:2])"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"xu8w5hsjmn-s"},"source":["Numerically encode all words using Keras' one_hot. Despite its name, one_hot does not produce one-hot vectors: it hashes each word to an integer index below 71537, so unrelated words can collide. Because it has no fit method, there is no stored vocabulary. Its default hash function is Python's built-in hash, which is not stable across sessions, so the same word can receive a different index after a restart. This will cause problems for model deployment."]},{"cell_type":"code","metadata":{"id":"DkC9O-WyQkhI"},"source":["sent_length = 190\r\n","from tensorflow.keras.preprocessing.sequence import pad_sequences\r\n","embedded_docs = pad_sequences(onehot_repr, padding='pre', maxlen=sent_length)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"kG-HfzzEmzjr"},"source":["Pad all encoded reviews so that they are all the same length (maxlen). With padding='pre', shorter reviews receive leading zeros, and longer reviews are truncated from the start (truncating='pre' is the default). sent_length was chosen as the mean plus one standard deviation of the processed review lengths. 
Before 'not', 'no' and 'but' were excluded from the stopword list, this value was 190; it is now 195, but we will keep 190."]},{"cell_type":"code","metadata":{"id":"6pBRrHQJRJHU"},"source":["from sklearn.model_selection import train_test_split\r\n","X_train, X_test, y_train, y_test = train_test_split(embedded_docs, labels, test_size=0.15, random_state=42)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"ZDeL9DweQpVF","executionInfo":{"status":"ok","timestamp":1609715942778,"user_tz":-660,"elapsed":12401,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"}},"outputId":"3f888f15-8c61-434f-e57d-ec1abb9ca06e"},"source":["from imblearn.over_sampling import ADASYN, SMOTE, RandomOverSampler\r\n","smt = SMOTE(random_state=1, k_neighbors=1)\r\n","# fit_resample is the current method name; the older fit_sample alias was removed in later imbalanced-learn releases\r\n","X_SMOTE, y_SMOTE = smt.fit_resample(X_train, y_train)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["/usr/local/lib/python3.6/dist-packages/sklearn/externals/six.py:31: FutureWarning: The module is deprecated in version 0.21 and will be removed in version 0.23 since we've dropped support for Python 2.7. Please rely on the official version of six (https://pypi.org/project/six/).\n"," \"(https://pypi.org/project/six/).\", FutureWarning)\n","/usr/local/lib/python3.6/dist-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.neighbors.base module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.neighbors. 
Anything that cannot be imported from sklearn.neighbors is now part of the private API.\n"," warnings.warn(message, FutureWarning)\n","/usr/local/lib/python3.6/dist-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function safe_indexing is deprecated; safe_indexing is deprecated in version 0.22 and will be removed in version 0.24.\n"," warnings.warn(msg, category=FutureWarning)\n"],"name":"stderr"}]},{"cell_type":"markdown","metadata":{"id":"Q8ld3nfJnd5g"},"source":["Split the padded encodings and labels into train and test sets, then feed only the training set into SMOTE to synthesize new minority-class samples and correct the class imbalance; the test set keeps its original class distribution. "]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"OKtXrymoi-SD","executionInfo":{"status":"ok","timestamp":1609715956501,"user_tz":-660,"elapsed":1106,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"}},"outputId":"b0104969-f43e-436f-d7d6-3e91a34d12fe"},"source":["X_SMOTE.shape"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["(25618, 190)"]},"metadata":{"tags":[]},"execution_count":9}]},{"cell_type":"code","metadata":{"id":"AmoRkMUlQqGU"},"source":["import tensorflow as tf\r\n","from tensorflow.keras.models import Sequential\r\n","from sklearn.model_selection import GridSearchCV\r\n","# Import layers and activations from tensorflow.keras (not standalone keras) so they match the tf.keras Sequential model\r\n","from tensorflow.keras.layers import Embedding,Flatten,BatchNormalization,Activation,LSTM,Dense,Bidirectional,Dropout\r\n","from tensorflow.keras.activations import relu,sigmoid\r\n","from sklearn.metrics import classification_report"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"Slf7Gb0YTtSA"},"source":["embedding_vector_features = 40"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"U_U0C-HwZtMZ"},"source":["initial_model = 
Sequential()\r\n","initial_model.add(Embedding(71537,embedding_vector_features,input_length=sent_length))\r\n","initial_model.add(Bidirectional(LSTM(100)))\r\n","initial_model.add(Dropout(0.3))\r\n","initial_model.add(Dense(1,activation='sigmoid'))\r\n","initial_model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"uKNKDiu7n08b"},"source":["This deep learning configuration is based on academic and online discussion of the effectiveness of bidirectional LSTMs for text classification, for example:\r\n","\r\n","https://towardsdatascience.com/sentence-classification-using-bi-lstm-b74151ffa565\r\n","\r\n","https://www.aclweb.org/anthology/C16-1329/\r\n","\r\n","Note that Keras takes validation_split from the last samples of the arrays passed to fit, before shuffling, and imbalanced-learn's SMOTE appends its synthetic samples after the original training rows. The 5% validation split in the fit below therefore contains only synthetic minority-class samples. Its near-perfect val_accuracy is not an estimate of performance on real reviews; the test set evaluation is the meaningful measure."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"EmDKAdD5Zz55","executionInfo":{"status":"ok","timestamp":1609667795539,"user_tz":-660,"elapsed":1007507,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"}},"outputId":"c6e48de0-57eb-460c-be37-c0054ff98690"},"source":["initial_Hist = initial_model.fit(X_SMOTE,y_SMOTE,epochs=5,validation_split=.05,batch_size=32)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Epoch 1/5\n","761/761 [==============================] - 203s 263ms/step - loss: 0.4004 - accuracy: 0.7999 - val_loss: 0.0152 - val_accuracy: 1.0000\n","Epoch 2/5\n","761/761 [==============================] - 200s 263ms/step - loss: 0.1548 - accuracy: 0.9447 - val_loss: 0.0056 - val_accuracy: 0.9992\n","Epoch 3/5\n","761/761 [==============================] - 203s 267ms/step - loss: 0.1191 - accuracy: 0.9572 - val_loss: 0.0032 - val_accuracy: 1.0000\n","Epoch 4/5\n","761/761 [==============================] - 200s 262ms/step - loss: 0.0632 - accuracy: 0.9794 - val_loss: 5.5294e-04 - val_accuracy: 1.0000\n","Epoch 5/5\n","761/761 
[==============================] - 199s 261ms/step - loss: 0.0506 - accuracy: 0.9834 - val_loss: 6.6896e-04 - val_accuracy: 1.0000\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"BQHubBETao5g","executionInfo":{"status":"ok","timestamp":1609667816861,"user_tz":-660,"elapsed":11650,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"}},"outputId":"ae462a92-5f6c-4d1d-b1b7-03061ceeaa08"},"source":["initial_model.evaluate(X_test,y_test)\r\n","initial_preds = initial_model.predict(X_test)\r\n","initial_binarypreds = np.where(initial_preds > .5, 1, 0)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["97/97 [==============================] - 5s 52ms/step - loss: 0.4886 - accuracy: 0.8650\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Jst4Z3OsarHg","executionInfo":{"elapsed":10283,"status":"ok","timestamp":1609547455795,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"712c68f0-9fd8-40fa-dd6e-5a85827bbb75"},"source":["from sklearn.linear_model import LogisticRegressionCV\r\n","clf = LogisticRegressionCV(random_state = 1)\r\n","clf.fit(X_SMOTE,y_SMOTE)\r\n","clf.score(X_test,y_test)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0.5705920624593364"]},"metadata":{"tags":[]},"execution_count":13}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"9gs0qOBSa3Is","executionInfo":{"elapsed":896,"status":"ok","timestamp":1609547460929,"user":{"displayName":"Marcus 
Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"5b11321c-5227-408a-b4cc-d538eae25b1f"},"source":["clfpreds = clf.predict(X_test)\r\n","print('Initial Deep Learning Model Classification Report:')\r\n","print(classification_report(y_test, initial_binarypreds))\r\n","print('Initial Logistic Regression Machine Learning Model Classification Report:')\r\n","print(classification_report(y_test, clfpreds))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Initial Deep Learning Model Classification Report:\n"," precision recall f1-score support\n","\n"," 0 0.72 0.75 0.73 790\n"," 1 0.91 0.90 0.90 2284\n","\n"," accuracy 0.86 3074\n"," macro avg 0.81 0.82 0.82 3074\n","weighted avg 0.86 0.86 0.86 3074\n","\n","Initial Logistic Regression Machine Learning Model Classification Report:\n"," precision recall f1-score support\n","\n"," 0 0.27 0.39 0.32 790\n"," 1 0.75 0.63 0.69 2284\n","\n"," accuracy 0.57 3074\n"," macro avg 0.51 0.51 0.50 3074\n","weighted avg 0.63 0.57 0.59 3074\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"IxQKe_5EgXnk"},"source":["These results show that the deep learning model clearly outperforms the logistic regression model on the test data, in both accuracy (0.865 vs 0.571) and macro-average F1 score (0.82 vs 0.50). Note that the logistic regression receives the padded word indexes directly as numeric features, a representation that suits an embedding layer far better than a linear model. \r\n","\r\n","Next we iterate across different class imbalance methods and record their impact on accuracy and F1 score. This satisfies the requirement: 'apply at least one additional iteration in the process model involving at least the feature creation task and record impact on model performance.'\r\n","\r\n","We create a fresh model for each dataset, because continuing to train a model whose weights have already been updated by backpropagation would not give a meaningful comparison. 
\r\n","\r\n","model_None refers to a model trained on unaltered data (from a class imbalance perspective)."]},{"cell_type":"code","metadata":{"id":"-LbpU09lZOmE"},"source":["model_None = Sequential()\r\n","model_None.add(Embedding(71537,embedding_vector_features,input_length=sent_length))\r\n","model_None.add(Bidirectional(LSTM(100)))\r\n","model_None.add(Dropout(0.3))\r\n","model_None.add(Dense(1,activation='sigmoid'))\r\n","model_None.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Lm26FPQfZ1N-","executionInfo":{"elapsed":782946,"status":"ok","timestamp":1609549956464,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"0e05d830-d075-4584-b09b-20baa8f61dc3"},"source":["none_Hist = model_None.fit(X_train,y_train,epochs=5,validation_split=.05,batch_size=32)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Epoch 1/5\n","518/518 [==============================] - 162s 306ms/step - loss: 0.4794 - accuracy: 0.7822 - val_loss: 0.3538 - val_accuracy: 0.8496\n","Epoch 2/5\n","518/518 [==============================] - 155s 299ms/step - loss: 0.2211 - accuracy: 0.9207 - val_loss: 0.3028 - val_accuracy: 0.8749\n","Epoch 3/5\n","518/518 [==============================] - 154s 298ms/step - loss: 0.1326 - accuracy: 0.9567 - val_loss: 0.4109 - val_accuracy: 0.8714\n","Epoch 4/5\n","518/518 [==============================] - 154s 298ms/step - loss: 0.0822 - accuracy: 0.9735 - val_loss: 0.4131 - val_accuracy: 0.8794\n","Epoch 5/5\n","518/518 [==============================] - 156s 302ms/step - loss: 0.0537 - accuracy: 0.9825 - val_loss: 0.5139 - val_accuracy: 
0.8714\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"OoKcqs0vm_13"},"source":["model_ROS refers to a model trained on data in which the minority class has been randomly oversampled, i.e. existing minority samples are duplicated at random until the classes are balanced."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"zvCIibbwmPvC","executionInfo":{"elapsed":1641,"status":"ok","timestamp":1609549039221,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"b502cb30-a3fd-4d43-a8d2-886c5d5a17ba"},"source":["ros = RandomOverSampler(random_state=1)\r\n","X_ROS, y_ROS = ros.fit_resample(X_train, y_train)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["/usr/local/lib/python3.6/dist-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function safe_indexing is deprecated; safe_indexing is deprecated in version 0.22 and will be removed in version 0.24.\n"," warnings.warn(msg, category=FutureWarning)\n"],"name":"stderr"}]},{"cell_type":"code","metadata":{"id":"Flr4v3FJmVfe"},"source":["model_ROS = Sequential()\r\n","model_ROS.add(Embedding(71537,embedding_vector_features,input_length=sent_length))\r\n","model_ROS.add(Bidirectional(LSTM(100)))\r\n","model_ROS.add(Dropout(0.3))\r\n","model_ROS.add(Dense(1,activation='sigmoid'))\r\n","model_ROS.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"3wqrW5anm4Lk","executionInfo":{"elapsed":1140852,"status":"ok","timestamp":1609551133661,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"b578a7e0-ea2f-4916-fbd2-a9699bd7216a"},"source":["ros_Hist = 
model_ROS.fit(X_ROS,y_ROS,epochs=5,validation_split=.05,batch_size=32)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Epoch 1/5\n","761/761 [==============================] - 234s 303ms/step - loss: 0.4636 - accuracy: 0.7556 - val_loss: 0.1261 - val_accuracy: 0.9524\n","Epoch 2/5\n","761/761 [==============================] - 225s 296ms/step - loss: 0.1668 - accuracy: 0.9431 - val_loss: 0.1190 - val_accuracy: 0.9649\n","Epoch 3/5\n","761/761 [==============================] - 225s 296ms/step - loss: 0.0808 - accuracy: 0.9758 - val_loss: 0.0709 - val_accuracy: 0.9727\n","Epoch 4/5\n","761/761 [==============================] - 227s 298ms/step - loss: 0.0824 - accuracy: 0.9766 - val_loss: 0.0884 - val_accuracy: 0.9703\n","Epoch 5/5\n","761/761 [==============================] - 228s 300ms/step - loss: 0.0503 - accuracy: 0.9847 - val_loss: 0.0200 - val_accuracy: 0.9914\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"OF9OBFPbrUs2"},"source":["model_ADA refers to a model trained on ADASYN-augmented training data."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"AZFyO3nam7xO","executionInfo":{"elapsed":54791,"status":"ok","timestamp":1609551212889,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"443ea66a-e609-4436-9ede-7aead917fa7b"},"source":["ASYN = ADASYN(random_state=1)\r\n","X_AS, y_AS = ASYN.fit_resample(X_train, y_train)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["/usr/local/lib/python3.6/dist-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function safe_indexing is deprecated; safe_indexing is deprecated in version 0.22 and will be removed in version 0.24.\n"," warnings.warn(msg, category=FutureWarning)\n"],"name":"stderr"}]},{"cell_type":"code","metadata":{"id":"qzOoHNhmua0x"},"source":["model_ADA = 
Sequential()\r\n","model_ADA.add(Embedding(71537,embedding_vector_features,input_length=sent_length))\r\n","model_ADA.add(Bidirectional(LSTM(100)))\r\n","model_ADA.add(Dropout(0.3))\r\n","model_ADA.add(Dense(1,activation='sigmoid'))\r\n","model_ADA.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"2zEgsF2huhCQ","executionInfo":{"elapsed":1115213,"status":"ok","timestamp":1609552344465,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"413a5419-78f9-45a5-dc7f-3c60daf2648f"},"source":["ADA_Hist = model_ADA.fit(X_AS,y_AS,epochs=5,validation_split=.05,batch_size=32) "],"execution_count":null,"outputs":[{"output_type":"stream","text":["Epoch 1/5\n","739/739 [==============================] - 226s 301ms/step - loss: 0.4139 - accuracy: 0.8067 - val_loss: 0.0183 - val_accuracy: 1.0000\n","Epoch 2/5\n","739/739 [==============================] - 223s 301ms/step - loss: 0.1485 - accuracy: 0.9465 - val_loss: 0.0084 - val_accuracy: 0.9992\n","Epoch 3/5\n","739/739 [==============================] - 223s 302ms/step - loss: 0.0865 - accuracy: 0.9710 - val_loss: 0.0035 - val_accuracy: 0.9992\n","Epoch 4/5\n","739/739 [==============================] - 222s 300ms/step - loss: 0.0526 - accuracy: 0.9836 - val_loss: 0.0082 - val_accuracy: 0.9976\n","Epoch 5/5\n","739/739 [==============================] - 221s 299ms/step - loss: 0.0647 - accuracy: 0.9782 - val_loss: 0.0092 - val_accuracy: 0.9984\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"1rAhqEKmrfzt"},"source":["model_SMOTE refers to a model trained on SMOTE augmented training data."]},{"cell_type":"code","metadata":{"id":"mvzqXJLM2mNj"},"source":["model_SMOTE = 
Sequential()\r\n","model_SMOTE.add(Embedding(71537,embedding_vector_features,input_length=sent_length))\r\n","model_SMOTE.add(Bidirectional(LSTM(100)))\r\n","model_SMOTE.add(Dropout(0.3))\r\n","model_SMOTE.add(Dense(1,activation='sigmoid'))\r\n","model_SMOTE.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"MQ7dQdo72vWS","executionInfo":{"elapsed":1179705,"status":"ok","timestamp":1609558711150,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"5210e947-5205-4347-f6cd-53e418616008"},"source":["SMOTE_Hist = model_SMOTE.fit(X_SMOTE,y_SMOTE,epochs=5,validation_split=.05,batch_size=32) "],"execution_count":null,"outputs":[{"output_type":"stream","text":["Epoch 1/5\n","761/761 [==============================] - 237s 307ms/step - loss: 0.4034 - accuracy: 0.8204 - val_loss: 0.0117 - val_accuracy: 0.9992\n","Epoch 2/5\n","761/761 [==============================] - 230s 303ms/step - loss: 0.1753 - accuracy: 0.9389 - val_loss: 0.0067 - val_accuracy: 1.0000\n","Epoch 3/5\n","761/761 [==============================] - 239s 315ms/step - loss: 0.0894 - accuracy: 0.9697 - val_loss: 0.0037 - val_accuracy: 1.0000\n","Epoch 4/5\n","761/761 [==============================] - 241s 317ms/step - loss: 0.0591 - accuracy: 0.9808 - val_loss: 0.0010 - val_accuracy: 1.0000\n","Epoch 5/5\n","761/761 [==============================] - 231s 304ms/step - loss: 0.0670 - accuracy: 0.9777 - val_loss: 0.0011 - val_accuracy: 1.0000\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"38Gt3VIi6kJo","executionInfo":{"elapsed":816,"status":"ok","timestamp":1609558750623,"user":{"displayName":"Marcus 
Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"cd99682c-f65a-4962-d231-0c54e372edaf"},"source":["model_SMOTE._name = 'SMOTE'\r\n","model_ADA._name ='ADA'\r\n","model_ROS._name = 'ROS'\r\n","model_None._name ='None'\r\n","print(model_None._name)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["None\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"BdZXYN2eJI5C","executionInfo":{"elapsed":45272,"status":"ok","timestamp":1609558799080,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"b644c69a-6d37-4f04-a828-8450115caf37"},"source":["models =[model_None,model_ROS,model_ADA,model_SMOTE]\r\n","for i,model in enumerate(models):\r\n"," print(i+1,'.',model._name,':')\r\n"," model.evaluate(X_test,y_test)\r\n"," model_preds = model.predict(X_test)\r\n"," binary_preds = np.where(model_preds > .5,1,0)\r\n"," print(classification_report(y_test,binary_preds))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["1 . None :\n","97/97 [==============================] - 6s 59ms/step - loss: 0.5162 - accuracy: 0.8627\n"," precision recall f1-score support\n","\n"," 0 0.75 0.70 0.72 790\n"," 1 0.90 0.92 0.91 2284\n","\n"," accuracy 0.86 3074\n"," macro avg 0.82 0.81 0.82 3074\n","weighted avg 0.86 0.86 0.86 3074\n","\n","2 . ROS :\n","97/97 [==============================] - 6s 59ms/step - loss: 0.6878 - accuracy: 0.8608\n"," precision recall f1-score support\n","\n"," 0 0.73 0.73 0.73 790\n"," 1 0.91 0.91 0.91 2284\n","\n"," accuracy 0.86 3074\n"," macro avg 0.82 0.82 0.82 3074\n","weighted avg 0.86 0.86 0.86 3074\n","\n","3 . 
ADA :\n","97/97 [==============================] - 6s 59ms/step - loss: 0.4300 - accuracy: 0.8679\n"," precision recall f1-score support\n","\n"," 0 0.76 0.71 0.73 790\n"," 1 0.90 0.92 0.91 2284\n","\n"," accuracy 0.87 3074\n"," macro avg 0.83 0.82 0.82 3074\n","weighted avg 0.87 0.87 0.87 3074\n","\n","4 . SMOTE :\n","97/97 [==============================] - 6s 60ms/step - loss: 0.5193 - accuracy: 0.8666\n"," precision recall f1-score support\n","\n"," 0 0.77 0.69 0.73 790\n"," 1 0.90 0.93 0.91 2284\n","\n"," accuracy 0.87 3074\n"," macro avg 0.83 0.81 0.82 3074\n","weighted avg 0.86 0.87 0.86 3074\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"u3Vol7c7MGV7"},"source":["The comparison shows that the class imbalance methods make little difference to test accuracy: the lowest is the ROS model (0.8608) and the highest is the ADA model (0.8679), and the macro-average F1 score is 0.82 for all four. \r\n","\r\n","Nevertheless, to guard against the class imbalance biasing predictions toward the majority class, SMOTE will be used. SMOTE and ADASYN gave very similar results, with ADASYN about 0.0013 better on test accuracy. However, SMOTE generates synthetic samples with less computation than ADASYN, which makes it preferable if further adjustments are required."]},{"cell_type":"code","metadata":{"id":"0q7MP7kha2rh"},"source":["from tensorflow.keras.wrappers.scikit_learn import KerasClassifier"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"mKCX_a1nOQih"},"source":["Next we tune the deep learning model's parameters, starting from the default configuration we chose: an embedding vector of 40, 100 LSTM units and a dropout of 0.3. We will grid search over embedding vectors of [30,40,80], LSTM units of [50,100,200] and dropout of [.1,.3,.5]. 
If the best value for a parameter lies at either end of its range, we will run another grid search in which that value becomes the centre of the grid. \r\n","\r\n","Because the free version of Google Colab times out and disconnects during long grid searches, the search had to be broken up into several smaller grid searches. Note that these results were recorded across multiple sessions, so the outputs shown are cached."]},{"cell_type":"code","metadata":{"id":"UBDbVprldOtN"},"source":["def createmodel (embedding_vector,bilstm,drpt):\r\n"," model = Sequential()\r\n"," model.add(Embedding(71537, output_dim= embedding_vector,input_length=sent_length))\r\n"," model.add(Bidirectional(LSTM(bilstm)))\r\n"," model.add(Dropout(drpt))\r\n"," model.add(Dense(1,activation=sigmoid,kernel_initializer='glorot_uniform'))\r\n"," model.compile(optimizer='Adam',loss='binary_crossentropy',metrics = ['accuracy'])\r\n"," return model"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"GLMFqXCCk2ta"},"source":["KerasClassifier needs a build function: for each unique parameter combination, the grid search (through KerasClassifier) calls createmodel to build and compile a fresh model."]},{"cell_type":"code","metadata":{"id":"ioS8uU4ndRSQ"},"source":["model = KerasClassifier(build_fn=createmodel,verbose=0)\r\n","\r\n","embedding_vector =[30,40]\r\n","bilstm =[50,100]\r\n","drpt =[.3]\r\n","embedding_vector_th =[30,40]\r\n","bilstm_two =[100,200]\r\n","drpt_one =[.15]\r\n","embedding_vector_ei =[40,80]\r\n","bilstm_fif =[50,100]\r\n","drpt_thr =[.3]\r\n","drpt_fi =[.5]"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"HVTO1ryss4es"},"source":["para_grid = dict(embedding_vector=embedding_vector,bilstm=bilstm,drpt=drpt,batch_size=[32],epochs=[5])\r\n","grid = GridSearchCV(estimator = 
model,param_grid=para_grid,cv=2,verbose=3)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"zvgZdw8ydUyv","executionInfo":{"elapsed":2235940,"status":"ok","timestamp":1609373358203,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"503c31ab-46c9-4537-811d-f52c4ca98711"},"source":["grid.fit(X_SMOTE,y_SMOTE)\r\n","print(grid.best_score_,grid.best_params_)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Fitting 2 folds for each of 4 candidates, totalling 8 fits\n","[CV] batch_size=32, bilstm=50, drpt=0.3, embedding_vector=30, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=50, drpt=0.3, embedding_vector=30, epochs=5, score=0.833, total= 5.4min\n","[CV] batch_size=32, bilstm=50, drpt=0.3, embedding_vector=30, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 5.4min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=50, drpt=0.3, embedding_vector=30, epochs=5, score=0.628, total= 5.3min\n","[CV] batch_size=32, bilstm=50, drpt=0.3, embedding_vector=40, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 10.7min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=50, drpt=0.3, embedding_vector=40, epochs=5, score=0.842, total= 5.7min\n","[CV] batch_size=32, bilstm=50, drpt=0.3, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=50, drpt=0.3, embedding_vector=40, epochs=5, score=0.585, total= 5.7min\n","[CV] batch_size=32, 
bilstm=100, drpt=0.3, embedding_vector=30, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=30, epochs=5, score=0.837, total= 7.9min\n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=30, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=30, epochs=5, score=0.648, total= 7.9min\n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=40, epochs=5, score=0.806, total= 8.5min\n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=40, epochs=5, score=0.579, total= 8.4min\n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 8 out of 8 | elapsed: 54.9min finished\n"],"name":"stderr"},{"output_type":"stream","text":["0.7427980303764343 {'batch_size': 32, 'bilstm': 100, 'drpt': 0.3, 'embedding_vector': 30, 'epochs': 5}\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"QahfvxEmdYTp"},"source":["grid1_model = grid.best_estimator_"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"fkmvMIR1dhuS","executionInfo":{"elapsed":5737,"status":"ok","timestamp":1609373569305,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"b59c5d69-cce2-49e3-b324-4caa027744e8"},"source":["grid1_model.score(X_test,y_test)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0.854261577129364"]},"metadata":{"tags":[]},"execution_count":17}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"wjnPqTEoI-ae","executionInfo":{"elapsed":5678,"status":"ok","timestamp":1609373726123,"user":{"displayName":"Marcus 
Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"c6400113-dc45-40a7-d9e8-59a11f80ada8"},"source":["g1preds = grid1_model.predict(X_test)\r\n","g1_binarypreds = np.where(g1preds > .5, 1, 0)\r\n","print(classification_report(y_test, g1_binarypreds))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/sequential.py:450: UserWarning: `model.predict_classes()` is deprecated and will be removed after 2021-01-01. Please use instead:* `np.argmax(model.predict(x), axis=-1)`, if your model does multi-class classification (e.g. if it uses a `softmax` last-layer activation).* `(model.predict(x) > 0.5).astype(\"int32\")`, if your model does binary classification (e.g. if it uses a `sigmoid` last-layer activation).\n"," warnings.warn('`model.predict_classes()` is deprecated and '\n"],"name":"stderr"},{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," 0 0.72 0.71 0.72 790\n"," 1 0.90 0.90 0.90 2284\n","\n"," accuracy 0.85 3074\n"," macro avg 0.81 0.81 0.81 3074\n","weighted avg 0.85 0.85 0.85 3074\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"Vx_7Bbxek6tq"},"source":["For each grid, we fit the grid search, print the best cross-validation score and parameters, extract the best estimator, score it on the test data and print its classification report on the test data."]},{"cell_type":"code","metadata":{"id":"dGyhuRbkv8yA"},"source":["para_grid12 = dict(embedding_vector=embedding_vector,bilstm=bilstm,drpt=drpt,batch_size=[32],epochs=[5])\r\n","grid12 = GridSearchCV(estimator = model,param_grid=para_grid12,cv=2,verbose=3)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"xIE-vFjIv-G5","executionInfo":{"elapsed":4336931,"status":"ok","timestamp":1609388315710,"user":{"displayName":"Marcus 
Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"b1277017-9fac-4635-cd13-1cd86a17b1a9"},"source":["grid12.fit(X_SMOTE,y_SMOTE)\r\n","print(grid12.best_score_,grid12.best_params_)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Fitting 2 folds for each of 4 candidates, totalling 8 fits\n","[CV] batch_size=32, bilstm=50, drpt=0.15, embedding_vector=30, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=50, drpt=0.15, embedding_vector=30, epochs=5, score=0.845, total= 5.4min\n","[CV] batch_size=32, bilstm=50, drpt=0.15, embedding_vector=30, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 5.4min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=50, drpt=0.15, embedding_vector=30, epochs=5, score=0.593, total= 5.5min\n","[CV] batch_size=32, bilstm=50, drpt=0.15, embedding_vector=40, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 10.9min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=50, drpt=0.15, embedding_vector=40, epochs=5, score=0.844, total= 5.8min\n","[CV] batch_size=32, bilstm=50, drpt=0.15, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=50, drpt=0.15, embedding_vector=40, epochs=5, score=0.625, total= 5.8min\n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=30, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=30, epochs=5, score=0.841, total= 8.1min\n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=30, epochs=5 \n","[CV] batch_size=32, bilstm=100, 
drpt=0.15, embedding_vector=30, epochs=5, score=0.630, total= 8.1min\n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=40, epochs=5, score=0.853, total= 8.6min\n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=40, epochs=5, score=0.658, total= 8.5min\n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 8 out of 8 | elapsed: 55.8min finished\n"],"name":"stderr"},{"output_type":"stream","text":["0.7555234730243683 {'batch_size': 32, 'bilstm': 100, 'drpt': 0.15, 'embedding_vector': 40, 'epochs': 5}\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"BD-YOpr2v-Pc"},"source":["grid12_model = grid12.best_estimator_"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"TFaRCDiBBhBX","executionInfo":{"elapsed":5088,"status":"ok","timestamp":1609388415758,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"e7e1e3cc-9d90-4f9f-e668-6a1e421b27b2"},"source":["grid1_model.score(X_test,y_test)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0.854261577129364"]},"metadata":{"tags":[]},"execution_count":28}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"77Yt33gOBj_4","executionInfo":{"elapsed":5841,"status":"ok","timestamp":1609388445399,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"9adf9a48-9148-47d1-e604-71a6171a6941"},"source":["g12preds = grid12_model.predict(X_test)\r\n","g12_binarypreds = 
np.where(g12preds > .5, 1, 0)\r\n","print(classification_report(y_test, g12_binarypreds))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/sequential.py:450: UserWarning: `model.predict_classes()` is deprecated and will be removed after 2021-01-01. Please use instead:* `np.argmax(model.predict(x), axis=-1)`, if your model does multi-class classification (e.g. if it uses a `softmax` last-layer activation).* `(model.predict(x) > 0.5).astype(\"int32\")`, if your model does binary classification (e.g. if it uses a `sigmoid` last-layer activation).\n"," warnings.warn('`model.predict_classes()` is deprecated and '\n"],"name":"stderr"},{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," 0 0.60 0.72 0.65 790\n"," 1 0.90 0.84 0.86 2284\n","\n"," accuracy 0.81 3074\n"," macro avg 0.75 0.78 0.76 3074\n","weighted avg 0.82 0.81 0.81 3074\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"dGP8pZHZCsle"},"source":["para_grid13 = dict(embedding_vector=embedding_vector_th,bilstm=bilstm_two,drpt=drpt_one,batch_size=[32],epochs=[5])\r\n","grid13 = GridSearchCV(estimator = model,param_grid=para_grid13,cv=2,verbose=3)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"s4tK2aZwCvOz","executionInfo":{"elapsed":10309922,"status":"ok","timestamp":1609399213879,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"5d99cc06-9623-482e-e1bc-58af157dfd4c"},"source":["grid13.fit(X_SMOTE,y_SMOTE)\r\n","print(grid13.best_score_,grid13.best_params_)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Fitting 2 folds for each of 4 candidates, totalling 8 fits\n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=30, 
epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=30, epochs=5, score=0.842, total= 8.0min\n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=30, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 8.0min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=30, epochs=5, score=0.473, total= 8.0min\n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=40, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 16.0min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=40, epochs=5, score=0.830, total= 8.6min\n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=40, epochs=5, score=0.539, total= 8.6min\n","[CV] batch_size=32, bilstm=200, drpt=0.15, embedding_vector=30, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.15, embedding_vector=30, epochs=5, score=0.793, total=23.2min\n","[CV] batch_size=32, bilstm=200, drpt=0.15, embedding_vector=30, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.15, embedding_vector=30, epochs=5, score=0.605, total=23.2min\n","[CV] batch_size=32, bilstm=200, drpt=0.15, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.15, embedding_vector=40, epochs=5, score=0.832, total=23.9min\n","[CV] batch_size=32, bilstm=200, drpt=0.15, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.15, embedding_vector=40, epochs=5, score=0.476, 
total=24.1min\n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 8 out of 8 | elapsed: 127.6min finished\n"],"name":"stderr"},{"output_type":"stream","text":["0.6987664997577667 {'batch_size': 32, 'bilstm': 200, 'drpt': 0.15, 'embedding_vector': 30, 'epochs': 5}\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"BZDoIzSRCvZm"},"source":["grid13_model = grid13.best_estimator_"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"PbkTckwegRZF","executionInfo":{"elapsed":18448,"status":"ok","timestamp":1609399332640,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"e0be0f84-32e6-43a4-f2d5-e1d069bcdbd5"},"source":["grid13_model.score(X_test,y_test)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0.8649967312812805"]},"metadata":{"tags":[]},"execution_count":33}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"3l0espJggUFA","executionInfo":{"elapsed":31038,"status":"ok","timestamp":1609399349606,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"c0184faa-fd29-4f15-a0c1-77eefd6d823f"},"source":["g13preds = grid13_model.predict(X_test)\r\n","g13_binarypreds = np.where(g13preds > .5, 1, 0)\r\n","print(classification_report(y_test, g13_binarypreds))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/sequential.py:450: UserWarning: `model.predict_classes()` is deprecated and will be removed after 2021-01-01. 
Please use instead:* `np.argmax(model.predict(x), axis=-1)`, if your model does multi-class classification (e.g. if it uses a `softmax` last-layer activation).* `(model.predict(x) > 0.5).astype(\"int32\")`, if your model does binary classification (e.g. if it uses a `sigmoid` last-layer activation).\n"," warnings.warn('`model.predict_classes()` is deprecated and '\n"],"name":"stderr"},{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," 0 0.74 0.74 0.74 790\n"," 1 0.91 0.91 0.91 2284\n","\n"," accuracy 0.86 3074\n"," macro avg 0.82 0.82 0.82 3074\n","weighted avg 0.87 0.86 0.87 3074\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"h_sfkyBygesK"},"source":["para_grid14 = dict(embedding_vector=embedding_vector_th,bilstm=bilstm_two,drpt=drpt_thr,batch_size=[32],epochs=[5])\r\n","grid14 = GridSearchCV(estimator = model,param_grid=para_grid14,cv=2,verbose=3)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"GqzdH-X9gwxX","executionInfo":{"status":"ok","timestamp":1609590515645,"user_tz":-660,"elapsed":3539172,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"}},"outputId":"259f9f23-cf58-4c41-9b7e-36cf31a1ff87"},"source":["grid14.fit(X_SMOTE,y_SMOTE)\r\n","print(grid14.best_score_,grid14.best_params_)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Fitting 2 folds for each of 4 candidates, totalling 8 fits\n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=30, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=30, epochs=5, score=0.844, total= 7.6min\n","[CV] batch_size=32, bilstm=100, drpt=0.3, 
embedding_vector=30, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 7.6min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=30, epochs=5, score=0.408, total= 7.6min\n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=40, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 15.3min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=40, epochs=5, score=0.841, total= 8.3min\n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=40, epochs=5, score=0.644, total= 8.1min\n","[CV] batch_size=32, bilstm=200, drpt=0.3, embedding_vector=30, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.3, embedding_vector=30, epochs=5, score=0.841, total=22.8min\n","[CV] batch_size=32, bilstm=200, drpt=0.3, embedding_vector=30, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.3, embedding_vector=30, epochs=5, score=0.573, total=23.2min\n","[CV] batch_size=32, bilstm=200, drpt=0.3, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.3, embedding_vector=40, epochs=5, score=0.804, total=24.2min\n","[CV] batch_size=32, bilstm=200, drpt=0.3, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.3, embedding_vector=40, epochs=5, score=0.465, total=23.8min\n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 8 out of 8 | elapsed: 125.6min finished\n"],"name":"stderr"},{"output_type":"stream","text":["0.7423686385154724 {'batch_size': 32, 'bilstm': 100, 'drpt': 0.3, 'embedding_vector': 40, 'epochs': 5}\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"PnXP6QghD7-0"},"source":["grid14_model = 
grid14.best_estimator_"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"11EvgAKeD_83","executionInfo":{"elapsed":5914,"status":"ok","timestamp":1609408120365,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"dcbc924f-cdf0-43ed-a329-520190abc94b"},"source":["grid14_model.score(X_test,y_test)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0.8780090808868408"]},"metadata":{"tags":[]},"execution_count":39}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"SgrFfs8rEDqA","executionInfo":{"elapsed":5785,"status":"ok","timestamp":1609408127566,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"a6a69489-cf23-4ae8-8417-21de3a2b1e38"},"source":["g14preds = grid14_model.predict(X_test)\r\n","g14_binarypreds = np.where(g14preds > .5, 1, 0)\r\n","print(classification_report(y_test, g14_binarypreds))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/sequential.py:450: UserWarning: `model.predict_classes()` is deprecated and will be removed after 2021-01-01. Please use instead:* `np.argmax(model.predict(x), axis=-1)`, if your model does multi-class classification (e.g. if it uses a `softmax` last-layer activation).* `(model.predict(x) > 0.5).astype(\"int32\")`, if your model does binary classification (e.g. 
if it uses a `sigmoid` last-layer activation).\n"," warnings.warn('`model.predict_classes()` is deprecated and '\n"],"name":"stderr"},{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," 0 0.78 0.73 0.76 790\n"," 1 0.91 0.93 0.92 2284\n","\n"," accuracy 0.88 3074\n"," macro avg 0.84 0.83 0.84 3074\n","weighted avg 0.88 0.88 0.88 3074\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"ooo4wWicwBmy"},"source":["This grid gives the highest test accuracy (0.88) and the highest f1-score for both classes so far (0.76 for class 0, 0.92 for class 1). This is likely just a function of randomness, since the same configuration (bilstm=100, drpt=0.3, embedding_vector=40) was also evaluated in other mini grids."]},{"cell_type":"code","metadata":{"id":"eLbXSFYM1WCa"},"source":["para_grid2 = dict(embedding_vector=embedding_vector,bilstm=bilstm,drpt=drptone,batch_size=[32],epochs=[5])\r\n","grid2 = GridSearchCV(estimator = model,param_grid=para_grid2,cv=2,verbose=3)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"3Uol_2COKjHK","executionInfo":{"elapsed":9225289,"status":"ok","timestamp":1609383288925,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"d93c87dd-50c8-49c7-929a-02c4393ee41f"},"source":["grid2.fit(X_SMOTE,y_SMOTE)\r\n","print(grid2.best_params_)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Fitting 2 folds for each of 4 candidates, totalling 8 fits\n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=40, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=40, epochs=5, score=0.838, total= 8.5min\n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=40, epochs=5 
\n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 8.5min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=40, epochs=5, score=0.674, total= 8.6min\n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=80, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 17.0min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=80, epochs=5, score=0.837, total=10.3min\n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=80, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=80, epochs=5, score=0.593, total=10.2min\n","[CV] batch_size=32, bilstm=200, drpt=0.15, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.15, embedding_vector=40, epochs=5, score=0.843, total=23.5min\n","[CV] batch_size=32, bilstm=200, drpt=0.15, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.15, embedding_vector=40, epochs=5, score=0.473, total=23.6min\n","[CV] batch_size=32, bilstm=200, drpt=0.15, embedding_vector=80, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.15, embedding_vector=80, epochs=5, score=0.837, total=26.2min\n","[CV] batch_size=32, bilstm=200, drpt=0.15, embedding_vector=80, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.15, embedding_vector=80, epochs=5, score=0.461, total=26.3min\n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 8 out of 8 | elapsed: 137.2min finished\n"],"name":"stderr"},{"output_type":"stream","text":["{'batch_size': 32, 'bilstm': 100, 'drpt': 0.15, 'embedding_vector': 40, 'epochs': 5}\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"QLUOa8MtK1uw"},"source":["grid21_model = 
grid2.best_estimator_"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"_9NcD8pIvZ0I","executionInfo":{"elapsed":6121,"status":"ok","timestamp":1609383676600,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"5a372a8a-b62e-42ef-9488-5de8b5413370"},"source":["grid21_model.score(X_test,y_test)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0.8679245114326477"]},"metadata":{"tags":[]},"execution_count":23}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"DoawTGBavfJy","executionInfo":{"elapsed":5892,"status":"ok","timestamp":1609383718200,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"0a382c51-9b19-4970-fd7d-52038a0c1d4b"},"source":["g21preds = grid21_model.predict(X_test)\r\n","g21_binarypreds = np.where(g21preds > .5, 1, 0)\r\n","print(classification_report(y_test, g21_binarypreds))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/sequential.py:450: UserWarning: `model.predict_classes()` is deprecated and will be removed after 2021-01-01. Please use instead:* `np.argmax(model.predict(x), axis=-1)`, if your model does multi-class classification (e.g. if it uses a `softmax` last-layer activation).* `(model.predict(x) > 0.5).astype(\"int32\")`, if your model does binary classification (e.g. 
if it uses a `sigmoid` last-layer activation).\n"," warnings.warn('`model.predict_classes()` is deprecated and '\n"],"name":"stderr"},{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," 0 0.73 0.76 0.75 790\n"," 1 0.92 0.90 0.91 2284\n","\n"," accuracy 0.87 3074\n"," macro avg 0.83 0.83 0.83 3074\n","weighted avg 0.87 0.87 0.87 3074\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"oS9z74REvpXU"},"source":["para_grid22 = dict(embedding_vector=embedding_vector_ei,bilstm=bilstm_two,drpt=drpt_thr,batch_size=[32],epochs=[5])\r\n","grid22 = GridSearchCV(estimator = model,param_grid=para_grid22,cv=2,verbose=3)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"3HAFb3VjDrOL","executionInfo":{"elapsed":7567154,"status":"ok","timestamp":1609419983795,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"eaa0f516-0fba-4527-fd1d-bb4b137ab3a4"},"source":["grid22.fit(X_SMOTE,y_SMOTE)\r\n","print(grid22.best_score_,grid22.best_params_)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Fitting 2 folds for each of 4 candidates, totalling 8 fits\n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=40, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=40, epochs=5, score=0.819, total= 6.6min\n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=40, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 6.6min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=100, drpt=0.3, 
embedding_vector=40, epochs=5, score=0.537, total= 6.5min\n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=80, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 13.2min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=80, epochs=5, score=0.837, total= 8.1min\n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=80, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=80, epochs=5, score=0.538, total= 8.2min\n","[CV] batch_size=32, bilstm=200, drpt=0.3, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.3, embedding_vector=40, epochs=5, score=0.844, total=18.7min\n","[CV] batch_size=32, bilstm=200, drpt=0.3, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.3, embedding_vector=40, epochs=5, score=0.477, total=18.7min\n","[CV] batch_size=32, bilstm=200, drpt=0.3, embedding_vector=80, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.3, embedding_vector=80, epochs=5, score=0.840, total=21.3min\n","[CV] batch_size=32, bilstm=200, drpt=0.3, embedding_vector=80, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.3, embedding_vector=80, epochs=5, score=0.500, total=21.7min\n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 8 out of 8 | elapsed: 109.9min finished\n"],"name":"stderr"},{"output_type":"stream","text":["0.6872121095657349 {'batch_size': 32, 'bilstm': 100, 'drpt': 0.3, 'embedding_vector': 80, 'epochs': 5}\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"S0UH6o03DrYY"},"source":["grid22_model = grid22.best_estimator_"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"tlv1v-2RDryc","executionInfo":{"elapsed":5557,"status":"ok","timestamp":1609420125439,"user":{"displayName":"Marcus 
Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"e19caa3e-410e-4b37-fe49-1a19006f8ac1"},"source":["grid22_model.score(X_test,y_test)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0.8675992488861084"]},"metadata":{"tags":[]},"execution_count":18}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"wUEQr9_B2nro","executionInfo":{"elapsed":4995,"status":"ok","timestamp":1609420143756,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"8b607085-660f-47da-e41b-6b4e37e26505"},"source":["g22preds = grid22_model.predict(X_test)\r\n","g22_binarypreds = np.where(g22preds > .5, 1, 0)\r\n","print(classification_report(y_test, g22_binarypreds))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/sequential.py:450: UserWarning: `model.predict_classes()` is deprecated and will be removed after 2021-01-01. Please use instead:* `np.argmax(model.predict(x), axis=-1)`, if your model does multi-class classification (e.g. if it uses a `softmax` last-layer activation).* `(model.predict(x) > 0.5).astype(\"int32\")`, if your model does binary classification (e.g. 
if it uses a `sigmoid` last-layer activation).\n"," warnings.warn('`model.predict_classes()` is deprecated and '\n"],"name":"stderr"},{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," 0 0.74 0.75 0.74 790\n"," 1 0.91 0.91 0.91 2284\n","\n"," accuracy 0.87 3074\n"," macro avg 0.83 0.83 0.83 3074\n","weighted avg 0.87 0.87 0.87 3074\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"8FSM_ved6miW"},"source":["para_grid23 = dict(embedding_vector=embedding_vector_ei,bilstm=bilstm_fif,drpt=drpt_thr,batch_size=[32],epochs=[5])\r\n","grid23 = GridSearchCV(estimator = model,param_grid=para_grid23,cv=2,verbose=3)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"F14-0-mF7N28","executionInfo":{"elapsed":2627483,"status":"ok","timestamp":1609458570096,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"3742a74f-de32-458f-e3fd-dcb43046e190"},"source":["grid23.fit(X_SMOTE,y_SMOTE)\r\n","print(grid23.best_score_,grid23.best_params_)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Fitting 2 folds for each of 4 candidates, totalling 8 fits\n","[CV] batch_size=32, bilstm=50, drpt=0.3, embedding_vector=40, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=50, drpt=0.3, embedding_vector=40, epochs=5, score=0.845, total= 5.8min\n","[CV] batch_size=32, bilstm=50, drpt=0.3, embedding_vector=40, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 5.8min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=50, drpt=0.3, 
embedding_vector=40, epochs=5, score=0.551, total= 5.8min\n","[CV] batch_size=32, bilstm=50, drpt=0.3, embedding_vector=80, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 11.6min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=50, drpt=0.3, embedding_vector=80, epochs=5, score=0.827, total= 7.3min\n","[CV] batch_size=32, bilstm=50, drpt=0.3, embedding_vector=80, epochs=5 \n","[CV] batch_size=32, bilstm=50, drpt=0.3, embedding_vector=80, epochs=5, score=0.559, total= 7.3min\n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=40, epochs=5, score=0.803, total= 8.8min\n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=40, epochs=5, score=0.487, total= 8.9min\n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=80, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=80, epochs=5, score=0.836, total=11.0min\n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=80, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.3, embedding_vector=80, epochs=5, score=0.528, total=10.9min\n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 8 out of 8 | elapsed: 66.0min finished\n"],"name":"stderr"},{"output_type":"stream","text":["0.6977515816688538 {'batch_size': 32, 'bilstm': 50, 'drpt': 0.3, 'embedding_vector': 40, 'epochs': 5}\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"62mQ5Ns8L80j"},"source":["grid23_model = grid23.best_estimator_"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"aFEQjhme7kqv","executionInfo":{"elapsed":4322,"status":"ok","timestamp":1609458717601,"user":{"displayName":"Marcus 
Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"47243c2b-c9d3-4472-d26a-52b30db3c420"},"source":["grid23_model.score(X_test,y_test)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0.8500325083732605"]},"metadata":{"tags":[]},"execution_count":18}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"JQwXdQOsMCDq","executionInfo":{"elapsed":4218,"status":"ok","timestamp":1609458720853,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"afad8278-1ce0-4501-9639-2a1c64f6d4b5"},"source":["g23preds = grid23_model.predict(X_test)\r\n","g23_binarypreds = np.where(g23preds > .5, 1, 0)\r\n","print(classification_report(y_test, g23_binarypreds))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/sequential.py:450: UserWarning: `model.predict_classes()` is deprecated and will be removed after 2021-01-01. Please use instead:* `np.argmax(model.predict(x), axis=-1)`, if your model does multi-class classification (e.g. if it uses a `softmax` last-layer activation).* `(model.predict(x) > 0.5).astype(\"int32\")`, if your model does binary classification (e.g. 
if it uses a `sigmoid` last-layer activation).\n"," warnings.warn('`model.predict_classes()` is deprecated and '\n"],"name":"stderr"},{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," 0 0.69 0.75 0.72 790\n"," 1 0.91 0.89 0.90 2284\n","\n"," accuracy 0.85 3074\n"," macro avg 0.80 0.82 0.81 3074\n","weighted avg 0.85 0.85 0.85 3074\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"w6qBGM_NNw87"},"source":["para_grid24 = dict(embedding_vector=embedding_vector_ei,bilstm=bilstm_fif,drpt=drpt_one,batch_size=[32],epochs=[5])\r\n","grid24 = GridSearchCV(estimator = model,param_grid=para_grid24,cv=2,verbose=3)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"4emsZhyLOM50","executionInfo":{"elapsed":5020448,"status":"ok","timestamp":1609463985312,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"48601a5f-3040-485b-cd81-c71059c309dc"},"source":["grid24.fit(X_SMOTE,y_SMOTE)\r\n","print(grid24.best_score_,grid24.best_params_)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Fitting 2 folds for each of 4 candidates, totalling 8 fits\n","[CV] batch_size=32, bilstm=50, drpt=0.15, embedding_vector=40, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=50, drpt=0.15, embedding_vector=40, epochs=5, score=0.855, total= 5.8min\n","[CV] batch_size=32, bilstm=50, drpt=0.15, embedding_vector=40, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 5.8min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=50, drpt=0.15, 
embedding_vector=40, epochs=5, score=0.523, total= 5.8min\n","[CV] batch_size=32, bilstm=50, drpt=0.15, embedding_vector=80, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 11.7min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=50, drpt=0.15, embedding_vector=80, epochs=5, score=0.833, total= 7.4min\n","[CV] batch_size=32, bilstm=50, drpt=0.15, embedding_vector=80, epochs=5 \n","[CV] batch_size=32, bilstm=50, drpt=0.15, embedding_vector=80, epochs=5, score=0.496, total= 7.4min\n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=40, epochs=5, score=0.835, total= 8.9min\n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=40, epochs=5, score=0.553, total= 8.9min\n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=80, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=80, epochs=5, score=0.823, total=11.1min\n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=80, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.15, embedding_vector=80, epochs=5, score=0.417, total=11.1min\n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 8 out of 8 | elapsed: 66.5min finished\n"],"name":"stderr"},{"output_type":"stream","text":["0.6939651966094971 {'batch_size': 32, 'bilstm': 100, 'drpt': 0.15, 'embedding_vector': 40, 'epochs': 5}\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"w5go8yxWOtja"},"source":["grid24_model = grid24.best_estimator_"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Iihddcjrd8cH","executionInfo":{"elapsed":6377,"status":"ok","timestamp":1609464032862,"user":{"displayName":"Marcus 
Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"e526c05d-2dfa-4780-8bec-870db1bb499f"},"source":["grid24_model.score(X_test,y_test)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0.8689004778862"]},"metadata":{"tags":[]},"execution_count":23}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Rk8ghf2UeAx8","executionInfo":{"elapsed":6646,"status":"ok","timestamp":1609464053421,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"4c4b8eff-7dbc-40af-998c-97a5d9773da7"},"source":["g24preds = grid24_model.predict(X_test)\r\n","g24_binarypreds = np.where(g24preds > .5, 1, 0)\r\n","print(classification_report(y_test, g24_binarypreds))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/sequential.py:450: UserWarning: `model.predict_classes()` is deprecated and will be removed after 2021-01-01. Please use instead:* `np.argmax(model.predict(x), axis=-1)`, if your model does multi-class classification (e.g. if it uses a `softmax` last-layer activation).* `(model.predict(x) > 0.5).astype(\"int32\")`, if your model does binary classification (e.g. 
if it uses a `sigmoid` last-layer activation).\n"," warnings.warn('`model.predict_classes()` is deprecated and '\n"],"name":"stderr"},{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," 0 0.77 0.71 0.73 790\n"," 1 0.90 0.93 0.91 2284\n","\n"," accuracy 0.87 3074\n"," macro avg 0.83 0.82 0.82 3074\n","weighted avg 0.87 0.87 0.87 3074\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"8zKaeCt3iGQG"},"source":["para_grid31 = dict(embedding_vector=embedding_vector_ei,bilstm=bilstm_fif,drpt=drpt_fi,batch_size=[32],epochs=[5])\r\n","grid31 = GridSearchCV(estimator = model,param_grid=para_grid31,cv=2,verbose=3)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"vd4-acTwuHnX","executionInfo":{"elapsed":4666076,"status":"ok","timestamp":1609471937330,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"2cb6051d-9e98-4cea-d73a-cd89e99553fa"},"source":["grid31.fit(X_SMOTE,y_SMOTE)\r\n","print(grid31.best_score_,grid31.best_params_)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Fitting 2 folds for each of 4 candidates, totalling 8 fits\n","[CV] batch_size=32, bilstm=50, drpt=0.5, embedding_vector=40, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=50, drpt=0.5, embedding_vector=40, epochs=5, score=0.854, total= 5.9min\n","[CV] batch_size=32, bilstm=50, drpt=0.5, embedding_vector=40, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 5.9min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=50, drpt=0.5, 
embedding_vector=40, epochs=5, score=0.495, total= 5.8min\n","[CV] batch_size=32, bilstm=50, drpt=0.5, embedding_vector=80, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 11.7min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=50, drpt=0.5, embedding_vector=80, epochs=5, score=0.850, total= 7.4min\n","[CV] batch_size=32, bilstm=50, drpt=0.5, embedding_vector=80, epochs=5 \n","[CV] batch_size=32, bilstm=50, drpt=0.5, embedding_vector=80, epochs=5, score=0.455, total= 7.5min\n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=40, epochs=5, score=0.815, total= 9.0min\n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=40, epochs=5, score=0.485, total= 9.0min\n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=80, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=80, epochs=5, score=0.826, total=11.0min\n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=80, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=80, epochs=5, score=0.430, total=11.0min\n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 8 out of 8 | elapsed: 66.6min finished\n"],"name":"stderr"},{"output_type":"stream","text":["0.6745647639036179 {'batch_size': 32, 'bilstm': 50, 'drpt': 0.5, 'embedding_vector': 40, 'epochs': 5}\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"JLMVWbmQuRFH"},"source":["grid31_model = grid31.best_estimator_"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"E5IMi9IL7zxi","executionInfo":{"elapsed":4891,"status":"ok","timestamp":1609471963649,"user":{"displayName":"Marcus 
Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"1e25c266-05e0-4cd4-d551-659cabc2a6ba"},"source":["grid31_model.score(X_test,y_test)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0.8633701801300049"]},"metadata":{"tags":[]},"execution_count":32}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"7q4GoALL72lU","executionInfo":{"elapsed":4527,"status":"ok","timestamp":1609471969881,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"0286632b-545e-492d-e2fb-8b5bb0fa05f8"},"source":["g31preds = grid31_model.predict(X_test)\r\n","g31_binarypreds = np.where(g31preds > .5, 1, 0)\r\n","print(classification_report(y_test, g31_binarypreds))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/sequential.py:450: UserWarning: `model.predict_classes()` is deprecated and will be removed after 2021-01-01. Please use instead:* `np.argmax(model.predict(x), axis=-1)`, if your model does multi-class classification (e.g. if it uses a `softmax` last-layer activation).* `(model.predict(x) > 0.5).astype(\"int32\")`, if your model does binary classification (e.g. 
if it uses a `sigmoid` last-layer activation).\n"," warnings.warn('`model.predict_classes()` is deprecated and '\n"],"name":"stderr"},{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," 0 0.76 0.68 0.72 790\n"," 1 0.89 0.93 0.91 2284\n","\n"," accuracy 0.86 3074\n"," macro avg 0.83 0.80 0.81 3074\n","weighted avg 0.86 0.86 0.86 3074\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"1ktTlzdxATgQ"},"source":["para_grid32 = dict(embedding_vector=embedding_vector_ei,bilstm=bilstm_two,drpt=drpt_fi,batch_size=[32],epochs=[5])\r\n","grid32 = GridSearchCV(estimator = model,param_grid=para_grid32,cv=2,verbose=3)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"VRv4gnMmAUEW","executionInfo":{"elapsed":818824,"status":"ok","timestamp":1609488888818,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"2ca86a6d-cf40-40ed-82c6-6d24231a2470"},"source":["grid32.fit(X_SMOTE,y_SMOTE)\r\n","print(grid32.best_score_,grid32.best_params_)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Fitting 2 folds for each of 4 candidates, totalling 8 fits\n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=40, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=40, epochs=5, score=0.849, total= 9.1min\n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=40, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 9.1min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=100, drpt=0.5, 
embedding_vector=40, epochs=5, score=0.533, total= 9.1min\n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=80, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 18.3min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=80, epochs=5, score=0.800, total=11.5min\n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=80, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=80, epochs=5, score=0.570, total=11.6min\n","[CV] batch_size=32, bilstm=200, drpt=0.5, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.5, embedding_vector=40, epochs=5, score=0.839, total=27.0min\n","[CV] batch_size=32, bilstm=200, drpt=0.5, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.5, embedding_vector=40, epochs=5, score=0.748, total=27.3min\n","[CV] batch_size=32, bilstm=200, drpt=0.5, embedding_vector=80, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.5, embedding_vector=80, epochs=5, score=0.828, total=31.3min\n","[CV] batch_size=32, bilstm=200, drpt=0.5, embedding_vector=80, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.5, embedding_vector=80, epochs=5, score=0.410, total=30.5min\n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 8 out of 8 | elapsed: 157.4min finished\n"],"name":"stderr"},{"output_type":"stream","text":["0.7931532561779022 {'batch_size': 32, 'bilstm': 200, 'drpt': 0.5, 'embedding_vector': 40, 'epochs': 5}\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"WWRxvQfk1aCK"},"source":["grid32_model = grid32.best_estimator_"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"ZFexl6RL1aYs","executionInfo":{"elapsed":20189,"status":"ok","timestamp":1609489714486,"user":{"displayName":"Marcus 
Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"e031f662-602d-45a3-f0ff-b10bf4cee4a1"},"source":["grid32_model.score(X_test,y_test)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0.8529602885246277"]},"metadata":{"tags":[]},"execution_count":38}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"c4qNm4xFAaI9","executionInfo":{"elapsed":36920,"status":"ok","timestamp":1609489734464,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"a1b7abf0-91a7-492a-8e49-649057b36ae9"},"source":["g32preds = grid32_model.predict(X_test)\r\n","g32_binarypreds = np.where(g32preds > .5, 1, 0)\r\n","print(classification_report(y_test, g32_binarypreds))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/sequential.py:450: UserWarning: `model.predict_classes()` is deprecated and will be removed after 2021-01-01. Please use instead:* `np.argmax(model.predict(x), axis=-1)`, if your model does multi-class classification (e.g. if it uses a `softmax` last-layer activation).* `(model.predict(x) > 0.5).astype(\"int32\")`, if your model does binary classification (e.g. 
if it uses a `sigmoid` last-layer activation).\n"," warnings.warn('`model.predict_classes()` is deprecated and '\n"],"name":"stderr"},{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," 0 0.73 0.69 0.71 790\n"," 1 0.89 0.91 0.90 2284\n","\n"," accuracy 0.85 3074\n"," macro avg 0.81 0.80 0.80 3074\n","weighted avg 0.85 0.85 0.85 3074\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"64kM9l3n1qox"},"source":["para_grid33 = dict(embedding_vector=embedding_vector_th,bilstm=bilstm_fif,drpt=drpt_fi,batch_size=[32],epochs=[5])\r\n","grid33 = GridSearchCV(estimator = model,param_grid=para_grid33,cv=2,verbose=3)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"aKbsMt4m2JB1","executionInfo":{"elapsed":603007,"status":"ok","timestamp":1609494165329,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"3535aecc-330a-4377-dccb-23b8d96eedc7"},"source":["grid33.fit(X_SMOTE,y_SMOTE)\r\n","print(grid33.best_score_,grid33.best_params_)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Fitting 2 folds for each of 4 candidates, totalling 8 fits\n","[CV] batch_size=32, bilstm=50, drpt=0.5, embedding_vector=30, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=50, drpt=0.5, embedding_vector=30, epochs=5, score=0.841, total= 5.8min\n","[CV] batch_size=32, bilstm=50, drpt=0.5, embedding_vector=30, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 5.8min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=50, drpt=0.5, 
embedding_vector=30, epochs=5, score=0.446, total= 5.8min\n","[CV] batch_size=32, bilstm=50, drpt=0.5, embedding_vector=40, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 11.6min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=50, drpt=0.5, embedding_vector=40, epochs=5, score=0.841, total= 6.3min\n","[CV] batch_size=32, bilstm=50, drpt=0.5, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=50, drpt=0.5, embedding_vector=40, epochs=5, score=0.730, total= 6.3min\n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=30, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=30, epochs=5, score=0.857, total= 9.0min\n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=30, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=30, epochs=5, score=0.526, total= 8.9min\n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=40, epochs=5, score=0.795, total= 9.5min\n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=40, epochs=5, score=0.515, total= 9.5min\n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 8 out of 8 | elapsed: 61.2min finished\n"],"name":"stderr"},{"output_type":"stream","text":["0.7852291166782379 {'batch_size': 32, 'bilstm': 50, 'drpt': 0.5, 'embedding_vector': 40, 'epochs': 5}\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"YBSu2ImvENqc"},"source":["grid33_model = grid33.best_estimator_"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"5PrI5dY_Eby2","executionInfo":{"elapsed":6276,"status":"ok","timestamp":1609494185771,"user":{"displayName":"Marcus 
Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"9f0caf7a-dbe0-4732-c15c-3e0c2ea37ae0"},"source":["grid33_model.score(X_test,y_test)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0.8731294870376587"]},"metadata":{"tags":[]},"execution_count":43}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"QaXoPWbREeFQ","executionInfo":{"elapsed":5454,"status":"ok","timestamp":1609494220090,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"ef28bb1f-922d-4352-8c90-d4383b3b9a0e"},"source":["g33preds = grid33_model.predict(X_test)\r\n","g33_binarypreds = np.where(g33preds > .5, 1, 0)\r\n","print(classification_report(y_test, g33_binarypreds))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/sequential.py:450: UserWarning: `model.predict_classes()` is deprecated and will be removed after 2021-01-01. Please use instead:* `np.argmax(model.predict(x), axis=-1)`, if your model does multi-class classification (e.g. if it uses a `softmax` last-layer activation).* `(model.predict(x) > 0.5).astype(\"int32\")`, if your model does binary classification (e.g. 
if it uses a `sigmoid` last-layer activation).\n"," warnings.warn('`model.predict_classes()` is deprecated and '\n"],"name":"stderr"},{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," 0 0.76 0.75 0.75 790\n"," 1 0.91 0.92 0.91 2284\n","\n"," accuracy 0.87 3074\n"," macro avg 0.83 0.83 0.83 3074\n","weighted avg 0.87 0.87 0.87 3074\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"4dhPwzDSS1Lh"},"source":["para_grid34 = dict(embedding_vector=embedding_vector_th,bilstm=bilstm_two,drpt=drpt_fi,batch_size=[32],epochs=[5])\r\n","grid34 = GridSearchCV(estimator = model,param_grid=para_grid34,cv=2,verbose=3)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"NhkxDs5-TUkZ","executionInfo":{"elapsed":1013591,"status":"ok","timestamp":1609506271865,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"293c154b-706c-49cb-a9bb-f7ab01b8b601"},"source":["grid34.fit(X_SMOTE,y_SMOTE)\r\n","print(grid34.best_score_,grid34.best_params_)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Fitting 2 folds for each of 4 candidates, totalling 8 fits\n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=30, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=30, epochs=5, score=0.839, total= 7.0min\n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=30, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 7.0min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=100, drpt=0.5, 
embedding_vector=30, epochs=5, score=0.480, total= 7.0min\n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=40, epochs=5 \n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 13.9min remaining: 0.0s\n"],"name":"stderr"},{"output_type":"stream","text":["[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=40, epochs=5, score=0.815, total= 7.4min\n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=100, drpt=0.5, embedding_vector=40, epochs=5, score=0.497, total= 7.4min\n","[CV] batch_size=32, bilstm=200, drpt=0.5, embedding_vector=30, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.5, embedding_vector=30, epochs=5, score=0.823, total=20.0min\n","[CV] batch_size=32, bilstm=200, drpt=0.5, embedding_vector=30, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.5, embedding_vector=30, epochs=5, score=0.616, total=19.3min\n","[CV] batch_size=32, bilstm=200, drpt=0.5, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.5, embedding_vector=40, epochs=5, score=0.807, total=18.0min\n","[CV] batch_size=32, bilstm=200, drpt=0.5, embedding_vector=40, epochs=5 \n","[CV] batch_size=32, bilstm=200, drpt=0.5, embedding_vector=40, epochs=5, score=0.560, total=18.1min\n"],"name":"stdout"},{"output_type":"stream","text":["[Parallel(n_jobs=1)]: Done 8 out of 8 | elapsed: 104.0min finished\n"],"name":"stderr"},{"output_type":"stream","text":["0.7193769812583923 {'batch_size': 32, 'bilstm': 200, 'drpt': 0.5, 'embedding_vector': 30, 'epochs': 5}\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"TMMe_UFcVUBo"},"source":["grid34_model = grid34.best_estimator_"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"bZHlyyU_7ZVs","executionInfo":{"elapsed":14896,"status":"ok","timestamp":1609506391117,"user":{"displayName":"Marcus 
Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"75844e89-7b4a-4486-e53d-9098ebba5fac"},"source":["grid34_model.score(X_test,y_test)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0.8464541435241699"]},"metadata":{"tags":[]},"execution_count":20}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"BwZSGXEk7cN4","executionInfo":{"elapsed":14340,"status":"ok","timestamp":1609506410700,"user":{"displayName":"Marcus Wimalajeewa","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Ghm8LatOV6UtVcWHSLOpyN9N_oI2fMOjJrXyMK9vw=s64","userId":"06041045911550730099"},"user_tz":-660},"outputId":"5f91d270-26a1-4ec2-c083-d4f709522aa3"},"source":["g34preds = grid34_model.predict(X_test)\r\n","g34_binarypreds = np.where(g34preds > .5, 1, 0)\r\n","print(classification_report(y_test, g34_binarypreds))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/sequential.py:450: UserWarning: `model.predict_classes()` is deprecated and will be removed after 2021-01-01. Please use instead:* `np.argmax(model.predict(x), axis=-1)`, if your model does multi-class classification (e.g. if it uses a `softmax` last-layer activation).* `(model.predict(x) > 0.5).astype(\"int32\")`, if your model does binary classification (e.g. 
if it uses a `sigmoid` last-layer activation).\n"," warnings.warn('`model.predict_classes()` is deprecated and '\n"],"name":"stderr"},{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," 0 0.68 0.77 0.72 790\n"," 1 0.92 0.87 0.89 2284\n","\n"," accuracy 0.85 3074\n"," macro avg 0.80 0.82 0.81 3074\n","weighted avg 0.85 0.85 0.85 3074\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"7Ap-mlGFtogR"},"source":["Across the repeated grids, Grid14 appears to be the strongest candidate: it scored the highest accuracy on the test data (0.878) and a macro-average F1 score of 0.84. Grid14's configuration was an embedding vector of 40, a bidirectional LSTM layer with 100 units and a dropout rate of 0.3. This configuration was also present in other grids, so some randomness is likely at play. Still, when counting the best parameters across all the grids, the Grid14 configuration occurs most frequently. Because of issues with weight initialization when loading saved models into a new session, we will simply retrain a model with this configuration rather than save grid14."]}]}
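The `best_score_` values printed by the grids above can be checked by hand: `GridSearchCV` ranks candidates by the mean of their per-fold scores, so with `cv=2` each `best_score_` is the average of the two `[CV]` scores logged for the winning candidate. The minimal sketch below (standard-library Python only) transcribes the fold scores from the six grids in this section (grid23, grid24, grid31 to grid34), picks the best candidate per grid the same way, and reports the average first-fold and second-fold scores. The names `FOLD_SCORES` and `best_candidate` are hypothetical, introduced only for this illustration and not part of the original notebook. Because the logs round to three decimals, the recomputed means match the printed `best_score_` values only to about three decimal places.

```python
from statistics import mean

# Fold scores transcribed from the verbose [CV] logs above (cv=2, batch_size=32, epochs=5).
# Key: (bilstm units, embedding_vector); value: (fold 1 score, fold 2 score).
# drpt differs between grids: grid23 = 0.3, grid24 = 0.15, all others = 0.5.
FOLD_SCORES = {
    "grid23": {(50, 40): (0.845, 0.551), (50, 80): (0.827, 0.559),
               (100, 40): (0.803, 0.487), (100, 80): (0.836, 0.528)},
    "grid24": {(50, 40): (0.855, 0.523), (50, 80): (0.833, 0.496),
               (100, 40): (0.835, 0.553), (100, 80): (0.823, 0.417)},
    "grid31": {(50, 40): (0.854, 0.495), (50, 80): (0.850, 0.455),
               (100, 40): (0.815, 0.485), (100, 80): (0.826, 0.430)},
    "grid32": {(100, 40): (0.849, 0.533), (100, 80): (0.800, 0.570),
               (200, 40): (0.839, 0.748), (200, 80): (0.828, 0.410)},
    "grid33": {(50, 30): (0.841, 0.446), (50, 40): (0.841, 0.730),
               (100, 30): (0.857, 0.526), (100, 40): (0.795, 0.515)},
    "grid34": {(100, 30): (0.839, 0.480), (100, 40): (0.815, 0.497),
               (200, 30): (0.823, 0.616), (200, 40): (0.807, 0.560)},
}


def best_candidate(folds):
    """Hypothetical helper: rank candidates by mean fold score, as GridSearchCV does."""
    means = {params: mean(scores) for params, scores in folds.items()}
    best = max(means, key=means.get)
    return best, means[best]


for name, folds in FOLD_SCORES.items():
    (bilstm, emb), score = best_candidate(folds)
    fold1 = mean(s[0] for s in folds.values())  # average first-fold score
    fold2 = mean(s[1] for s in folds.values())  # average second-fold score
    print(f"{name}: best bilstm={bilstm}, embedding_vector={emb}, "
          f"mean CV={score:.4f}; avg fold 1={fold1:.3f}, avg fold 2={fold2:.3f}")
```

In every grid the second fold scores below the first for every candidate, by between roughly 0.09 and 0.42, so a 2-fold ranking on this data is noisy. That is consistent with the randomness noted in the closing markdown cell, and it is one reason to treat the test-set accuracy, rather than `best_score_` alone, as the deciding comparison.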
@marcusmw (Author)

If the gist fails to render ("Something went wrong"), use the Google Colab link: https://colab.research.google.com/drive/1yLDPwEo_yxRrlFfaPjWx8Lojz0muvsT6?usp=sharing
