Kaggle Toxic Comments Classification Challenge using Fastai V1 and ULMFIT
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%reload_ext autoreload\n",
"%autoreload 2\n",
"%matplotlib inline\n",
"\n",
"from fastai.text import *\n",
"from fastai import *\n",
"from sklearn.metrics import roc_auc_score"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# WARNING:\n",
"***THE NOTEBOOK MAY CONTAIN OBSCENE AND INAPPROPRIATE CONTENT BECAUSE OF THE NATURE OF THE DATASET BEING USED. READER DISCRETION IS ADVISED.***"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Language Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparing Data"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# PATH = Path('data/Data_Processed/Data_Classifier').absolute()\n",
"PATH = Path('data/').absolute()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data/models'),\n",
" PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data/classifier_train_data'),\n",
" PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data/Data_Processed'),\n",
" PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data/kaggle_submission_1.csv'),\n",
" PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data/test_labels.csv'),\n",
" PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data/sample_submission.csv'),\n",
" PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data/train.csv'),\n",
" PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data/test.csv')]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"PATH.ls()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For training the language model, we can use both the `train` and `test` data. This works because a language model is self-supervised: the label for each token is simply the next token in the sequence, so the unlabelled test comments are usable too. We ignore the classification labels and keep only the `comment_text` column."
]
},
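{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of why no classification labels are needed, assuming plain whitespace tokenisation purely for illustration (fastai applies its own tokeniser and special tokens such as `xxbos`): the targets are the input tokens shifted one position to the left. The helper `next_token_pairs` is hypothetical and not part of fastai."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def next_token_pairs(text):\n",
"    \"\"\"Return (inputs, targets) token lists where each target is the next token.\"\"\"\n",
"    tokens = text.split()           # illustrative whitespace tokenisation only\n",
"    return tokens[:-1], tokens[1:]  # targets are the inputs shifted by one\n",
"\n",
"inputs, targets = next_token_pairs('you , sir , are my hero .')\n",
"for x, y in zip(inputs, targets):\n",
"    print(f'{x!r} -> {y!r}')"
]
},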
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>comment_text</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Explanation\\nWhy the edits made under my usern...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>D'aww! He matches this background colour I'm s...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Hey man, I'm really not trying to edit war. It...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>\"\\nMore\\nI can't make any real suggestions on ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>You, sir, are my hero. Any chance you remember...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" comment_text\n",
"0 Explanation\\nWhy the edits made under my usern...\n",
"1 D'aww! He matches this background colour I'm s...\n",
"2 Hey man, I'm really not trying to edit war. It...\n",
"3 \"\\nMore\\nI can't make any real suggestions on ...\n",
"4 You, sir, are my hero. Any chance you remember..."
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train = pd.DataFrame(pd.read_csv(PATH/'train.csv')['comment_text'])\n",
"test = pd.DataFrame(pd.read_csv(PATH/'test.csv')['comment_text'])\n",
"df_lm = pd.concat([train, test])  # DataFrame.append was removed in pandas 2.0\n",
"\n",
"df_lm.head()"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [],
"source": [
"bs = 48"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"data_lm = (TextList.from_df(train, path=PATH, cols='comment_text')\n",
" .split_by_rand_pct(0.1)\n",
" .label_for_lm()\n",
" .databunch(bs=bs))"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>idx</th>\n",
" <th>text</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>background colour i 'm seemingly stuck with . xxmaj thanks . ( talk ) 21:51 , xxmaj january 11 , 2016 ( xxup utc ) xxbos xxmaj hey man , i 'm really not trying to edit war . xxmaj it 's just that this guy is constantly removing relevant information and talking to me through edits instead of my talk page . xxmaj he seems to care more about</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>\\n face all sorts of physical and legal threats , both on- and off - wiki , as well as real - life stalking over prolonged periods by multiple people \\n \\n xxmaj all with little or no support from the xxup wmf and large swathes of the community who are more interested in the never - ending tendentious discussion of \" \" meta \" \" issues -</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>utc ) \\n i think she xxunk the show and / or what callers are saying since xxmaj rush has a hearing problem , as you must well know . \\n xxmaj plus she probably searches the web for current news in certain categories . 05:24 , 7 xxmaj march 2010 xxbos \" \\n i 'd maintain it was only a personal attack , the user i</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>\\n person : xxup xxunk xxmaj global xxup ip - xxmaj addressing \\n address : xxmaj deutsche xxmaj telekom xxup ag \\n address : xxup xxunk xxmaj xxunk \\n address : xxmaj germany \\n phone : 49 180 xxunk \\n fax - no : 49 180 xxunk \\n e - mail : xxunk xxbos i liked it too and it inspired me to</td>\n",
" </tr>\n",
" <tr>\n",
" <td>4</td>\n",
" <td>xxmaj german state ; xxmaj germany as an empire ( the xxmaj first xxmaj reich ) . . . and xxmaj third xxmaj reich as a dictatorship under the xxmaj nazi regime . . . ) \" xxbos xxmaj which one of us is having trouble with xxmaj wiki is not a xxmaj battlefield ? xxbos xxmaj thanks for clarifying . xxbos xxmaj this talk page is full of xxunk</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"data_lm.show_batch()"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [],
"source": [
"data_lm.save('tmp_lm_model')"
]
},
{
"cell_type": "code",
"execution_count": 185,
"metadata": {},
"outputs": [],
"source": [
"data_lm = load_data(PATH, 'tmp_lm_model', bs=bs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparing the Language model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we load an AWD-LSTM language model pretrained on the WikiText-103 dataset."
]
},
{
"cell_type": "code",
"execution_count": 186,
"metadata": {},
"outputs": [],
"source": [
"learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def find_lr(learner=None):\n",
"    \"\"\"\n",
"    Convenience function to run the learning rate range test and plot the results.\n",
"    Falls back to the global `learn` at call time when no learner is passed.\n",
"    \"\"\"\n",
"    learner = learn if learner is None else learner\n",
"    learner.lr_find()\n",
"    learner.recorder.plot(skip_end=15)"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"find_lr()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loss is still decreasing steeply at a learning rate of about `3e-2`, so we use that as the maximum learning rate.\n",
"\n",
"We first train only the head of the model, keeping the pretrained layers frozen, before unfreezing the rest."
]
},
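{
"cell_type": "markdown",
"metadata": {},
"source": [
"`fit_one_cycle` follows the one-cycle policy: the learning rate warms up to its maximum and then anneals, while the momentum moves in the opposite direction between the two `moms` values. The cell below is a rough, standalone sketch of such a schedule, not fastai's implementation. The warm-up fraction (`pct_start=0.3`), the starting divisor (`div_factor=25`), the final divisor and the cosine interpolation are assumptions about the fastai v1 defaults, and `cos_anneal` and `one_cycle` are hypothetical helper names."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"def cos_anneal(start, end, pct):\n",
"    \"\"\"Cosine interpolation from `start` to `end` as `pct` goes from 0 to 1.\"\"\"\n",
"    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)\n",
"\n",
"def one_cycle(step, total_steps, lr_max=3e-2, moms=(0.8, 0.7),\n",
"              pct_start=0.3, div_factor=25.0, final_div=25.0e4):\n",
"    \"\"\"Return (lr, momentum) at `step` of an illustrative one-cycle schedule.\"\"\"\n",
"    warm = int(total_steps * pct_start)  # number of warm-up steps\n",
"    if step < warm:\n",
"        # Warm-up: lr rises to lr_max while momentum falls from moms[0] to moms[1].\n",
"        pct = step / warm\n",
"        return (cos_anneal(lr_max / div_factor, lr_max, pct),\n",
"                cos_anneal(moms[0], moms[1], pct))\n",
"    # Annealing: lr decays towards lr_max / final_div while momentum rises again.\n",
"    pct = (step - warm) / (total_steps - warm)\n",
"    return (cos_anneal(lr_max, lr_max / final_div, pct),\n",
"            cos_anneal(moms[1], moms[0], pct))\n",
"\n",
"for step in (0, 30, 99):\n",
"    lr, mom = one_cycle(step, 100)\n",
"    print(f'step {step:3d}: lr={lr:.2e}, mom={mom:.3f}')"
]
},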
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: left;\">\n",
" <th>epoch</th>\n",
" <th>train_loss</th>\n",
" <th>valid_loss</th>\n",
" <th>accuracy</th>\n",
" <th>time</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>4.330651</td>\n",
" <td>4.091124</td>\n",
" <td>0.307680</td>\n",
" <td>12:01</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"learn.fit_one_cycle(1, 3e-2, moms=(0.8, 0.7))"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [],
"source": [
"learn.save('fit_head')"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"collapsed": true
},
"outputs": [
{
"data": {
"text/plain": [
"LanguageLearner(data=TextLMDataBunch;\n",
"\n",
"Train: LabelList (143614 items)\n",
"x: LMTextList\n",
"xxbos xxmaj explanation \n",
" xxmaj why the edits made under my username xxmaj hardcore xxmaj metallica xxmaj fan were reverted ? xxmaj they were n't vandalisms , just closure on some gas after i voted at xxmaj new xxmaj york xxmaj dolls xxup fac . xxmaj and please do n't remove the template from the talk page since i 'm retired xxunk,xxbos xxmaj xxunk ! xxmaj he matches this background colour i 'm seemingly stuck with . xxmaj thanks . ( talk ) 21:51 , xxmaj january 11 , 2016 ( xxup utc ),xxbos xxmaj hey man , i 'm really not trying to edit war . xxmaj it 's just that this guy is constantly removing relevant information and talking to me through edits instead of my talk page . xxmaj he seems to care more about the formatting than the actual info .,xxbos \" \n",
" xxmaj more \n",
" i ca n't make any real suggestions on improvement - i wondered if the section statistics should be later on , or a subsection of \" \" types of accidents \" \" xxup -i think the references may need tidying so that they are all in the exact same format ie date format etc . i can do that later on , if no - one else does first - if you have any preferences for formatting style on references or want to do it yourself please let me know . \n",
" \n",
" xxmaj there appears to be a backlog on articles for review so i guess there may be a delay until a reviewer turns up . xxmaj it 's listed in the relevant form eg xxmaj wikipedia : xxmaj xxunk # xxmaj transport \",xxbos xxmaj you , sir , are my hero . xxmaj any chance you remember what page that 's on ?\n",
"y: LMLabelList\n",
",,,,\n",
"Path: /home/akash/personal_projects/kaggle/ToxicComments/data/Data_Processed/Data_Classifier;\n",
"\n",
"Valid: LabelList (15957 items)\n",
"x: LMTextList\n",
"xxbos would continue to be , delayed due to my xxup irl schedule .,xxbos xxmaj get xxmaj well \n",
" \n",
" xxmaj get well soon . ),xxbos \" \n",
" \n",
" xxmaj gerry xxmaj xxunk \n",
" xxmaj hey , amigo , take the protect off \" \" xxmaj gerry xxmaj xxunk . \" \" i think you 're letting your sexual preference cloud your rational thought . xxmaj the edits you have called \" \" vandalism \" \" ( not done by me ) are nowhere near vandalism . xxmaj what or who are you trying to protect . xxmaj unlock it now and lighten up . xxmaj you 've been an admin for too long to act in this juvenile way . xxmaj stop showing favoritism and be even - handed . xxmaj now ! ! ! rossp \",xxbos xxup hd xxup dvd winning ? \n",
" \n",
" xxmaj the section previously stated that on xxmaj january 8 , 2006 , at the xxmaj consumer xxmaj electronics xxmaj show , it had been said that hd dvd was ahead of xxunk . however they did n't cite it . i looked and the article i cited said they were neck and neck . \n",
" \n",
" xxup ok , my reasoning on the cnn article was probably weak , but it still may be true . \n",
" xxmaj xxunk 's website claims to have passed hd dvd , but i doubt that that is a very reliable place to look for info . xxmaj can someone find something else or revert it back ? ( i do n't know how to do that ) \n",
" \n",
" \n",
" xxup hd - xxup dvd has the backing of the porn industry , so its practically won already . \n",
" \n",
" xxmaj blu xxmaj ray fanboy stay out \n",
" \n",
" xxmaj some people keep changing articles with no proof . xxmaj stop it . —the preceding unsigned comment was added by ( talk • contribs ) . \n",
" \n",
" xxmaj like yourself . 17 xxup gb per layer discs and triple - layer discs are not part of xxup hd - xxup dvd spec and as such are xxup possible xxup future additions that have no annouced released date , no schedule for when they may be released , and are 99 % likely to be incompatible with the optical heads in current and near future drives . xxmaj if you have some proof proving otherwise , then by all means add the 51 xxup gb disc back in and xxup refference it .,xxbos xxmaj terrorism \n",
" \n",
" xxmaj would it be possible for me to edit the terrorism article to add a section on state sponsored terrorism that is unrelated to the current dispute ?\n",
"y: LMLabelList\n",
",,,,\n",
"Path: /home/akash/personal_projects/kaggle/ToxicComments/data/Data_Processed/Data_Classifier;\n",
"\n",
"Test: None, model=SequentialRNN(\n",
" (0): AWD_LSTM(\n",
" (encoder): Embedding(60004, 400, padding_idx=1)\n",
" (encoder_dp): EmbeddingDropout(\n",
" (emb): Embedding(60004, 400, padding_idx=1)\n",
" )\n",
" (rnns): ModuleList(\n",
" (0): WeightDropout(\n",
" (module): LSTM(400, 1150, batch_first=True)\n",
" )\n",
" (1): WeightDropout(\n",
" (module): LSTM(1150, 1150, batch_first=True)\n",
" )\n",
" (2): WeightDropout(\n",
" (module): LSTM(1150, 400, batch_first=True)\n",
" )\n",
" )\n",
" (input_dp): RNNDropout()\n",
" (hidden_dps): ModuleList(\n",
" (0): RNNDropout()\n",
" (1): RNNDropout()\n",
" (2): RNNDropout()\n",
" )\n",
" )\n",
" (1): LinearDecoder(\n",
" (decoder): Linear(in_features=400, out_features=60004, bias=True)\n",
" (output_dp): RNNDropout()\n",
" )\n",
"), opt_func=functools.partial(<class 'torch.optim.adam.Adam'>, betas=(0.9, 0.99)), loss_func=FlattenedLoss of CrossEntropyLoss(), metrics=[<function accuracy at 0x7faa93d3c1e0>], true_wd=True, bn_wd=True, wd=0.01, train_bn=True, path=PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data/Data_Processed/Data_Classifier'), model_dir='models', callback_fns=[functools.partial(<class 'fastai.basic_train.Recorder'>, add_time=True, silent=False)], callbacks=[RNNTrainer\n",
"learn: LanguageLearner(data=TextLMDataBunch;\n",
"\n",
"Train: LabelList (143614 items)\n",
"x: LMTextList\n",
"xxbos xxmaj explanation \n",
" xxmaj why the edits made under my username xxmaj hardcore xxmaj metallica xxmaj fan were reverted ? xxmaj they were n't vandalisms , just closure on some gas after i voted at xxmaj new xxmaj york xxmaj dolls xxup fac . xxmaj and please do n't remove the template from the talk page since i 'm retired xxunk,xxbos xxmaj xxunk ! xxmaj he matches this background colour i 'm seemingly stuck with . xxmaj thanks . ( talk ) 21:51 , xxmaj january 11 , 2016 ( xxup utc ),xxbos xxmaj hey man , i 'm really not trying to edit war . xxmaj it 's just that this guy is constantly removing relevant information and talking to me through edits instead of my talk page . xxmaj he seems to care more about the formatting than the actual info .,xxbos \" \n",
" xxmaj more \n",
" i ca n't make any real suggestions on improvement - i wondered if the section statistics should be later on , or a subsection of \" \" types of accidents \" \" xxup -i think the references may need tidying so that they are all in the exact same format ie date format etc . i can do that later on , if no - one else does first - if you have any preferences for formatting style on references or want to do it yourself please let me know . \n",
" \n",
" xxmaj there appears to be a backlog on articles for review so i guess there may be a delay until a reviewer turns up . xxmaj it 's listed in the relevant form eg xxmaj wikipedia : xxmaj xxunk # xxmaj transport \",xxbos xxmaj you , sir , are my hero . xxmaj any chance you remember what page that 's on ?\n",
"y: LMLabelList\n",
",,,,\n",
"Path: /home/akash/personal_projects/kaggle/ToxicComments/data/Data_Processed/Data_Classifier;\n",
"\n",
"Valid: LabelList (15957 items)\n",
"x: LMTextList\n",
"xxbos would continue to be , delayed due to my xxup irl schedule .,xxbos xxmaj get xxmaj well \n",
" \n",
" xxmaj get well soon . ),xxbos \" \n",
" \n",
" xxmaj gerry xxmaj xxunk \n",
" xxmaj hey , amigo , take the protect off \" \" xxmaj gerry xxmaj xxunk . \" \" i think you 're letting your sexual preference cloud your rational thought . xxmaj the edits you have called \" \" vandalism \" \" ( not done by me ) are nowhere near vandalism . xxmaj what or who are you trying to protect . xxmaj unlock it now and lighten up . xxmaj you 've been an admin for too long to act in this juvenile way . xxmaj stop showing favoritism and be even - handed . xxmaj now ! ! ! rossp \",xxbos xxup hd xxup dvd winning ? \n",
" \n",
" xxmaj the section previously stated that on xxmaj january 8 , 2006 , at the xxmaj consumer xxmaj electronics xxmaj show , it had been said that hd dvd was ahead of xxunk . however they did n't cite it . i looked and the article i cited said they were neck and neck . \n",
" \n",
" xxup ok , my reasoning on the cnn article was probably weak , but it still may be true . \n",
" xxmaj xxunk 's website claims to have passed hd dvd , but i doubt that that is a very reliable place to look for info . xxmaj can someone find something else or revert it back ? ( i do n't know how to do that ) \n",
" \n",
" \n",
" xxup hd - xxup dvd has the backing of the porn industry , so its practically won already . \n",
" \n",
" xxmaj blu xxmaj ray fanboy stay out \n",
" \n",
" xxmaj some people keep changing articles with no proof . xxmaj stop it . —the preceding unsigned comment was added by ( talk • contribs ) . \n",
" \n",
" xxmaj like yourself . 17 xxup gb per layer discs and triple - layer discs are not part of xxup hd - xxup dvd spec and as such are xxup possible xxup future additions that have no annouced released date , no schedule for when they may be released , and are 99 % likely to be incompatible with the optical heads in current and near future drives . xxmaj if you have some proof proving otherwise , then by all means add the 51 xxup gb disc back in and xxup refference it .,xxbos xxmaj terrorism \n",
" \n",
" xxmaj would it be possible for me to edit the terrorism article to add a section on state sponsored terrorism that is unrelated to the current dispute ?\n",
"y: LMLabelList\n",
",,,,\n",
"Path: /home/akash/personal_projects/kaggle/ToxicComments/data/Data_Processed/Data_Classifier;\n",
"\n",
"Test: None, model=SequentialRNN(\n",
" (0): AWD_LSTM(\n",
" (encoder): Embedding(60004, 400, padding_idx=1)\n",
" (encoder_dp): EmbeddingDropout(\n",
" (emb): Embedding(60004, 400, padding_idx=1)\n",
" )\n",
" (rnns): ModuleList(\n",
" (0): WeightDropout(\n",
" (module): LSTM(400, 1150, batch_first=True)\n",
" )\n",
" (1): WeightDropout(\n",
" (module): LSTM(1150, 1150, batch_first=True)\n",
" )\n",
" (2): WeightDropout(\n",
" (module): LSTM(1150, 400, batch_first=True)\n",
" )\n",
" )\n",
" (input_dp): RNNDropout()\n",
" (hidden_dps): ModuleList(\n",
" (0): RNNDropout()\n",
" (1): RNNDropout()\n",
" (2): RNNDropout()\n",
" )\n",
" )\n",
" (1): LinearDecoder(\n",
" (decoder): Linear(in_features=400, out_features=60004, bias=True)\n",
" (output_dp): RNNDropout()\n",
" )\n",
"), opt_func=functools.partial(<class 'torch.optim.adam.Adam'>, betas=(0.9, 0.99)), loss_func=FlattenedLoss of CrossEntropyLoss(), metrics=[<function accuracy at 0x7faa93d3c1e0>], true_wd=True, bn_wd=True, wd=0.01, train_bn=True, path=PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data/Data_Processed/Data_Classifier'), model_dir='models', callback_fns=[functools.partial(<class 'fastai.basic_train.Recorder'>, add_time=True, silent=False)], callbacks=[...], layer_groups=[Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(400, 1150, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(1150, 1150, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(1150, 400, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): Embedding(60004, 400, padding_idx=1)\n",
" (1): EmbeddingDropout(\n",
" (emb): Embedding(60004, 400, padding_idx=1)\n",
" )\n",
" (2): LinearDecoder(\n",
" (decoder): Linear(in_features=400, out_features=60004, bias=True)\n",
" (output_dp): RNNDropout()\n",
" )\n",
")], add_time=True, silent=None)\n",
"alpha: 2.0\n",
"beta: 1.0], layer_groups=[Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(400, 1150, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(1150, 1150, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(1150, 400, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): Embedding(60004, 400, padding_idx=1)\n",
" (1): EmbeddingDropout(\n",
" (emb): Embedding(60004, 400, padding_idx=1)\n",
" )\n",
" (2): LinearDecoder(\n",
" (decoder): Linear(in_features=400, out_features=60004, bias=True)\n",
" (output_dp): RNNDropout()\n",
" )\n",
")], add_time=True, silent=None)"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"learn.load('fit_head')"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>text</th>\n",
" <th>target</th>\n",
" <th>pred</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>xxbos would continue to be , delayed due to my xxup irl schedule . xxbos xxmaj get xxmaj well \\n</td>\n",
" <td>\\n xxmaj get well soon . ) xxbos \" \\n \\n xxmaj gerry xxmaj xxunk \\n xxmaj</td>\n",
" <td>xxmaj hi the done , xxmaj xxmaj xxmaj \\n \\n xxmaj the xxmaj adams \\n \\n xxmaj hi</td>\n",
" </tr>\n",
" <tr>\n",
" <td>though , so please do nt delete cited factual information and i wo nt have to ... \" xxbos )</td>\n",
" <td>\\n \\n xxmaj for now i will leave your edit until i find proper sources to back myself .</td>\n",
" <td>\\n \\n xxmaj the example , 've be a message to i can that references . be up .</td>\n",
" </tr>\n",
" <tr>\n",
" <td>xxup rs now . if any editors can specify some content they believe should remain or possibly be merged to</td>\n",
" <td>the fear mongering article , make your suggestions here . cheers xxbos xxup redirect xxmaj talk : xxmaj history of</td>\n",
" <td>the article of , , you sure own on . xxmaj . xxmaj redirect xxmaj talk : xxmaj xxunk of</td>\n",
" </tr>\n",
" <tr>\n",
" <td>this great business . xxmaj it will be xxmaj wikipedia : wikiproject xxmaj music of wrestling . xxmaj join if</td>\n",
" <td>you are remotely interested - every little helps ! xxmaj thanks . xxbos xxmaj of course i am sorry ,</td>\n",
" <td>you want interested related in xxmaj time welcome to xxmaj please . xxmaj xxmaj the course , am not .</td>\n",
" </tr>\n",
" <tr>\n",
" <td>you created , xxmaj xxunk xxmaj villa xxmaj xxunk xxmaj soccer xxmaj club , has been tagged for deletion ,</td>\n",
" <td>as it meets one or more of the criteria for speedy deletion ; specifically , it serves only to attack</td>\n",
" <td>and you is the of more of the criteria for speedy deletion . it , the is as as prevent</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"learn.show_results()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**We will now unfreeze the whole model to fine-tune all the weights for this particular dataset.**"
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {},
"outputs": [],
"source": [
"learn.unfreeze()"
]
},
{
"cell_type": "code",
"execution_count": 107,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: left;\">\n",
" <th>epoch</th>\n",
" <th>train_loss</th>\n",
" <th>valid_loss</th>\n",
" <th>accuracy</th>\n",
" <th>time</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>3.805411</td>\n",
" <td>3.787183</td>\n",
" <td>0.341519</td>\n",
" <td>13:29</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>3.590235</td>\n",
" <td>3.664578</td>\n",
" <td>0.357810</td>\n",
" <td>13:30</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>3.558987</td>\n",
" <td>3.597754</td>\n",
" <td>0.366857</td>\n",
" <td>13:27</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>3.434307</td>\n",
" <td>3.562573</td>\n",
" <td>0.371835</td>\n",
" <td>13:27</td>\n",
" </tr>\n",
" <tr>\n",
" <td>4</td>\n",
" <td>3.357494</td>\n",
" <td>3.556864</td>\n",
" <td>0.372336</td>\n",
" <td>13:27</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"learn.fit_one_cycle(5, 1e-3, moms=(0.8, 0.7))"
]
},
{
"cell_type": "code",
"execution_count": 108,
"metadata": {},
"outputs": [],
"source": [
"learn.save('fine_tuned')"
]
},
{
"cell_type": "code",
"execution_count": 109,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: left;\">\n",
" <th>epoch</th>\n",
" <th>train_loss</th>\n",
" <th>valid_loss</th>\n",
" <th>accuracy</th>\n",
" <th>time</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>3.424021</td>\n",
" <td>3.561982</td>\n",
" <td>0.371912</td>\n",
" <td>13:28</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>3.436336</td>\n",
" <td>3.554538</td>\n",
" <td>0.372866</td>\n",
" <td>13:31</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>3.403637</td>\n",
" <td>3.539753</td>\n",
" <td>0.374899</td>\n",
" <td>13:28</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>3.341969</td>\n",
" <td>3.531037</td>\n",
" <td>0.376424</td>\n",
" <td>13:30</td>\n",
" </tr>\n",
" <tr>\n",
" <td>4</td>\n",
" <td>3.260996</td>\n",
" <td>3.532155</td>\n",
" <td>0.376360</td>\n",
" <td>13:27</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"learn.fit_one_cycle(5, 5e-4, moms=(0.8, 0.7))"
]
},
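{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `valid_loss` reported in the tables above is a mean cross-entropy per token (the learner's `loss_func` is a flattened `CrossEntropyLoss`), so its exponential gives the perplexity, which can be easier to interpret. A small sketch using the final `valid_loss` of each training stage, assuming the loss is measured in nats as PyTorch's cross-entropy is; the stage names are only descriptive labels."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"# Final validation loss of each stage, copied from the training tables above.\n",
"valid_losses = {\n",
"    'head only, 1 epoch at 3e-2': 4.091124,\n",
"    'unfrozen, 5 epochs at 1e-3': 3.556864,\n",
"    'unfrozen, 5 more epochs at 5e-4': 3.532155,\n",
"}\n",
"\n",
"for stage, loss in valid_losses.items():\n",
"    # Perplexity is the exponential of the mean per-token cross-entropy.\n",
"    print(f'{stage}: perplexity = {math.exp(loss):.1f}')"
]
},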
{
"cell_type": "code",
"execution_count": 110,
"metadata": {},
"outputs": [],
"source": [
"learn.save('fine_tuned')"
]
},
{
"cell_type": "code",
"execution_count": 188,
"metadata": {
"collapsed": true
},
"outputs": [
{
"data": {
"text/plain": [
"LanguageLearner(data=TextLMDataBunch;\n",
"\n",
"Train: LabelList (143614 items)\n",
"x: LMTextList\n",
"xxbos xxmaj explanation \n",
" xxmaj why the edits made under my username xxmaj hardcore xxmaj metallica xxmaj fan were reverted ? xxmaj they were n't vandalisms , just closure on some gas after i voted at xxmaj new xxmaj york xxmaj dolls xxup fac . xxmaj and please do n't remove the template from the talk page since i 'm retired xxunk,xxbos xxmaj xxunk ! xxmaj he matches this background colour i 'm seemingly stuck with . xxmaj thanks . ( talk ) 21:51 , xxmaj january 11 , 2016 ( xxup utc ),xxbos xxmaj hey man , i 'm really not trying to edit war . xxmaj it 's just that this guy is constantly removing relevant information and talking to me through edits instead of my talk page . xxmaj he seems to care more about the formatting than the actual info .,xxbos \" \n",
" xxmaj more \n",
" i ca n't make any real suggestions on improvement - i wondered if the section statistics should be later on , or a subsection of \" \" types of accidents \" \" xxup -i think the references may need tidying so that they are all in the exact same format ie date format etc . i can do that later on , if no - one else does first - if you have any preferences for formatting style on references or want to do it yourself please let me know . \n",
" \n",
" xxmaj there appears to be a backlog on articles for review so i guess there may be a delay until a reviewer turns up . xxmaj it 's listed in the relevant form eg xxmaj wikipedia : xxmaj xxunk # xxmaj transport \",xxbos xxmaj you , sir , are my hero . xxmaj any chance you remember what page that 's on ?\n",
"y: LMLabelList\n",
",,,,\n",
"Path: /home/akash/personal_projects/kaggle/ToxicComments/data/Data_Processed/Data_Classifier;\n",
"\n",
"Valid: LabelList (15957 items)\n",
"x: LMTextList\n",
"xxbos would continue to be , delayed due to my xxup irl schedule .,xxbos xxmaj get xxmaj well \n",
" \n",
" xxmaj get well soon . ),xxbos \" \n",
" \n",
" xxmaj gerry xxmaj xxunk \n",
" xxmaj hey , amigo , take the protect off \" \" xxmaj gerry xxmaj xxunk . \" \" i think you 're letting your sexual preference cloud your rational thought . xxmaj the edits you have called \" \" vandalism \" \" ( not done by me ) are nowhere near vandalism . xxmaj what or who are you trying to protect . xxmaj unlock it now and lighten up . xxmaj you 've been an admin for too long to act in this juvenile way . xxmaj stop showing favoritism and be even - handed . xxmaj now ! ! ! rossp \",xxbos xxup hd xxup dvd winning ? \n",
" \n",
" xxmaj the section previously stated that on xxmaj january 8 , 2006 , at the xxmaj consumer xxmaj electronics xxmaj show , it had been said that hd dvd was ahead of xxunk . however they did n't cite it . i looked and the article i cited said they were neck and neck . \n",
" \n",
" xxup ok , my reasoning on the cnn article was probably weak , but it still may be true . \n",
" xxmaj xxunk 's website claims to have passed hd dvd , but i doubt that that is a very reliable place to look for info . xxmaj can someone find something else or revert it back ? ( i do n't know how to do that ) \n",
" \n",
" \n",
" xxup hd - xxup dvd has the backing of the porn industry , so its practically won already . \n",
" \n",
" xxmaj blu xxmaj ray fanboy stay out \n",
" \n",
" xxmaj some people keep changing articles with no proof . xxmaj stop it . —the preceding unsigned comment was added by ( talk • contribs ) . \n",
" \n",
" xxmaj like yourself . 17 xxup gb per layer discs and triple - layer discs are not part of xxup hd - xxup dvd spec and as such are xxup possible xxup future additions that have no annouced released date , no schedule for when they may be released , and are 99 % likely to be incompatible with the optical heads in current and near future drives . xxmaj if you have some proof proving otherwise , then by all means add the 51 xxup gb disc back in and xxup refference it .,xxbos xxmaj terrorism \n",
" \n",
" xxmaj would it be possible for me to edit the terrorism article to add a section on state sponsored terrorism that is unrelated to the current dispute ?\n",
"y: LMLabelList\n",
",,,,\n",
"Path: /home/akash/personal_projects/kaggle/ToxicComments/data/Data_Processed/Data_Classifier;\n",
"\n",
"Test: None, model=SequentialRNN(\n",
" (0): AWD_LSTM(\n",
" (encoder): Embedding(60004, 400, padding_idx=1)\n",
" (encoder_dp): EmbeddingDropout(\n",
" (emb): Embedding(60004, 400, padding_idx=1)\n",
" )\n",
" (rnns): ModuleList(\n",
" (0): WeightDropout(\n",
" (module): LSTM(400, 1150, batch_first=True)\n",
" )\n",
" (1): WeightDropout(\n",
" (module): LSTM(1150, 1150, batch_first=True)\n",
" )\n",
" (2): WeightDropout(\n",
" (module): LSTM(1150, 400, batch_first=True)\n",
" )\n",
" )\n",
" (input_dp): RNNDropout()\n",
" (hidden_dps): ModuleList(\n",
" (0): RNNDropout()\n",
" (1): RNNDropout()\n",
" (2): RNNDropout()\n",
" )\n",
" )\n",
" (1): LinearDecoder(\n",
" (decoder): Linear(in_features=400, out_features=60004, bias=True)\n",
" (output_dp): RNNDropout()\n",
" )\n",
"), opt_func=functools.partial(<class 'torch.optim.adam.Adam'>, betas=(0.9, 0.99)), loss_func=FlattenedLoss of CrossEntropyLoss(), metrics=[<function accuracy at 0x7faa93d3c1e0>], true_wd=True, bn_wd=True, wd=0.01, train_bn=True, path=PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data/Data_Processed/Data_Classifier'), model_dir='models', callback_fns=[functools.partial(<class 'fastai.basic_train.Recorder'>, add_time=True, silent=False)], callbacks=[RNNTrainer\n",
"learn: LanguageLearner(data=TextLMDataBunch;\n",
"\n",
"Train: LabelList (143614 items)\n",
"x: LMTextList\n",
"xxbos xxmaj explanation \n",
" xxmaj why the edits made under my username xxmaj hardcore xxmaj metallica xxmaj fan were reverted ? xxmaj they were n't vandalisms , just closure on some gas after i voted at xxmaj new xxmaj york xxmaj dolls xxup fac . xxmaj and please do n't remove the template from the talk page since i 'm retired xxunk,xxbos xxmaj xxunk ! xxmaj he matches this background colour i 'm seemingly stuck with . xxmaj thanks . ( talk ) 21:51 , xxmaj january 11 , 2016 ( xxup utc ),xxbos xxmaj hey man , i 'm really not trying to edit war . xxmaj it 's just that this guy is constantly removing relevant information and talking to me through edits instead of my talk page . xxmaj he seems to care more about the formatting than the actual info .,xxbos \" \n",
" xxmaj more \n",
" i ca n't make any real suggestions on improvement - i wondered if the section statistics should be later on , or a subsection of \" \" types of accidents \" \" xxup -i think the references may need tidying so that they are all in the exact same format ie date format etc . i can do that later on , if no - one else does first - if you have any preferences for formatting style on references or want to do it yourself please let me know . \n",
" \n",
" xxmaj there appears to be a backlog on articles for review so i guess there may be a delay until a reviewer turns up . xxmaj it 's listed in the relevant form eg xxmaj wikipedia : xxmaj xxunk # xxmaj transport \",xxbos xxmaj you , sir , are my hero . xxmaj any chance you remember what page that 's on ?\n",
"y: LMLabelList\n",
",,,,\n",
"Path: /home/akash/personal_projects/kaggle/ToxicComments/data/Data_Processed/Data_Classifier;\n",
"\n",
"Valid: LabelList (15957 items)\n",
"x: LMTextList\n",
"xxbos would continue to be , delayed due to my xxup irl schedule .,xxbos xxmaj get xxmaj well \n",
" \n",
" xxmaj get well soon . ),xxbos \" \n",
" \n",
" xxmaj gerry xxmaj xxunk \n",
" xxmaj hey , amigo , take the protect off \" \" xxmaj gerry xxmaj xxunk . \" \" i think you 're letting your sexual preference cloud your rational thought . xxmaj the edits you have called \" \" vandalism \" \" ( not done by me ) are nowhere near vandalism . xxmaj what or who are you trying to protect . xxmaj unlock it now and lighten up . xxmaj you 've been an admin for too long to act in this juvenile way . xxmaj stop showing favoritism and be even - handed . xxmaj now ! ! ! rossp \",xxbos xxup hd xxup dvd winning ? \n",
" \n",
" xxmaj the section previously stated that on xxmaj january 8 , 2006 , at the xxmaj consumer xxmaj electronics xxmaj show , it had been said that hd dvd was ahead of xxunk . however they did n't cite it . i looked and the article i cited said they were neck and neck . \n",
" \n",
" xxup ok , my reasoning on the cnn article was probably weak , but it still may be true . \n",
" xxmaj xxunk 's website claims to have passed hd dvd , but i doubt that that is a very reliable place to look for info . xxmaj can someone find something else or revert it back ? ( i do n't know how to do that ) \n",
" \n",
" \n",
" xxup hd - xxup dvd has the backing of the porn industry , so its practically won already . \n",
" \n",
" xxmaj blu xxmaj ray fanboy stay out \n",
" \n",
" xxmaj some people keep changing articles with no proof . xxmaj stop it . —the preceding unsigned comment was added by ( talk • contribs ) . \n",
" \n",
" xxmaj like yourself . 17 xxup gb per layer discs and triple - layer discs are not part of xxup hd - xxup dvd spec and as such are xxup possible xxup future additions that have no annouced released date , no schedule for when they may be released , and are 99 % likely to be incompatible with the optical heads in current and near future drives . xxmaj if you have some proof proving otherwise , then by all means add the 51 xxup gb disc back in and xxup refference it .,xxbos xxmaj terrorism \n",
" \n",
" xxmaj would it be possible for me to edit the terrorism article to add a section on state sponsored terrorism that is unrelated to the current dispute ?\n",
"y: LMLabelList\n",
",,,,\n",
"Path: /home/akash/personal_projects/kaggle/ToxicComments/data/Data_Processed/Data_Classifier;\n",
"\n",
"Test: None, model=SequentialRNN(\n",
" (0): AWD_LSTM(\n",
" (encoder): Embedding(60004, 400, padding_idx=1)\n",
" (encoder_dp): EmbeddingDropout(\n",
" (emb): Embedding(60004, 400, padding_idx=1)\n",
" )\n",
" (rnns): ModuleList(\n",
" (0): WeightDropout(\n",
" (module): LSTM(400, 1150, batch_first=True)\n",
" )\n",
" (1): WeightDropout(\n",
" (module): LSTM(1150, 1150, batch_first=True)\n",
" )\n",
" (2): WeightDropout(\n",
" (module): LSTM(1150, 400, batch_first=True)\n",
" )\n",
" )\n",
" (input_dp): RNNDropout()\n",
" (hidden_dps): ModuleList(\n",
" (0): RNNDropout()\n",
" (1): RNNDropout()\n",
" (2): RNNDropout()\n",
" )\n",
" )\n",
" (1): LinearDecoder(\n",
" (decoder): Linear(in_features=400, out_features=60004, bias=True)\n",
" (output_dp): RNNDropout()\n",
" )\n",
"), opt_func=functools.partial(<class 'torch.optim.adam.Adam'>, betas=(0.9, 0.99)), loss_func=FlattenedLoss of CrossEntropyLoss(), metrics=[<function accuracy at 0x7faa93d3c1e0>], true_wd=True, bn_wd=True, wd=0.01, train_bn=True, path=PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data/Data_Processed/Data_Classifier'), model_dir='models', callback_fns=[functools.partial(<class 'fastai.basic_train.Recorder'>, add_time=True, silent=False)], callbacks=[...], layer_groups=[Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(400, 1150, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(1150, 1150, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(1150, 400, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): Embedding(60004, 400, padding_idx=1)\n",
" (1): EmbeddingDropout(\n",
" (emb): Embedding(60004, 400, padding_idx=1)\n",
" )\n",
" (2): LinearDecoder(\n",
" (decoder): Linear(in_features=400, out_features=60004, bias=True)\n",
" (output_dp): RNNDropout()\n",
" )\n",
")], add_time=True, silent=None)\n",
"alpha: 2.0\n",
"beta: 1.0], layer_groups=[Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(400, 1150, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(1150, 1150, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(1150, 400, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): Embedding(60004, 400, padding_idx=1)\n",
" (1): EmbeddingDropout(\n",
" (emb): Embedding(60004, 400, padding_idx=1)\n",
" )\n",
" (2): LinearDecoder(\n",
" (decoder): Linear(in_features=400, out_features=60004, bias=True)\n",
" (output_dp): RNNDropout()\n",
" )\n",
")], add_time=True, silent=None)"
]
},
"execution_count": 188,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"learn.load('fine_tuned')"
]
},
{
"cell_type": "code",
"execution_count": 158,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"I like to see you return and look for you . The Red Pen of\n",
"I like to know what the hell have you done , but i did n't much of that\n",
"I like to be careful of the proper style of the British and Irish Isles\n",
"I like to incorporate some of the information into the article . In this case , the\n",
"I like to see how i can get a chance to respond . xxbos While i appreciate\n"
]
}
],
"source": [
"print('\\n'.join(learn.predict(\"I like to\", n_words=15, temperature=0.75) for i in range(5)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see, the model has somewhat of an understanding of grammar and of how an opinion is expressed.<br>\n",
"It also has an understanding of how possessive pronouns work.<br>\n",
"Also, the model seems to have learnt that the comments generally have a speaker that is expressing an opinion and that it is frequently addressed to some other people. This seems enough information to go forward with training the classifier."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The encoder of this model will be used to train a classifier on the dataset."
]
},
{
"cell_type": "code",
"execution_count": 189,
"metadata": {},
"outputs": [],
"source": [
"learn.save_encoder('fine_tuned_enc')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Training the Classifier"
]
},
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true
},
"source": [
"## Preparing the classification dataset"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"PATH = Path('data/').absolute()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"[PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data/models'),\n",
" PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data/classifier_train_data'),\n",
" PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data/Data_Processed'),\n",
" PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data/test_labels.csv'),\n",
" PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data/sample_submission.csv'),\n",
" PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data/train.csv'),\n",
" PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data/test.csv')]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"PATH.ls()"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"df = pd.read_csv(PATH/'train.csv').drop('id', axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>comment_text</th>\n",
" <th>toxic</th>\n",
" <th>severe_toxic</th>\n",
" <th>obscene</th>\n",
" <th>threat</th>\n",
" <th>insult</th>\n",
" <th>identity_hate</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Explanation\\nWhy the edits made under my usern...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>D'aww! He matches this background colour I'm s...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Hey man, I'm really not trying to edit war. It...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>\"\\nMore\\nI can't make any real suggestions on ...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>You, sir, are my hero. Any chance you remember...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" comment_text toxic severe_toxic \\\n",
"0 Explanation\\nWhy the edits made under my usern... 0 0 \n",
"1 D'aww! He matches this background colour I'm s... 0 0 \n",
"2 Hey man, I'm really not trying to edit war. It... 0 0 \n",
"3 \"\\nMore\\nI can't make any real suggestions on ... 0 0 \n",
"4 You, sir, are my hero. Any chance you remember... 0 0 \n",
"\n",
" obscene threat insult identity_hate \n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 "
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"bs = 48"
]
},
{
"cell_type": "code",
"execution_count": 179,
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"data_clas = (TextList.from_csv(path=PATH, csv_name='train.csv', cols='comment_text', vocab=data_lm.vocab)\n",
" .split_by_rand_pct(0.1)\n",
" .label_from_df(cols=['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate'])\n",
" .databunch(bs=bs))\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 180,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>text</th>\n",
" <th>target</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>xxbos xxmaj take that ! \\n \\n xxup in xxup the xxup ass xxup in xxup the xxup ass xxup in xxup the xxup ass xxup in xxup the xxup ass xxup in xxup the xxup ass xxup in xxup the xxup ass xxup in xxup the xxup ass xxup in xxup the xxup ass xxup in xxup the xxup ass xxup in xxup the xxup ass xxup in</td>\n",
" <td>toxic;severe_toxic;obscene</td>\n",
" </tr>\n",
" <tr>\n",
" <td>xxbos \" \\n \\n xxmaj fourth xxmaj examination , 17th xxmaj december , 1455 . [ xxmaj additional statements . ] \\n xxmaj the sum of a thousand pounds , or crowns , was given by the xxmaj king of xxmaj england for the surrender of the xxmaj maid ; and an annuity of 300 pounds to the soldier of the xxmaj duke of xxmaj burgundy who had</td>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <td>xxbos \" \\n \\n xxmaj sitush you are a menace to xxmaj wikipedia as this conversation from the page of another editor shows - xxmaj are you a 15 year old kid trying his hand in editing academic articles ? ? ? \\n \\n xxmaj first xxmaj xxunk xxmaj shastri writes to xxmaj jonathan . xxmaj this is because xxmaj sitush has just vandalized his article about xxmaj</td>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <td>xxbos xxmaj after all the times you have thwarted me ... xxmaj after all the times my plans for world domination were foiled by your xxunk interference ... xxmaj after all the countless times you escaped at the very last moment , finally , i , xxmaj xxunk xxmaj the xxmaj mighty , have defeated you , xxmaj crum375 , xxmaj space xxmaj commander xxmaj from xxmaj swalwell ! \\n</td>\n",
" <td>toxic</td>\n",
" </tr>\n",
" <tr>\n",
" <td>xxbos \" \\n \\n ( r to my favorite editor , started before xxmaj spotfixer commented : ) \\n xxmaj not months , years . xxmaj he 'd been xxunk that shit on xxmaj shapiro since 2006 . xxmaj anyway , thanks for the kind words - coming from one in your time and place it 's especially touching that you 'd be aggrieved over my two week</td>\n",
" <td></td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"data_clas.show_batch()"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true
},
"source": [
"We save the data object because processing the dataset each time requires significant time."
]
},
{
"cell_type": "code",
"execution_count": 181,
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"data_clas.save('classifier_train_data')"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"data_clas = load_data(PATH, 'classifier_train_data', bs=bs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparing Model"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# Reclaiming up some memory\n",
"learn = None\n",
"gc.collect()\n",
"\n",
"learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)\n",
"learn.load_encoder('../Data_Processed/Data_Classifier/models/fine_tuned_enc')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def roc_score(model=learn):\n",
" \"\"\"\n",
" Calculates the ROC score for all classes separately and returns their mean.\n",
" \"\"\"\n",
" y_preds, y_true = learn.get_preds()\n",
" return roc_auc_score(y_true.numpy(), y_preds.numpy(), average='macro')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This metric will classify as correct any prediction above `thresh = 0.4`.<br>\n",
"We will then simply calculate the accuracy taking into account each label as in the normal accuracy measure"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"acc = partial(accuracy_thresh, thresh=0.4)\n",
"learn.metrics.append(acc)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"def find_lr(model=learn):\n",
" \"\"\"\n",
" Convenience function to perform learning rate range test and plot results.\n",
" \"\"\"\n",
" model.lr_find()\n",
" model.recorder.plot(skip_end=15)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Start Training"
]
},
{
"cell_type": "code",
"execution_count": 201,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"find_lr()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Although we can see that the loss is increasing as we go above a learning rate of, `1e-4` we will choose a learning rate of `1e-3` since that generally seems to work best when initially training a random set of weights at the head of a pretrained model."
]
},
{
"cell_type": "code",
"execution_count": 202,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: left;\">\n",
" <th>epoch</th>\n",
" <th>train_loss</th>\n",
" <th>valid_loss</th>\n",
" <th>time</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>0.073462</td>\n",
" <td>0.065989</td>\n",
" <td>04:13</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"learn.fit_one_cycle(1, 1e-3, moms=(0.8, 0.7))"
]
},
{
"cell_type": "code",
"execution_count": 203,
"metadata": {},
"outputs": [],
"source": [
"learn.save('first')"
]
},
{
"cell_type": "code",
"execution_count": 204,
"metadata": {
"collapsed": true
},
"outputs": [
{
"data": {
"text/plain": [
"RNNLearner(data=TextClasDataBunch;\n",
"\n",
"Train: LabelList (143614 items)\n",
"x: TextList\n",
"xxbos xxmaj explanation \n",
" xxmaj why the edits made under my username xxmaj hardcore xxmaj metallica xxmaj fan were reverted ? xxmaj they were n't vandalisms , just closure on some gas after i voted at xxmaj new xxmaj york xxmaj dolls xxup fac . xxmaj and please do n't remove the template from the talk page since i 'm retired xxunk,xxbos xxmaj xxunk ! xxmaj he matches this background colour i 'm seemingly stuck with . xxmaj thanks . ( talk ) 21:51 , xxmaj january 11 , 2016 ( xxup utc ),xxbos xxmaj hey man , i 'm really not trying to edit war . xxmaj it 's just that this guy is constantly removing relevant information and talking to me through edits instead of my talk page . xxmaj he seems to care more about the formatting than the actual info .,xxbos \" \n",
" xxmaj more \n",
" i ca n't make any real suggestions on improvement - i wondered if the section statistics should be later on , or a subsection of \" \" types of accidents \" \" xxup -i think the references may need tidying so that they are all in the exact same format ie date format etc . i can do that later on , if no - one else does first - if you have any preferences for formatting style on references or want to do it yourself please let me know . \n",
" \n",
" xxmaj there appears to be a backlog on articles for review so i guess there may be a delay until a reviewer turns up . xxmaj it 's listed in the relevant form eg xxmaj wikipedia : xxmaj xxunk # xxmaj transport \",xxbos xxmaj you , sir , are my hero . xxmaj any chance you remember what page that 's on ?\n",
"y: MultiCategoryList\n",
",,,,\n",
"Path: /home/akash/personal_projects/kaggle/ToxicComments/data;\n",
"\n",
"Valid: LabelList (15957 items)\n",
"x: TextList\n",
"xxbos \" \n",
" \n",
" xxmaj sockpuppetry case \n",
" \n",
" xxmaj you have been accused of sockpuppetry . xxmaj please refer to xxmaj wikipedia : xxmaj sockpuppet investigations / xxmaj xxunk for evidence . xxmaj please make sure you make yourself familiar with notes for the suspect before editing the evidence page . 77 \",xxbos xxmaj welcome ! \n",
" \n",
" xxmaj hello , xxmaj xxunk , and welcome to xxmaj wikipedia ! xxmaj thank you for your contributions . i hope you like the place and decide to stay . xxmaj here are a few good links for newcomers : \n",
" xxmaj the five pillars of xxmaj wikipedia \n",
" xxmaj how to edit a page \n",
" xxmaj help pages \n",
" xxmaj tutorial \n",
" xxmaj how to write a great article \n",
" xxmaj manual of xxmaj style \n",
" i hope you enjoy editing here and being a xxmaj wikipedian ! xxmaj please sign your name on talk pages using four tildes ( xxrep 4 ~ ) ; this will automatically produce your name and the date . xxmaj if you need help , check out xxmaj wikipedia : xxmaj where to ask a question , ask me on my talk page , or place { { helpme } } on your talk page and someone will show up shortly to answer your questions . xxmaj again , welcome !,xxbos \" \n",
" \n",
" xxmaj agree with xxmaj user : xxmaj xxunk : \" \" this is not an article about racial or ethnic purity but a list of xxmaj americans that have roots in xxmaj estonia . \" \" \",xxbos xxmaj taking xxmaj xxunk as an example , in the xxmaj discography section the album information for each of their 3 albums is pretty much the same thing that can be found on each individual album 's page . xxmaj the albums are already linked to . xxmaj so what i am suggesting is removing what is duplicative and migrating anything that does not appear for each individual album onto that respective album 's article page . xxmaj in the end , you would be left with a xxmaj discography section listing all 3 albums . \n",
" \n",
" i also noticed the xxmaj notes on selected pieces section ; this section be taken out of the xxmaj xxunk article and appear in the pages for all 3 albums containing only those notes relevant for that particular album . xxmaj as it appears right now , the reader can not tell where a song appears unless he / she scrolls up and searches . xxmaj it would be more relevant to have the specific information pertaining to songs appearing on particular albums on the respective album 's pages .,xxbos xxmaj if someone wants to boil that plot down , go for it .\n",
"y: MultiCategoryList\n",
",,,,\n",
"Path: /home/akash/personal_projects/kaggle/ToxicComments/data;\n",
"\n",
"Test: None, model=SequentialRNN(\n",
" (0): MultiBatchEncoder(\n",
" (module): AWD_LSTM(\n",
" (encoder): Embedding(60004, 400, padding_idx=1)\n",
" (encoder_dp): EmbeddingDropout(\n",
" (emb): Embedding(60004, 400, padding_idx=1)\n",
" )\n",
" (rnns): ModuleList(\n",
" (0): WeightDropout(\n",
" (module): LSTM(400, 1150, batch_first=True)\n",
" )\n",
" (1): WeightDropout(\n",
" (module): LSTM(1150, 1150, batch_first=True)\n",
" )\n",
" (2): WeightDropout(\n",
" (module): LSTM(1150, 400, batch_first=True)\n",
" )\n",
" )\n",
" (input_dp): RNNDropout()\n",
" (hidden_dps): ModuleList(\n",
" (0): RNNDropout()\n",
" (1): RNNDropout()\n",
" (2): RNNDropout()\n",
" )\n",
" )\n",
" )\n",
" (1): PoolingLinearClassifier(\n",
" (layers): Sequential(\n",
" (0): BatchNorm1d(1200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (1): Dropout(p=0.2)\n",
" (2): Linear(in_features=1200, out_features=50, bias=True)\n",
" (3): ReLU(inplace)\n",
" (4): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): Dropout(p=0.1)\n",
" (6): Linear(in_features=50, out_features=6, bias=True)\n",
" )\n",
" )\n",
"), opt_func=functools.partial(<class 'torch.optim.adam.Adam'>, betas=(0.9, 0.99)), loss_func=FlattenedLoss of BCEWithLogitsLoss(), metrics=[], true_wd=True, bn_wd=True, wd=0.01, train_bn=True, path=PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data'), model_dir='models', callback_fns=[functools.partial(<class 'fastai.basic_train.Recorder'>, add_time=True, silent=False)], callbacks=[RNNTrainer\n",
"learn: RNNLearner(data=TextClasDataBunch;\n",
"\n",
"Train: LabelList (143614 items)\n",
"x: TextList\n",
"xxbos xxmaj explanation \n",
" xxmaj why the edits made under my username xxmaj hardcore xxmaj metallica xxmaj fan were reverted ? xxmaj they were n't vandalisms , just closure on some gas after i voted at xxmaj new xxmaj york xxmaj dolls xxup fac . xxmaj and please do n't remove the template from the talk page since i 'm retired xxunk,xxbos xxmaj xxunk ! xxmaj he matches this background colour i 'm seemingly stuck with . xxmaj thanks . ( talk ) 21:51 , xxmaj january 11 , 2016 ( xxup utc ),xxbos xxmaj hey man , i 'm really not trying to edit war . xxmaj it 's just that this guy is constantly removing relevant information and talking to me through edits instead of my talk page . xxmaj he seems to care more about the formatting than the actual info .,xxbos \" \n",
" xxmaj more \n",
" i ca n't make any real suggestions on improvement - i wondered if the section statistics should be later on , or a subsection of \" \" types of accidents \" \" xxup -i think the references may need tidying so that they are all in the exact same format ie date format etc . i can do that later on , if no - one else does first - if you have any preferences for formatting style on references or want to do it yourself please let me know . \n",
" \n",
" xxmaj there appears to be a backlog on articles for review so i guess there may be a delay until a reviewer turns up . xxmaj it 's listed in the relevant form eg xxmaj wikipedia : xxmaj xxunk # xxmaj transport \",xxbos xxmaj you , sir , are my hero . xxmaj any chance you remember what page that 's on ?\n",
"y: MultiCategoryList\n",
",,,,\n",
"Path: /home/akash/personal_projects/kaggle/ToxicComments/data;\n",
"\n",
"Valid: LabelList (15957 items)\n",
"x: TextList\n",
"xxbos \" \n",
" \n",
" xxmaj sockpuppetry case \n",
" \n",
" xxmaj you have been accused of sockpuppetry . xxmaj please refer to xxmaj wikipedia : xxmaj sockpuppet investigations / xxmaj xxunk for evidence . xxmaj please make sure you make yourself familiar with notes for the suspect before editing the evidence page . 77 \",xxbos xxmaj welcome ! \n",
" \n",
" xxmaj hello , xxmaj xxunk , and welcome to xxmaj wikipedia ! xxmaj thank you for your contributions . i hope you like the place and decide to stay . xxmaj here are a few good links for newcomers : \n",
" xxmaj the five pillars of xxmaj wikipedia \n",
" xxmaj how to edit a page \n",
" xxmaj help pages \n",
" xxmaj tutorial \n",
" xxmaj how to write a great article \n",
" xxmaj manual of xxmaj style \n",
" i hope you enjoy editing here and being a xxmaj wikipedian ! xxmaj please sign your name on talk pages using four tildes ( xxrep 4 ~ ) ; this will automatically produce your name and the date . xxmaj if you need help , check out xxmaj wikipedia : xxmaj where to ask a question , ask me on my talk page , or place { { helpme } } on your talk page and someone will show up shortly to answer your questions . xxmaj again , welcome !,xxbos \" \n",
" \n",
" xxmaj agree with xxmaj user : xxmaj xxunk : \" \" this is not an article about racial or ethnic purity but a list of xxmaj americans that have roots in xxmaj estonia . \" \" \",xxbos xxmaj taking xxmaj xxunk as an example , in the xxmaj discography section the album information for each of their 3 albums is pretty much the same thing that can be found on each individual album 's page . xxmaj the albums are already linked to . xxmaj so what i am suggesting is removing what is duplicative and migrating anything that does not appear for each individual album onto that respective album 's article page . xxmaj in the end , you would be left with a xxmaj discography section listing all 3 albums . \n",
" \n",
" i also noticed the xxmaj notes on selected pieces section ; this section be taken out of the xxmaj xxunk article and appear in the pages for all 3 albums containing only those notes relevant for that particular album . xxmaj as it appears right now , the reader can not tell where a song appears unless he / she scrolls up and searches . xxmaj it would be more relevant to have the specific information pertaining to songs appearing on particular albums on the respective album 's pages .,xxbos xxmaj if someone wants to boil that plot down , go for it .\n",
"y: MultiCategoryList\n",
",,,,\n",
"Path: /home/akash/personal_projects/kaggle/ToxicComments/data;\n",
"\n",
"Test: None, model=SequentialRNN(\n",
" (0): MultiBatchEncoder(\n",
" (module): AWD_LSTM(\n",
" (encoder): Embedding(60004, 400, padding_idx=1)\n",
" (encoder_dp): EmbeddingDropout(\n",
" (emb): Embedding(60004, 400, padding_idx=1)\n",
" )\n",
" (rnns): ModuleList(\n",
" (0): WeightDropout(\n",
" (module): LSTM(400, 1150, batch_first=True)\n",
" )\n",
" (1): WeightDropout(\n",
" (module): LSTM(1150, 1150, batch_first=True)\n",
" )\n",
" (2): WeightDropout(\n",
" (module): LSTM(1150, 400, batch_first=True)\n",
" )\n",
" )\n",
" (input_dp): RNNDropout()\n",
" (hidden_dps): ModuleList(\n",
" (0): RNNDropout()\n",
" (1): RNNDropout()\n",
" (2): RNNDropout()\n",
" )\n",
" )\n",
" )\n",
" (1): PoolingLinearClassifier(\n",
" (layers): Sequential(\n",
" (0): BatchNorm1d(1200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (1): Dropout(p=0.2)\n",
" (2): Linear(in_features=1200, out_features=50, bias=True)\n",
" (3): ReLU(inplace)\n",
" (4): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): Dropout(p=0.1)\n",
" (6): Linear(in_features=50, out_features=6, bias=True)\n",
" )\n",
" )\n",
"), opt_func=functools.partial(<class 'torch.optim.adam.Adam'>, betas=(0.9, 0.99)), loss_func=FlattenedLoss of BCEWithLogitsLoss(), metrics=[], true_wd=True, bn_wd=True, wd=0.01, train_bn=True, path=PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data'), model_dir='models', callback_fns=[functools.partial(<class 'fastai.basic_train.Recorder'>, add_time=True, silent=False)], callbacks=[...], layer_groups=[Sequential(\n",
" (0): Embedding(60004, 400, padding_idx=1)\n",
" (1): EmbeddingDropout(\n",
" (emb): Embedding(60004, 400, padding_idx=1)\n",
" )\n",
"), Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(400, 1150, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(1150, 1150, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(1150, 400, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): PoolingLinearClassifier(\n",
" (layers): Sequential(\n",
" (0): BatchNorm1d(1200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (1): Dropout(p=0.2)\n",
" (2): Linear(in_features=1200, out_features=50, bias=True)\n",
" (3): ReLU(inplace)\n",
" (4): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): Dropout(p=0.1)\n",
" (6): Linear(in_features=50, out_features=6, bias=True)\n",
" )\n",
" )\n",
")], add_time=True, silent=None)\n",
"alpha: 2.0\n",
"beta: 1.0], layer_groups=[Sequential(\n",
" (0): Embedding(60004, 400, padding_idx=1)\n",
" (1): EmbeddingDropout(\n",
" (emb): Embedding(60004, 400, padding_idx=1)\n",
" )\n",
"), Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(400, 1150, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(1150, 1150, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(1150, 400, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): PoolingLinearClassifier(\n",
" (layers): Sequential(\n",
" (0): BatchNorm1d(1200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (1): Dropout(p=0.2)\n",
" (2): Linear(in_features=1200, out_features=50, bias=True)\n",
" (3): ReLU(inplace)\n",
" (4): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): Dropout(p=0.1)\n",
" (6): Linear(in_features=50, out_features=6, bias=True)\n",
" )\n",
" )\n",
")], add_time=True, silent=None)"
]
},
"execution_count": 204,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"learn.load('first')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*We will now unfreeze the last two layer groups and fine-tune them, instead of training only the last one.*"
]
},
{
"cell_type": "code",
"execution_count": 205,
"metadata": {},
"outputs": [],
"source": [
"# Make only the last two layer groups trainable; earlier groups stay frozen.\n",
"learn.freeze_to(-2)"
]
},
{
"cell_type": "code",
"execution_count": 206,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: left;\">\n",
" <th>epoch</th>\n",
" <th>train_loss</th>\n",
" <th>valid_loss</th>\n",
" <th>time</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>0.072802</td>\n",
" <td>0.061296</td>\n",
" <td>05:32</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# One epoch; learning rates range from 1e-3/2.6**4 (first layer group) to 1e-3 (last).\n",
"learn.fit_one_cycle(1, slice(1e-3/(2.6**4), 1e-3), moms=(0.8, 0.7))"
]
},
{
"cell_type": "code",
"execution_count": 236,
"metadata": {},
"outputs": [],
"source": [
"learn.save('second')"
]
},
{
"cell_type": "code",
"execution_count": 237,
"metadata": {},
"outputs": [],
"source": [
"learn.load('second')"
]
},
{
"cell_type": "code",
"execution_count": 233,
"metadata": {},
"outputs": [],
"source": [
"learn.validate()"
]
},
{
"cell_type": "code",
"execution_count": 238,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: left;\">\n",
" <th>epoch</th>\n",
" <th>train_loss</th>\n",
" <th>valid_loss</th>\n",
" <th>accuracy_thresh</th>\n",
" <th>time</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>0.053653</td>\n",
" <td>0.054283</td>\n",
" <td>0.978756</td>\n",
" <td>09:50</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Make the last three layer groups trainable and halve the peak learning rate.\n",
"learn.freeze_to(-3)\n",
"learn.fit_one_cycle(1, slice(5e-4/(2.6**4), 5e-4), moms=(0.8, 0.7))"
]
},
{
"cell_type": "code",
"execution_count": 239,
"metadata": {},
"outputs": [],
"source": [
"learn.save('third')\n",
"learn.load('third')"
]
},
{
"cell_type": "code",
"execution_count": 240,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(MultiCategory toxic;severe_toxic;obscene;insult,\n",
" tensor([1., 1., 1., 0., 1., 0.]),\n",
" tensor([1.0000, 0.9644, 0.9999, 0.2910, 0.9989, 0.2105]))"
]
},
"execution_count": 240,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"learn.predict(\"I will kill you. You are fat. Fuck off\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now unfreeze the whole model, so the encoder is trained together with the classifier head, and fine-tune it for three epochs."
]
},
{
"cell_type": "code",
"execution_count": 241,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: left;\">\n",
" <th>epoch</th>\n",
" <th>train_loss</th>\n",
" <th>valid_loss</th>\n",
" <th>accuracy_thresh</th>\n",
" <th>time</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>0.055374</td>\n",
" <td>0.049385</td>\n",
" <td>0.980823</td>\n",
" <td>13:01</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>0.050902</td>\n",
" <td>0.047490</td>\n",
" <td>0.981314</td>\n",
" <td>11:02</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>0.050022</td>\n",
" <td>0.046927</td>\n",
" <td>0.981711</td>\n",
" <td>10:57</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
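],
"source": [
"# Unfreeze every layer group so the encoder is fine-tuned together with the classifier head.\n",
"learn.unfreeze()\n",
"# slice(lo, hi) spreads the learning rates geometrically across layer groups, lowest for the earliest group.\n",
"learn.fit_one_cycle(3, slice(1e-3/(2.6**4), 1e-4), moms=(0.8, 0.7))"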
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn.save('unfrozen_four')"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"learn.load('unfrozen_four')"
]
},
{
"cell_type": "code",
"execution_count": 243,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(MultiCategory toxic;severe_toxic;obscene;threat;insult,\n",
" tensor([1., 1., 1., 1., 1., 0.]),\n",
" tensor([0.9999, 0.9392, 0.9997, 0.6798, 0.9904, 0.1193]))"
]
},
"execution_count": 243,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"learn.predict(\"I will kill you. You are fat. Fuck off\")"
]
},
{
"cell_type": "code",
"execution_count": 254,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(MultiCategory toxic;obscene;threat,\n",
" tensor([1., 0., 1., 1., 0., 0.]),\n",
" tensor([0.8747, 0.2716, 0.6994, 0.6640, 0.3426, 0.0494]))"
]
},
"execution_count": 254,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"learn.predict(\"I am gonna kick you\")"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9851895132990093"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"roc_score()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training a bit more"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"find_lr()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: left;\">\n",
" <th>epoch</th>\n",
" <th>train_loss</th>\n",
" <th>valid_loss</th>\n",
" <th>accuracy_thresh</th>\n",
" <th>time</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>0.053492</td>\n",
" <td>0.046831</td>\n",
" <td>0.980980</td>\n",
" <td>04:24</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>0.052653</td>\n",
" <td>0.046132</td>\n",
" <td>0.982369</td>\n",
" <td>04:00</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>0.052927</td>\n",
" <td>0.044149</td>\n",
" <td>0.982818</td>\n",
" <td>05:31</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>0.046456</td>\n",
" <td>0.044377</td>\n",
" <td>0.982432</td>\n",
" <td>04:49</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
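],
"source": [
"# Same one-cycle schedule, but with a higher top learning rate (1e-3 instead of 1e-4), for five more epochs.\n",
"learn.fit_one_cycle(5, slice(1e-3/(2.6**4), 1e-3), moms=(0.8, 0.7))"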
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
"learn.save('unfrozen_2')"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RNNLearner(data=TextClasDataBunch;\n",
"\n",
"Train: LabelList (143614 items)\n",
"x: TextList\n",
"xxbos xxmaj explanation \n",
" xxmaj why the edits made under my username xxmaj hardcore xxmaj metallica xxmaj fan were reverted ? xxmaj they were n't vandalisms , just closure on some gas after i voted at xxmaj new xxmaj york xxmaj dolls xxup fac . xxmaj and please do n't remove the template from the talk page since i 'm retired xxunk,xxbos xxmaj xxunk ! xxmaj he matches this background colour i 'm seemingly stuck with . xxmaj thanks . ( talk ) 21:51 , xxmaj january 11 , 2016 ( xxup utc ),xxbos xxmaj hey man , i 'm really not trying to edit war . xxmaj it 's just that this guy is constantly removing relevant information and talking to me through edits instead of my talk page . xxmaj he seems to care more about the formatting than the actual info .,xxbos \" \n",
" xxmaj more \n",
" i ca n't make any real suggestions on improvement - i wondered if the section statistics should be later on , or a subsection of \" \" types of accidents \" \" xxup -i think the references may need tidying so that they are all in the exact same format ie date format etc . i can do that later on , if no - one else does first - if you have any preferences for formatting style on references or want to do it yourself please let me know . \n",
" \n",
" xxmaj there appears to be a backlog on articles for review so i guess there may be a delay until a reviewer turns up . xxmaj it 's listed in the relevant form eg xxmaj wikipedia : xxmaj xxunk # xxmaj transport \",xxbos xxmaj you , sir , are my hero . xxmaj any chance you remember what page that 's on ?\n",
"y: MultiCategoryList\n",
",,,,\n",
"Path: /home/akash/personal_projects/kaggle/ToxicComments/data;\n",
"\n",
"Valid: LabelList (15957 items)\n",
"x: TextList\n",
"xxbos \" \n",
" \n",
" xxmaj sockpuppetry case \n",
" \n",
" xxmaj you have been accused of sockpuppetry . xxmaj please refer to xxmaj wikipedia : xxmaj sockpuppet investigations / xxmaj xxunk for evidence . xxmaj please make sure you make yourself familiar with notes for the suspect before editing the evidence page . 77 \",xxbos xxmaj welcome ! \n",
" \n",
" xxmaj hello , xxmaj xxunk , and welcome to xxmaj wikipedia ! xxmaj thank you for your contributions . i hope you like the place and decide to stay . xxmaj here are a few good links for newcomers : \n",
" xxmaj the five pillars of xxmaj wikipedia \n",
" xxmaj how to edit a page \n",
" xxmaj help pages \n",
" xxmaj tutorial \n",
" xxmaj how to write a great article \n",
" xxmaj manual of xxmaj style \n",
" i hope you enjoy editing here and being a xxmaj wikipedian ! xxmaj please sign your name on talk pages using four tildes ( xxrep 4 ~ ) ; this will automatically produce your name and the date . xxmaj if you need help , check out xxmaj wikipedia : xxmaj where to ask a question , ask me on my talk page , or place { { helpme } } on your talk page and someone will show up shortly to answer your questions . xxmaj again , welcome !,xxbos \" \n",
" \n",
" xxmaj agree with xxmaj user : xxmaj xxunk : \" \" this is not an article about racial or ethnic purity but a list of xxmaj americans that have roots in xxmaj estonia . \" \" \",xxbos xxmaj taking xxmaj xxunk as an example , in the xxmaj discography section the album information for each of their 3 albums is pretty much the same thing that can be found on each individual album 's page . xxmaj the albums are already linked to . xxmaj so what i am suggesting is removing what is duplicative and migrating anything that does not appear for each individual album onto that respective album 's article page . xxmaj in the end , you would be left with a xxmaj discography section listing all 3 albums . \n",
" \n",
" i also noticed the xxmaj notes on selected pieces section ; this section be taken out of the xxmaj xxunk article and appear in the pages for all 3 albums containing only those notes relevant for that particular album . xxmaj as it appears right now , the reader can not tell where a song appears unless he / she scrolls up and searches . xxmaj it would be more relevant to have the specific information pertaining to songs appearing on particular albums on the respective album 's pages .,xxbos xxmaj if someone wants to boil that plot down , go for it .\n",
"y: MultiCategoryList\n",
",,,,\n",
"Path: /home/akash/personal_projects/kaggle/ToxicComments/data;\n",
"\n",
"Test: None, model=SequentialRNN(\n",
" (0): MultiBatchEncoder(\n",
" (module): AWD_LSTM(\n",
" (encoder): Embedding(60004, 400, padding_idx=1)\n",
" (encoder_dp): EmbeddingDropout(\n",
" (emb): Embedding(60004, 400, padding_idx=1)\n",
" )\n",
" (rnns): ModuleList(\n",
" (0): WeightDropout(\n",
" (module): LSTM(400, 1150, batch_first=True)\n",
" )\n",
" (1): WeightDropout(\n",
" (module): LSTM(1150, 1150, batch_first=True)\n",
" )\n",
" (2): WeightDropout(\n",
" (module): LSTM(1150, 400, batch_first=True)\n",
" )\n",
" )\n",
" (input_dp): RNNDropout()\n",
" (hidden_dps): ModuleList(\n",
" (0): RNNDropout()\n",
" (1): RNNDropout()\n",
" (2): RNNDropout()\n",
" )\n",
" )\n",
" )\n",
" (1): PoolingLinearClassifier(\n",
" (layers): Sequential(\n",
" (0): BatchNorm1d(1200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (1): Dropout(p=0.2)\n",
" (2): Linear(in_features=1200, out_features=50, bias=True)\n",
" (3): ReLU(inplace)\n",
" (4): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): Dropout(p=0.1)\n",
" (6): Linear(in_features=50, out_features=6, bias=True)\n",
" )\n",
" )\n",
"), opt_func=functools.partial(<class 'torch.optim.adam.Adam'>, betas=(0.9, 0.99)), loss_func=FlattenedLoss of BCEWithLogitsLoss(), metrics=[functools.partial(<function accuracy_thresh at 0x7f4d9682e1e0>, thresh=0.4)], true_wd=True, bn_wd=True, wd=0.01, train_bn=True, path=PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data'), model_dir='models', callback_fns=[functools.partial(<class 'fastai.basic_train.Recorder'>, add_time=True, silent=False)], callbacks=[RNNTrainer\n",
"learn: RNNLearner(data=TextClasDataBunch;\n",
"\n",
"Train: LabelList (143614 items)\n",
"x: TextList\n",
"xxbos xxmaj explanation \n",
" xxmaj why the edits made under my username xxmaj hardcore xxmaj metallica xxmaj fan were reverted ? xxmaj they were n't vandalisms , just closure on some gas after i voted at xxmaj new xxmaj york xxmaj dolls xxup fac . xxmaj and please do n't remove the template from the talk page since i 'm retired xxunk,xxbos xxmaj xxunk ! xxmaj he matches this background colour i 'm seemingly stuck with . xxmaj thanks . ( talk ) 21:51 , xxmaj january 11 , 2016 ( xxup utc ),xxbos xxmaj hey man , i 'm really not trying to edit war . xxmaj it 's just that this guy is constantly removing relevant information and talking to me through edits instead of my talk page . xxmaj he seems to care more about the formatting than the actual info .,xxbos \" \n",
" xxmaj more \n",
" i ca n't make any real suggestions on improvement - i wondered if the section statistics should be later on , or a subsection of \" \" types of accidents \" \" xxup -i think the references may need tidying so that they are all in the exact same format ie date format etc . i can do that later on , if no - one else does first - if you have any preferences for formatting style on references or want to do it yourself please let me know . \n",
" \n",
" xxmaj there appears to be a backlog on articles for review so i guess there may be a delay until a reviewer turns up . xxmaj it 's listed in the relevant form eg xxmaj wikipedia : xxmaj xxunk # xxmaj transport \",xxbos xxmaj you , sir , are my hero . xxmaj any chance you remember what page that 's on ?\n",
"y: MultiCategoryList\n",
",,,,\n",
"Path: /home/akash/personal_projects/kaggle/ToxicComments/data;\n",
"\n",
"Valid: LabelList (15957 items)\n",
"x: TextList\n",
"xxbos \" \n",
" \n",
" xxmaj sockpuppetry case \n",
" \n",
" xxmaj you have been accused of sockpuppetry . xxmaj please refer to xxmaj wikipedia : xxmaj sockpuppet investigations / xxmaj xxunk for evidence . xxmaj please make sure you make yourself familiar with notes for the suspect before editing the evidence page . 77 \",xxbos xxmaj welcome ! \n",
" \n",
" xxmaj hello , xxmaj xxunk , and welcome to xxmaj wikipedia ! xxmaj thank you for your contributions . i hope you like the place and decide to stay . xxmaj here are a few good links for newcomers : \n",
" xxmaj the five pillars of xxmaj wikipedia \n",
" xxmaj how to edit a page \n",
" xxmaj help pages \n",
" xxmaj tutorial \n",
" xxmaj how to write a great article \n",
" xxmaj manual of xxmaj style \n",
" i hope you enjoy editing here and being a xxmaj wikipedian ! xxmaj please sign your name on talk pages using four tildes ( xxrep 4 ~ ) ; this will automatically produce your name and the date . xxmaj if you need help , check out xxmaj wikipedia : xxmaj where to ask a question , ask me on my talk page , or place { { helpme } } on your talk page and someone will show up shortly to answer your questions . xxmaj again , welcome !,xxbos \" \n",
" \n",
" xxmaj agree with xxmaj user : xxmaj xxunk : \" \" this is not an article about racial or ethnic purity but a list of xxmaj americans that have roots in xxmaj estonia . \" \" \",xxbos xxmaj taking xxmaj xxunk as an example , in the xxmaj discography section the album information for each of their 3 albums is pretty much the same thing that can be found on each individual album 's page . xxmaj the albums are already linked to . xxmaj so what i am suggesting is removing what is duplicative and migrating anything that does not appear for each individual album onto that respective album 's article page . xxmaj in the end , you would be left with a xxmaj discography section listing all 3 albums . \n",
" \n",
" i also noticed the xxmaj notes on selected pieces section ; this section be taken out of the xxmaj xxunk article and appear in the pages for all 3 albums containing only those notes relevant for that particular album . xxmaj as it appears right now , the reader can not tell where a song appears unless he / she scrolls up and searches . xxmaj it would be more relevant to have the specific information pertaining to songs appearing on particular albums on the respective album 's pages .,xxbos xxmaj if someone wants to boil that plot down , go for it .\n",
"y: MultiCategoryList\n",
",,,,\n",
"Path: /home/akash/personal_projects/kaggle/ToxicComments/data;\n",
"\n",
"Test: None, model=SequentialRNN(\n",
" (0): MultiBatchEncoder(\n",
" (module): AWD_LSTM(\n",
" (encoder): Embedding(60004, 400, padding_idx=1)\n",
" (encoder_dp): EmbeddingDropout(\n",
" (emb): Embedding(60004, 400, padding_idx=1)\n",
" )\n",
" (rnns): ModuleList(\n",
" (0): WeightDropout(\n",
" (module): LSTM(400, 1150, batch_first=True)\n",
" )\n",
" (1): WeightDropout(\n",
" (module): LSTM(1150, 1150, batch_first=True)\n",
" )\n",
" (2): WeightDropout(\n",
" (module): LSTM(1150, 400, batch_first=True)\n",
" )\n",
" )\n",
" (input_dp): RNNDropout()\n",
" (hidden_dps): ModuleList(\n",
" (0): RNNDropout()\n",
" (1): RNNDropout()\n",
" (2): RNNDropout()\n",
" )\n",
" )\n",
" )\n",
" (1): PoolingLinearClassifier(\n",
" (layers): Sequential(\n",
" (0): BatchNorm1d(1200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (1): Dropout(p=0.2)\n",
" (2): Linear(in_features=1200, out_features=50, bias=True)\n",
" (3): ReLU(inplace)\n",
" (4): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): Dropout(p=0.1)\n",
" (6): Linear(in_features=50, out_features=6, bias=True)\n",
" )\n",
" )\n",
"), opt_func=functools.partial(<class 'torch.optim.adam.Adam'>, betas=(0.9, 0.99)), loss_func=FlattenedLoss of BCEWithLogitsLoss(), metrics=[functools.partial(<function accuracy_thresh at 0x7f4d9682e1e0>, thresh=0.4)], true_wd=True, bn_wd=True, wd=0.01, train_bn=True, path=PosixPath('/home/akash/personal_projects/kaggle/ToxicComments/data'), model_dir='models', callback_fns=[functools.partial(<class 'fastai.basic_train.Recorder'>, add_time=True, silent=False)], callbacks=[...], layer_groups=[Sequential(\n",
" (0): Embedding(60004, 400, padding_idx=1)\n",
" (1): EmbeddingDropout(\n",
" (emb): Embedding(60004, 400, padding_idx=1)\n",
" )\n",
"), Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(400, 1150, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(1150, 1150, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(1150, 400, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): PoolingLinearClassifier(\n",
" (layers): Sequential(\n",
" (0): BatchNorm1d(1200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (1): Dropout(p=0.2)\n",
" (2): Linear(in_features=1200, out_features=50, bias=True)\n",
" (3): ReLU(inplace)\n",
" (4): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): Dropout(p=0.1)\n",
" (6): Linear(in_features=50, out_features=6, bias=True)\n",
" )\n",
" )\n",
")], add_time=True, silent=None)\n",
"alpha: 2.0\n",
"beta: 1.0], layer_groups=[Sequential(\n",
" (0): Embedding(60004, 400, padding_idx=1)\n",
" (1): EmbeddingDropout(\n",
" (emb): Embedding(60004, 400, padding_idx=1)\n",
" )\n",
"), Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(400, 1150, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(1150, 1150, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): WeightDropout(\n",
" (module): LSTM(1150, 400, batch_first=True)\n",
" )\n",
" (1): RNNDropout()\n",
"), Sequential(\n",
" (0): PoolingLinearClassifier(\n",
" (layers): Sequential(\n",
" (0): BatchNorm1d(1200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (1): Dropout(p=0.2)\n",
" (2): Linear(in_features=1200, out_features=50, bias=True)\n",
" (3): ReLU(inplace)\n",
" (4): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (5): Dropout(p=0.1)\n",
" (6): Linear(in_features=50, out_features=6, bias=True)\n",
" )\n",
" )\n",
")], add_time=True, silent=None)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"learn.load('unfrozen_2')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Evaluating The Model"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>text</th>\n",
" <th>target</th>\n",
" <th>prediction</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>xxbos xxup suck xxup my xxup cock d xxup suck xxup my xxup cock d xxup suck xxup my xxup cock d xxup suck xxup my xxup cock d xxup suck xxup my xxup cock d xxup suck xxup my xxup cock d xxup suck xxup my xxup cock d xxup suck xxup my xxup cock d xxup suck xxup my xxup cock d xxup suck xxup my xxup cock</td>\n",
" <td>toxic;severe_toxic;obscene;insult</td>\n",
" <td>toxic;severe_toxic;obscene;insult</td>\n",
" </tr>\n",
" <tr>\n",
" <td>xxbos xxup damn xxup you u xxup cunt xxup damn xxup you u xxup cunt xxup damn xxup you u xxup cunt xxup damn xxup you u xxup cunt xxup damn xxup you u xxup cunt xxup damn xxup you u xxup cunt xxup damn xxup you u xxup cunt xxup damn xxup you u xxup cunt xxup damn xxup you u xxup cunt xxup damn xxup you u xxup</td>\n",
" <td>toxic;severe_toxic;obscene;insult</td>\n",
" <td>toxic;severe_toxic;obscene;insult;identity_hate</td>\n",
" </tr>\n",
" <tr>\n",
" <td>xxbos do go fuck off bastard \\n xxmaj do xxmaj yyou xxmaj have a life ? \\n go fuck off bastard and yank your cock through your ass . i hate you and hope you go away forever . lame is you fuck your mom . die die die and all that crap . this is for xxunk xxunk \\n ass . i ass . i ass</td>\n",
" <td>toxic;severe_toxic;obscene;threat;insult</td>\n",
" <td>toxic;severe_toxic;obscene;threat;insult</td>\n",
" </tr>\n",
" <tr>\n",
" <td>xxbos xxup shut xxup the xxup fuck xxup up ! \\n xxup shut xxup the xxup fuck xxup up ! \\n xxup shut xxup the xxup fuck xxup up ! \\n xxup shut xxup the xxup fuck xxup up ! \\n xxup shut xxup the xxup fuck xxup up ! \\n xxup shut xxup the xxup fuck xxup up ! \\n xxup shut xxup</td>\n",
" <td>toxic;severe_toxic;obscene</td>\n",
" <td>toxic;severe_toxic;obscene;insult</td>\n",
" </tr>\n",
" <tr>\n",
" <td>xxbos xxup wikipedia xxup loves xxup me . xxup they xxup like xxup to xxup censor xxup me . xxup communism xxup is xxup censorship . xxup wikipedia xxup is xxup communism . \\n xxup wikipedia xxup loves xxup me . xxup they xxup like xxup to xxup censor xxup me . xxup communism xxup is xxup censorship . xxup wikipedia xxup is xxup communism . \\n xxup</td>\n",
" <td>toxic;severe_toxic</td>\n",
" <td>toxic</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"learn.show_results()"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9871003646219655"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"roc_score()"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[0.0441175, tensor(0.9826)]"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"learn.validate()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}