@kevinbird15
Last active April 30, 2018 05:13
Counting Example for Language Model
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# Building a Language Model\n## Using a Simple Idea to Generate Valuable Insights\nAuthored By: [Kevin Bird](https://twitter.com/kjbird15) "
},
{
"metadata": {},
"cell_type": "markdown",
"source": "After many classes of building language models, I decided it was time to understand exactly what was going on underneath the hood. The problem I was having with the language models we had already created is that they had thousands of words. This meant that in order to train them properly, the model needed hundreds of columns to describe the words and hundreds of hidden activations for the model to optimize. My goal was to create a simple model that would be predictive using the same model structure, but on a much less massive scale. I would highly recommend following along and trying to build this at some point and see how it goes!"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import fastai\nfrom fastai import learner\nfrom fastai import dataset\nfrom fastai import model\nfrom pathlib import Path\nfrom fastai.text import *\n\nimport pandas as pd\nimport numpy as np\nimport spacy\nimport json\nimport re\nimport html",
"execution_count": 1,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Creating the Data\nMy first roadblock that I ran into was how to get data that would be useful for such a task.\n\nUsually I see people use completely random data when they don't have a dataset to show a concept. Instead, I'm going to use a counting dataset that starts at a random number and then counts up 10, wrapping around from \"nine\" to \"zero\". \n\nThis gave me data that should be quite easy to predict since it was fairly predictable. The only time they wouldn’t follow the rule of increment by one is between the end of one row and the beginning of another row."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "numbers = [\"zero\", \"one\", \"two\", \"three\", \"four\", \"five\", \"six\", \"seven\", \"eight\", \"nine\"]\n\ndef DataGenerator():\n numlist = \"\"\n starting_num = random.randint(0,9) #pick a random number from 0-9\n for i in range(10):\n if i==0:\n numlist = str(numbers[(starting_num+i)%10])\n else:\n numlist = numlist + \" \" + str(numbers[(starting_num+i)%10])\n return numlist",
"execution_count": 2,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Time to test that my function actually works..."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "DataGenerator()",
"execution_count": 5,
"outputs": [
{
"data": {
"text/plain": "'eight nine zero one two three four five six seven'"
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Perfect, we are now counting 10 numbers and printing them out. Now it's time to put this into a loop and generate a decent amount of data. The more data we have, the better the predictions will be, but also the more resources (time and RAM) our model will require. "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "myData = pd.DataFrame()\nfor i in range(1000):\n myData = myData.append(pd.Series(DataGenerator()), ignore_index=True)",
"execution_count": 6,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "After generating the data, I needed to tokenize each of the number strings. To do this, I used the Fast.ai Tokenizer class."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "tok = Tokenizer()",
"execution_count": 7,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "This tokenizer utilizes multiple CPU cores by first dividing the text between however many cores the machine has using [partition_by_cores](https://github.com/fastai/fastai/blob/master/fastai/core.py#L88:5). This gives a list of numpy arrays with each numpy array containing `number of lines of data/number of cores`. After this, the list is passed into the proc_all_mp function which breaks them into small chunks for each processor to tokenize."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "texts = myData[0].astype(str)\n\ntexts.values[0:10]\n\ndata = tok.proc_all_mp(partition_by_cores(texts.values.astype(str)))",
"execution_count": 8,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Now that we have the data, let's build a frequency map. This will tell us how many times each word was seen. Since we have 10 numbers (0-9) and are doing 1000 sequences, all of these will be exactly 1000. Usually this will not be the case and this part will help filter out any words that are only seen a low number of time. "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "freq = Counter(p for o in data for p in o)",
"execution_count": 11,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "itos will be used to translate the number back into a string. In a regular language model, this is where you would determine the maximum number of words you want to have (max dictionary size) and minimum number of times a word has to be used to be put into the dictionary (minimum frequency). In this example, I used 10 as the max dictionary size and 2 as the minimum frequency. \n\nWe are also inserting an \\_eos_ token to signify the end of the string."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "itos = [o for o,c in freq.most_common(10) if c > 2]\nfor i in [\"_eos_\", \"_pad_\", \"_unk_\"]:\n itos.insert(0, i)",
"execution_count": 12,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "So now our itos variable has 13 records after the addition of an unknown variable, a padding variable, and an end of sentence variable. "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "itos",
"execution_count": 13,
"outputs": [
{
"data": {
"text/plain": "['_unk_',\n '_pad_',\n '_eos_',\n 'zero',\n 'one',\n 'two',\n 'three',\n 'four',\n 'five',\n 'six',\n 'seven',\n 'eight',\n 'nine']"
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "freq.most_common(10)",
"execution_count": 14,
"outputs": [
{
"data": {
"text/plain": "[('zero', 1000),\n ('one', 1000),\n ('two', 1000),\n ('three', 1000),\n ('four', 1000),\n ('five', 1000),\n ('six', 1000),\n ('seven', 1000),\n ('eight', 1000),\n ('nine', 1000)]"
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "itos should then be iterated through using a defaultdict that has a dictionary of the tokens as the key and the index as the value. This gives us the stoi and will be used to translate each of the words into an int that the language model can consume. "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "stoi = collections.defaultdict(lambda:0, {v:k for k,v in enumerate(itos)})\nlen(itos)",
"execution_count": 16,
"outputs": [
{
"data": {
"text/plain": "13"
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "The lambda:0 is telling this that if you don't know what the word is, give it a value of \"0\" which we know is tied to '_unk_' so translating it back, would replace that word with '_unk_'"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "unknownNumber = stoi[\"ten\"];print(\"unknownNumber idx: \" + str(unknownNumber))\nknownNumber = stoi[\"nine\"];print(\"knownNumber idx: \" + str(knownNumber))\nprint(itos[unknownNumber])\nprint(itos[knownNumber])",
"execution_count": 17,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "unknownNumber idx: 0\nknownNumber idx: 12\n_unk_\nnine\n"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "All I'm doing here is feeding each of my numbers through and turning the string into an int using stoi[wordtotokenize]"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "tokenized_data = [[stoi[o] for o in i] for i in data]",
"execution_count": 18,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "I went from an array with strings of numbers to a list with arrays of token index’s. These token index’s are what the model will use to train the activations. At this point, I added token index 2 to the end of every string representation. This will help the model train by keeping some more of the metadata in tact."
},
{
"metadata": {
"scrolled": true,
"trusted": true
},
"cell_type": "code",
"source": "for datanew in tokenized_data:\n datanew+=[2]",
"execution_count": 19,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Your values will be different, but will look similar to the screenshot below\n![image.png](attachment:image.png)",
"attachments": {
"image.png": {
"image/png": ""
}
}
},
{
"metadata": {},
"cell_type": "markdown",
"source": "At this point, the data is clean and we are ready to select our model architecture. There are a lot of hyper-parameters that can be modified. I will discuss the ones that I used. It is highly recommended to not read the rest of this section unless you are following along in the code. At this point, I would recommend going back and actually implementing the data piece yourself.\n\nOne last thing we will do is combine all of the tokenized lists together into one long string of numbers. This is what the model will expect the data to look like. "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "combined_tokenized_data = np.concatenate(tokenized_data)",
"execution_count": 24,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "PATH = Path(\"data/counterExample/\")",
"execution_count": 22,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "##### LanguageModelLoader\nFor this model, I used the LanguageModelLoader class from FastAI’s text.py library. I fed in a numpy array of the tokenized data, a batch size of 4, and a Back Prop Through Time (BPTT) of 12. The batch size of 4 was chosen after a few different steps of trial and error. This is definitely dependent on the size of data that is being used. Make this as high as possible and then tune it down if your model is having troubles. The higher the batch size, the faster the model will train. BPTT should be set as high as possible before running out of GPU memory. BPTT can also be modified if training is unstable or if a model is taking longer to train than wanted."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "bptt=12\nbs=4\ndataloader = LanguageModelLoader(combined_tokenized_data, bs, bptt)",
"execution_count": 25,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "##### LanguageModelData\nAfter this, I use the LanguageModelData class also from FastAI’s text.py library. This requires a PATH which I put into Path(“data/counterExample/”) using the pathlib library. Pad Index was set to 1. This was determined earlier by inserting a _pad_ string into itos’ 1 index. Next is the vocab size which in my case is 13 (_unk_, _pad_, _eos_, “zero”, “one”, “two”, “three”, “four”, “five”, “six”, “seven”, “eight”, and “nine”). Next, the training data loader is inserted. After this, the validation data loader is inserted. In my case, I used the same data loader to fulfill both of these, but in a real problem, this is definitely something that should be separate since this is how you can judge the effectiveness of a model and ensure that it isn’t over-fitting."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "#PATH\n#Pad_Idx\n#Number of tokens\n#dataloader - Training\n#dataloader - Validation (Should be different from Training)\nmodeldata = LanguageModelData(PATH,1,len(itos), dataloader, dataloader)",
"execution_count": 26,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "##### Dropout\nThe model dropout is the next thing to select. In language models, there are 5 separate dropouts. Input dropout (dropouti), linear model dropout (dropout), dropout used for an LSTM’s internal recurrent weights (wdrop), embedding layer dropout (dropoute), and hidden layer dropout (dropouth). A good ratio for these seems to be [0.25,0.1,0.2,0.02,0.15]*DropoutWeight but this is something that can be experimented with. A general rule of thumb is that the more data that you have, the lower you can put your DropoutWeight. This is because it will be harder for the model to over-fit on specific patterns if there is more data so less hiding of the data is needed to reduce over-fitting.\n\n##### get_model\nNow call the get_model function using the LanguageModelData object created above. The optimizer function can be whatever you choose, I used optim.Adam, but try out other optimizers as well and see what works best. Some other optimizers to try are ASGD, SGD, and Adamax. Embedding size should be chosen based off the size of your dictionary. This should be big enough to extract meaning clusters from the words. I would recommend playing around with this a bit, but the smaller the dictionary you are working with, the closer your embedding size will be to the vocab size, but as the vocab size increases, the embedding size can have a smaller embedding/vocab word count ratio. Next, choose the number of hidden activations. I would recommend starting at 2x the embedding size and then try increasing and decreasing it to see if it makes a difference. I started with my hidden activations much, much larger than my embedding size and was getting a very poorly trained model. I believe this could have worked, but I would have needed to generate a lot larger amount of data and really it didn’t need to be as complex as it was. Use the simplest model possible that still predicts how you want it to. 
The last parameter to choose is the number of LSTMs to use in the model. For this, I would start with 3 layers and modify as needed. I was able to use just two LSTMs and get predictive results."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "drops=np.array([0.25, 0.1, 0.2, 0.02, 0.15])*0.7\n\nem_sz,nh,nl = 8,16,2\nopt_fn = optim.Adam\n\nlearner = modeldata.get_model(opt_fn, em_sz, nh, nl,dropouti=drops[0], dropout=drops[1], wdrop=drops[2], dropoute=drops[3], dropouth=drops[4])\nlearner.metrics = [accuracy]\nlearner.unfreeze()",
"execution_count": 27,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "At this point, we have an model that has random weights that are ready to be trained on our data!"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "learner",
"execution_count": 28,
"outputs": [
{
"data": {
"text/plain": "SequentialRNN(\n (0): RNN_Encoder(\n (encoder): Embedding(13, 8, padding_idx=1)\n (encoder_with_dropout): EmbeddingDropout(\n (embed): Embedding(13, 8, padding_idx=1)\n )\n (rnns): ModuleList(\n (0): WeightDrop(\n (module): LSTM(8, 16, dropout=0.105)\n )\n (1): WeightDrop(\n (module): LSTM(16, 8, dropout=0.105)\n )\n )\n (dropouti): LockedDropout(\n )\n (dropouths): ModuleList(\n (0): LockedDropout(\n )\n (1): LockedDropout(\n )\n )\n )\n (1): LinearDecoder(\n (decoder): Linear(in_features=8, out_features=13)\n (dropout): LockedDropout(\n )\n )\n)"
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "##### lr_find()\nNext, we find a learning rate that will fit our model using the learner object created with the get_model function. FastAI has a lr_find() function that sweeps a wide range of learning rates which can then be plotted. The key to choosing a learning rate is to choose a learning rate where the loss is still going down, not the absolute bottom point. for this model, I chose 10^-2. I probably could have went up to 10^-1, but I knew the model would train pretty quickly so I went on the safe side. This is another parameter that should be as high as possible without reducing the accuracy of the model."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "learner.lr_find()\nlearner.sched.plot()",
"execution_count": 29,
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "db8411e0e2f8427cb08a2290774dc22d",
"version_major": 2,
"version_minor": 0
},
"text/html": "<p>Failed to display Jupyter Widget of type <code>HBox</code>.</p>\n<p>\n If you're reading this message in the Jupyter Notebook or JupyterLab Notebook, it may mean\n that the widgets JavaScript is still loading. If this message persists, it\n likely means that the widgets JavaScript library is either not installed or\n not enabled. See the <a href=\"https://ipywidgets.readthedocs.io/en/stable/user_install.html\">Jupyter\n Widgets Documentation</a> for setup instructions.\n</p>\n<p>\n If you're reading this message in another frontend (for example, a static\n rendering on GitHub or <a href=\"https://nbviewer.jupyter.org/\">NBViewer</a>),\n it may mean that your frontend doesn't currently support widgets.\n</p>\n",
"text/plain": "HBox(children=(IntProgress(value=0, description='Epoch', max=1), HTML(value='')))"
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": "epoch trn_loss val_loss accuracy \n 0 2.477381 2.56495 0.010456 \n\n"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "lr = 10e-2",
"execution_count": 31,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Training the Model\nNow that we have a model, it’s time to train it on the data generated above. There are a lot of very interesting parameters to finetune when training the model. The FastAI library allows variable learning rates, weight decays, and many other parameters when fitting the model. For this model, we will just keep it simple and use the learner.fit() function which takes in at a minimum the learning rate and the number of cycles that you want to train it."
},
{
"metadata": {
"scrolled": true,
"trusted": true
},
"cell_type": "code",
"source": "learner.fit(lr,10)",
"execution_count": 32,
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "1effa6fa05534ce992e2ddfd08d1a6b9",
"version_major": 2,
"version_minor": 0
},
"text/html": "<p>Failed to display Jupyter Widget of type <code>HBox</code>.</p>\n<p>\n If you're reading this message in the Jupyter Notebook or JupyterLab Notebook, it may mean\n that the widgets JavaScript is still loading. If this message persists, it\n likely means that the widgets JavaScript library is either not installed or\n not enabled. See the <a href=\"https://ipywidgets.readthedocs.io/en/stable/user_install.html\">Jupyter\n Widgets Documentation</a> for setup instructions.\n</p>\n<p>\n If you're reading this message in another frontend (for example, a static\n rendering on GitHub or <a href=\"https://nbviewer.jupyter.org/\">NBViewer</a>),\n it may mean that your frontend doesn't currently support widgets.\n</p>\n",
"text/plain": "HBox(children=(IntProgress(value=0, description='Epoch', max=10), HTML(value='')))"
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": "epoch trn_loss val_loss accuracy \n 0 0.877209 0.613849 0.829531 \n 1 0.756472 0.578578 0.826748 \n 2 0.802857 0.575006 0.832028 \n 3 0.831325 0.588339 0.823894 \n 4 0.738141 0.561355 0.825936 \n 5 0.916614 0.610922 0.829739 \n 6 0.945601 0.646737 0.824805 \n 7 0.929306 0.598057 0.826581 \n 8 0.836129 0.579624 0.827866 \n 9 0.76676 0.568329 0.830627 \n\n"
},
{
"data": {
"text/plain": "[0.5683291, 0.83062704064344106]"
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Making a Prediction\nAfter 10 cycles of training and an accuracy of ~82%, a prediction can be made. To make a prediction, the learner.model function is called and a torch variable is passed as the argument with a list of the desired prediction. Since I know my index should always just increment by one, I skip the translate from word to integer step and feed the token directly into the predictor."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "needPrediction = np.array([[5]])\nprobs = learner.model(V(needPrediction))",
"execution_count": 33,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "In index 0 of the prediction set there is a Torch variable with 13 predictions."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "probs[0]",
"execution_count": 34,
"outputs": [
{
"data": {
"text/plain": "Variable containing:\n\nColumns 0 to 7 \n -9.8337 -10.0553 3.9059 -0.1862 -5.8219 -2.0377 7.0377 3.7314\n\nColumns 8 to 12 \n 0.0088 -7.0428 -0.0959 -2.0290 -7.9542\n[torch.cuda.FloatTensor of size 1x13 (GPU 0)]"
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Next, take the softmax of this number to convert into a percent chance that the number is the next in the sequence"
},
{
"metadata": {
"scrolled": true,
"trusted": true
},
"cell_type": "code",
"source": "total_percentage = 0\nfor i in to_np(F.softmax(probs[0])):\n total_percentage+=i\nprint(total_percentage)",
"execution_count": 35,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "[ 0. 0. 0.0403 0.00067 0. 0.00011 0.92342 0.03384 0.00082 0. 0.00074 0.00011\n 0. ]\n"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "total_percentage.round(3)",
"execution_count": 53,
"outputs": [
{
"data": {
"text/plain": "array([ 0. , 0. , 0.021, 0. , 0. , 0.002, 0.953, 0.02 , 0. , 0. , 0. , 0.004, 0. ], dtype=float32)"
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "After seeing that the predictions are reasonable, I created a loop that checks the prediction output for each of the 13 outputs that we have."
},
{
"metadata": {
"scrolled": true,
"trusted": true
},
"cell_type": "code",
"source": "for i in range(0,13):\n needPrediction = np.array([[i]])\n probs = learner.model(V(needPrediction))\n print(itos[i] + \"--->\" + itos[to_np(probs[0][-1].exp()).argmax()] + \"\\t\"+str(to_np(F.softmax(probs[0][-1])).round(3)))",
"execution_count": 58,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "_unk_--->_eos_\t[ 0. 0. 0.652 0.003 0.344 0. 0. 0. 0. 0.001 0. 0. 0. ]\n_pad_--->zero\t[ 0. 0. 0.195 0.234 0.146 0.006 0. 0.204 0.214 0.001 0. 0. 0. ]\n_eos_--->_eos_\t[ 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077]\nzero--->one\t[ 0. 0. 0.257 0.007 0.73 0.003 0. 0.002 0.001 0. 0. 0. 0. ]\none--->two\t[ 0. 0. 0.03 0. 0. 0.968 0.002 0. 0. 0. 0. 0.001 0. ]\ntwo--->three\t[ 0. 0. 0.032 0. 0. 0.005 0.931 0.026 0. 0. 0. 0.005 0. ]\nthree--->four\t[ 0. 0. 0.097 0. 0. 0. 0. 0.902 0. 0. 0. 0. 0. ]\nfour--->five\t[ 0. 0. 0.145 0. 0. 0. 0. 0. 0.855 0. 0. 0. 0. ]\nfive--->six\t[ 0. 0. 0.079 0. 0. 0. 0. 0. 0. 0.907 0.012 0.001 0.001]\nsix--->seven\t[ 0. 0. 0.046 0. 0. 0. 0. 0.001 0.014 0.008 0.93 0.001 0. ]\nseven--->eight\t[ 0. 0. 0.103 0. 0. 0. 0. 0. 0. 0. 0.002 0.862 0.032]\neight--->nine\t[ 0. 0. 0.216 0.001 0. 0. 0. 0. 0. 0. 0. 0.001 0.781]\nnine--->zero\t[ 0. 0. 0.186 0.808 0.001 0.004 0. 0. 0.001 0. 0. 0. 0. ]\n"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "markdown",
"source": "## Final Thoughts"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "This model may not have much of a life outside of a classroom setting, but don’t underestimate the power of simplification of a concept. This model takes an idea that is usually many thousands of words and simplifies it into just 10 words, and 13 total tokens. This not only makes training fast to iterate your thinking, but also allows a lot more experimentation of the many different hyper-parameters. If this is your first time experiencing language models, I highly recommend looking into Fast.ai which will teach you how to achieve great results and explains all of the concepts that I talked about up top. There are more parameters than I could possibly talk about in a single blog post so I hope that you were able to gather some useful insights from this post and if there are any parameters you would like to see that I didn’t discuss, please let me know and I will add it in."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "",
"execution_count": null,
"outputs": []
}
],
"metadata": {
"_draft": {
"nbviewer_url": "https://gist.github.com/e4bd985620741c6e55ed621fd8655a3a"
},
"gist": {
"id": "e4bd985620741c6e55ed621fd8655a3a",
"data": {
"description": "Counting Example for Language Model",
"public": true
}
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"language_info": {
"name": "python",
"version": "3.6.3",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {
"04dba83d703c4c97973b8da5ef017ead": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "IntProgressModel",
"state": {
"bar_style": "success",
"description": "Epoch",
"layout": "IPY_MODEL_1108f27619c04d0c9d1d9d4426a6d44a",
"max": 10,
"style": "IPY_MODEL_e10ddd3114834c63afeea0c9bfee32a1",
"value": 10
}
},
"1108f27619c04d0c9d1d9d4426a6d44a": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.0.0",
"model_name": "LayoutModel",
"state": {}
},
"149c1ac37c9b499f842f63d178d2bbc0": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "HTMLModel",
"state": {
"layout": "IPY_MODEL_bfc4471dab35446fbd062cee50bc2847",
"style": "IPY_MODEL_2c78bfc16b424345822b8d9ebc5ac7c8",
"value": "100% 10/10 [00:05&lt;00:00, 1.81it/s]"
}
},
"171a3d23566a4916a2d6b38cba073108": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.0.0",
"model_name": "LayoutModel",
"state": {}
},
"17fab208c1f24c2da68669be65607e02": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.0.0",
"model_name": "LayoutModel",
"state": {}
},
"1a18ef3b0cf44c44b1b1c8fd5b3d2d76": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "HBoxModel",
"state": {
"children": [
"IPY_MODEL_04dba83d703c4c97973b8da5ef017ead",
"IPY_MODEL_f8e46f870e7e4adf8082d957777c08f0"
],
"layout": "IPY_MODEL_eab5cdadf3dc446aaf8c51f5a5cea696"
}
},
"1d883cb12ddb470cab085f26452a0799": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "HTMLModel",
"state": {
"layout": "IPY_MODEL_bba46c77a54442e89a600703618806ed",
"style": "IPY_MODEL_3850389523bd4061a875a4f37bcff987",
"value": " 0% 0/1 [00:00&lt;?, ?it/s]"
}
},
"2817a1e93ee84813a93fe3511ed5d7f7": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "DescriptionStyleModel",
"state": {
"description_width": ""
}
},
"2c78bfc16b424345822b8d9ebc5ac7c8": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "DescriptionStyleModel",
"state": {
"description_width": ""
}
},
"32ba039760b346578df3f09c14fe73ea": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "HTMLModel",
"state": {
"layout": "IPY_MODEL_6b473e701eca4822bb8b8b99db3c4895",
"style": "IPY_MODEL_c083769380ef4a46a141905cce1186e6",
"value": "100% 15/15 [00:08&lt;00:00, 1.75it/s]"
}
},
"3850389523bd4061a875a4f37bcff987": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "DescriptionStyleModel",
"state": {
"description_width": ""
}
},
"3b46fb458f9b4dda8d895c773bc729ee": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "ProgressStyleModel",
"state": {
"description_width": ""
}
},
"3f9fcf7d19264eb5b3f30e1e538ff6d8": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "ProgressStyleModel",
"state": {
"description_width": ""
}
},
"46f844869b6847c29efc5d499d5d6acb": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "HBoxModel",
"state": {
"children": [
"IPY_MODEL_6b051936f29a4a14b6ab740d52c2f6e5",
"IPY_MODEL_d8f90f2fb1bc452688b37f06cc9bcb4e"
],
"layout": "IPY_MODEL_d1273af340b74f5ba133d5bb64969035"
}
},
"49e99574d3464b559337da997c06b39b": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.0.0",
"model_name": "LayoutModel",
"state": {}
},
"4fb7b8299d9043c0a8d3b192de40b57d": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "IntProgressModel",
"state": {
"bar_style": "success",
"description": "Epoch",
"layout": "IPY_MODEL_5b72faa4fe8c45e3a3d22c70a6c5243f",
"max": 10,
"style": "IPY_MODEL_a36ebd91b6c848f3bddb4e3d217d6d53",
"value": 10
}
},
"5871c1f5cc7b457cbaa797c3efdfe905": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "IntProgressModel",
"state": {
"bar_style": "success",
"description": "Epoch",
"layout": "IPY_MODEL_866a435f08b542e5b68b6a58a71584d0",
"max": 15,
"style": "IPY_MODEL_3b46fb458f9b4dda8d895c773bc729ee",
"value": 15
}
},
"5b72faa4fe8c45e3a3d22c70a6c5243f": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.0.0",
"model_name": "LayoutModel",
"state": {}
},
"63871322b0d2493785a64b1cfaab2659": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "HBoxModel",
"state": {
"children": [
"IPY_MODEL_4fb7b8299d9043c0a8d3b192de40b57d",
"IPY_MODEL_149c1ac37c9b499f842f63d178d2bbc0"
],
"layout": "IPY_MODEL_e18888eafcc74d02ad321836ad293ca5"
}
},
"65dae5ebd0c742d7a305399423c390ec": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.0.0",
"model_name": "LayoutModel",
"state": {}
},
"6b051936f29a4a14b6ab740d52c2f6e5": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "IntProgressModel",
"state": {
"bar_style": "success",
"description": "Epoch",
"layout": "IPY_MODEL_171a3d23566a4916a2d6b38cba073108",
"max": 10,
"style": "IPY_MODEL_a5a133ee051e49479606ac4220b033ed",
"value": 10
}
},
"6b473e701eca4822bb8b8b99db3c4895": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.0.0",
"model_name": "LayoutModel",
"state": {}
},
"6f45fc011e0b41ed9e0947ed109198d3": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "HBoxModel",
"state": {
"children": [
"IPY_MODEL_fb6d77fb2c5b4b5e9c9e870ef85951c5",
"IPY_MODEL_c53ab2d1e299414b8d0f099d80d0d802"
],
"layout": "IPY_MODEL_65dae5ebd0c742d7a305399423c390ec"
}
},
"71988522634f4cf2bb83ac497f944924": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.0.0",
"model_name": "LayoutModel",
"state": {}
},
"739576ba78a4474194ebe2d056a02646": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.0.0",
"model_name": "LayoutModel",
"state": {}
},
"7892d81a5cef498abbea3e73bd73a9ff": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "HBoxModel",
"state": {
"children": [
"IPY_MODEL_c2f9d01f36b9435fad8f80abe783be47",
"IPY_MODEL_1d883cb12ddb470cab085f26452a0799"
],
"layout": "IPY_MODEL_919dad157b264202925bb3c01ae0c57d"
}
},
"84091e10212a43b09db81bc8a97c92d5": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "DescriptionStyleModel",
"state": {
"description_width": ""
}
},
"866a435f08b542e5b68b6a58a71584d0": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.0.0",
"model_name": "LayoutModel",
"state": {}
},
"919dad157b264202925bb3c01ae0c57d": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.0.0",
"model_name": "LayoutModel",
"state": {}
},
"959dbb9efd6941ca8bdf548f8de88d7c": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.0.0",
"model_name": "LayoutModel",
"state": {}
},
"a36ebd91b6c848f3bddb4e3d217d6d53": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "ProgressStyleModel",
"state": {
"description_width": ""
}
},
"a5a133ee051e49479606ac4220b033ed": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "ProgressStyleModel",
"state": {
"description_width": ""
}
},
"a9a611129ff6427d9b5cbc5c40de62bd": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "DescriptionStyleModel",
"state": {
"description_width": ""
}
},
"bba46c77a54442e89a600703618806ed": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.0.0",
"model_name": "LayoutModel",
"state": {}
},
"bfc4471dab35446fbd062cee50bc2847": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.0.0",
"model_name": "LayoutModel",
"state": {}
},
"c083769380ef4a46a141905cce1186e6": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "DescriptionStyleModel",
"state": {
"description_width": ""
}
},
"c2f9d01f36b9435fad8f80abe783be47": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "IntProgressModel",
"state": {
"bar_style": "danger",
"description": "Epoch",
"layout": "IPY_MODEL_959dbb9efd6941ca8bdf548f8de88d7c",
"max": 1,
"style": "IPY_MODEL_3f9fcf7d19264eb5b3f30e1e538ff6d8"
}
},
"c53ab2d1e299414b8d0f099d80d0d802": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "HTMLModel",
"state": {
"layout": "IPY_MODEL_d966057d73304877be1afb753f8f87a0",
"style": "IPY_MODEL_2817a1e93ee84813a93fe3511ed5d7f7",
"value": "100% 100/100 [00:56&lt;00:00, 1.78it/s]"
}
},
"d1273af340b74f5ba133d5bb64969035": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.0.0",
"model_name": "LayoutModel",
"state": {}
},
"d2875ddf016045d2837f2083faa77fa3": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "HBoxModel",
"state": {
"children": [
"IPY_MODEL_5871c1f5cc7b457cbaa797c3efdfe905",
"IPY_MODEL_32ba039760b346578df3f09c14fe73ea"
],
"layout": "IPY_MODEL_17fab208c1f24c2da68669be65607e02"
}
},
"d8f90f2fb1bc452688b37f06cc9bcb4e": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "HTMLModel",
"state": {
"layout": "IPY_MODEL_49e99574d3464b559337da997c06b39b",
"style": "IPY_MODEL_a9a611129ff6427d9b5cbc5c40de62bd",
"value": "100% 10/10 [00:05&lt;00:00, 1.77it/s]"
}
},
"d966057d73304877be1afb753f8f87a0": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.0.0",
"model_name": "LayoutModel",
"state": {}
},
"e10ddd3114834c63afeea0c9bfee32a1": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "ProgressStyleModel",
"state": {
"description_width": ""
}
},
"e18888eafcc74d02ad321836ad293ca5": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.0.0",
"model_name": "LayoutModel",
"state": {}
},
"eab5cdadf3dc446aaf8c51f5a5cea696": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.0.0",
"model_name": "LayoutModel",
"state": {}
},
"f56541c7d04f4a6b9819665e2004fb64": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "ProgressStyleModel",
"state": {
"description_width": ""
}
},
"f8e46f870e7e4adf8082d957777c08f0": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "HTMLModel",
"state": {
"layout": "IPY_MODEL_739576ba78a4474194ebe2d056a02646",
"style": "IPY_MODEL_84091e10212a43b09db81bc8a97c92d5",
"value": "100% 10/10 [00:05&lt;00:00, 1.79it/s]"
}
},
"fb6d77fb2c5b4b5e9c9e870ef85951c5": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.1.0",
"model_name": "IntProgressModel",
"state": {
"bar_style": "success",
"description": "Epoch",
"layout": "IPY_MODEL_71988522634f4cf2bb83ac497f944924",
"style": "IPY_MODEL_f56541c7d04f4a6b9819665e2004fb64",
"value": 100
}
}
},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}