@sujitpal
Created May 12, 2016 15:19
RNN for Character Generation - Deep Learning Learners Meetup Presentation May-12-2016
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## RNN for Character Generation"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from __future__ import print_function\n",
"from keras.layers.recurrent import SimpleRNN\n",
"from keras.models import Sequential\n",
"from keras.layers import Dense, Activation\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Extracting input as stream of characters"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"chapter i. down the rabbit-hole alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, 'and what is the use of a book,' thought alice 'without pictures or conversations?' so she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain wou ...\n"
]
}
],
"source": [
"lines = []\n",
"fin = open(\"data/alice_in_wonderland.txt\", 'rb')\n",
"for line in fin:\n",
" line = line.strip().lower().decode(\"ascii\", \"ignore\")\n",
" if len(line) == 0:\n",
" continue\n",
" lines.append(line)\n",
"fin.close()\n",
"text = \" \".join(lines)\n",
"print(text[0:500], \"...\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create vocabulary and lookup tables"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"vocab size: 42\n"
]
}
],
"source": [
"chars = set([c for c in text])\n",
"nb_chars = len(chars)\n",
"char2index = {c:i for i, c in enumerate(sorted(chars))}\n",
"index2char = {i:c for i, c in enumerate(sorted(chars))}\n",
"\n",
"print(\"vocab size:\", nb_chars)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create inputs\n",
"\n",
"We step through the test with a moving window of size SEQLEN, moving forward one character at a time, taking the contents of the window as our sequence and the character following that as our output."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"down the r -> a\n",
"own the ra -> b\n",
"wn the rab -> b\n",
"n the rabb -> i\n",
" the rabbi -> t\n",
"the rabbit -> -\n",
"he rabbit- -> h\n",
"e rabbit-h -> o\n",
" rabbit-ho -> l\n"
]
}
],
"source": [
"SEQLEN = 10\n",
"STEP = 1\n",
"input_chars = []\n",
"label_chars = []\n",
"for i in range(0, len(text) - SEQLEN, STEP):\n",
" input_chars.append(text[i:i + SEQLEN])\n",
" label_chars.append(text[i + SEQLEN])\n",
"\n",
"for i in range(11, 20):\n",
" print(input_chars[i], \"->\", label_chars[i])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Vectorize inputs\n",
"\n",
"Input to our RNN is a sequence of characters (before the -> above) and the output is a single character (after the ->). Each character will be represented as a 1-hot encoding over all characters in the vocabulary and will have SEQLEN characters, so the shape of each row of input is (SEQLEN, nb_chars). Similarly, the label is also 1-hot encoded over nb_chars positions, so the shape is (nb_chars,). There are len(input_chars) records."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"X = np.zeros((len(input_chars), SEQLEN, nb_chars), dtype=np.bool)\n",
"y = np.zeros((len(input_chars), nb_chars), dtype=np.bool)\n",
"for i, input_char in enumerate(input_chars):\n",
" for j, ch in enumerate(input_char):\n",
" X[i, j, char2index[ch]] = 1\n",
" y[i, char2index[label_chars[i]]] = 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Build model"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"model = Sequential()\n",
"model.add(SimpleRNN(512, return_sequences=False, input_shape=(SEQLEN, nb_chars)))\n",
"model.add(Dense(nb_chars))\n",
"model.add(Activation(\"softmax\"))\n",
"\n",
"model.compile(loss=\"categorical_crossentropy\", optimizer=\"rmsprop\")"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"==================================================\n",
"Iteration #: 0\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 45s - loss: 2.1117 \n",
"Seed: offer him\n",
" offer hime the wath the ghit whe wat in the wat on the wath the ghit whe wat in the wat on the wath the ghit \n",
"==================================================\n",
"Iteration #: 1\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 46s - loss: 2.0546 \n",
"Seed: as she we\n",
" as she west in the pait the mont oull at the mout ould and and the kith the matt the mont oull at the mout ou\n",
"==================================================\n",
"Iteration #: 2\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 46s - loss: 1.9982 \n",
"Seed: , will you\n",
", will you could the tore the tore the tore the tore the tore the tore the tore the tore the tore the tore the\n",
"==================================================\n",
"Iteration #: 3\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 47s - loss: 1.9375 \n",
"Seed: esented th\n",
"esented the hare was it and the coure the gryphon the gryphon the gryphon the gryphon the gryphon the gryphon \n",
"==================================================\n",
"Iteration #: 4\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 46s - loss: 1.8759 \n",
"Seed: stirring a\n",
"stirring and the caull the caull the caull the caull the caull the caull the caull the caull the caull the cau\n",
"==================================================\n",
"Iteration #: 5\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 47s - loss: 1.8196 \n",
"Seed: s close be\n",
"s close bean the wind her the cand all the cand all the cand all the cand all the cand all the cand all the ca\n",
"==================================================\n",
"Iteration #: 6\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 47s - loss: 1.7704 \n",
"Seed: rom the ti\n",
"rom the ting to see in a little sad and the tome the dound and the tome the dound and the tome the dound and t\n",
"==================================================\n",
"Iteration #: 7\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 47s - loss: 1.7243 \n",
"Seed: it in a b\n",
" it in a bering all the sing to see what she was to she was to she was to she was to she was to she was to she\n",
"==================================================\n",
"Iteration #: 8\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 48s - loss: 1.6822 \n",
"Seed: what's mo\n",
" what's mone the hatter was so ffor the say she had nothing of the sabe of the sabe of the sabe of the sabe of\n",
"==================================================\n",
"Iteration #: 9\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 47s - loss: 1.6461 \n",
"Seed: tunity for\n",
"tunity for the round so she had not to began the round so she had not to began the round so she had not to beg\n",
"==================================================\n",
"Iteration #: 10\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 49s - loss: 1.6114 \n",
"Seed: nd till th\n",
"nd till the the mare to dear here were the same a said the mare to dear here were the same a said the mare to \n",
"==================================================\n",
"Iteration #: 11\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 49s - loss: 1.5822 \n",
"Seed: as far out\n",
"as far out it was a little pear the time the time the time the time the time the time the time the time the ti\n",
"==================================================\n",
"Iteration #: 12\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 48s - loss: 1.5527 \n",
"Seed: ormouse be\n",
"ormouse began to the sare was time the king a little sard the king a little sard the king a little sard the ki\n",
"==================================================\n",
"Iteration #: 13\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 49s - loss: 1.5281 \n",
"Seed: said, by w\n",
"said, by whith she was the sight and she went on, and she went on, and she went on, and she went on, and she w\n",
"==================================================\n",
"Iteration #: 14\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 49s - loss: 1.5049 \n",
"Seed: ther bit. \n",
"ther bit. 'i had nettered to herself a ding and the book of the ornared of the ornared of the ornared of the o\n",
"==================================================\n",
"Iteration #: 15\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 50s - loss: 1.4812 \n",
"Seed: (she was \n",
" (she was so many the table, and the stille the tratsed the door as the say, and the stille the tratsed the do\n",
"==================================================\n",
"Iteration #: 16\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 52s - loss: 1.4610 \n",
"Seed: e march ha\n",
"e march hare some to the queen on the sard into the stort and she said alice, and she to be to get in a lot to\n",
"==================================================\n",
"Iteration #: 17\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 58s - loss: 1.4431 \n",
"Seed: ch the mar\n",
"ch the march hare.) 'i don't a long to herself in a long so factly for said alice. 'i don't a long to herself \n",
"==================================================\n",
"Iteration #: 18\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 56s - loss: 1.4241 \n",
"Seed: e.) 'up, l\n",
"e.) 'up, leave her down and a pit and the say eage on the say eage on the say eage on the say eage on the say \n",
"==================================================\n",
"Iteration #: 19\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 55s - loss: 1.4089 \n",
"Seed: quite know\n",
"quite know when the sade again. the dormouse said to herself up to say you do no don't know when the sade agai\n",
"==================================================\n",
"Iteration #: 20\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 54s - loss: 1.3906 \n",
"Seed: what lati\n",
" what latien the try began trouble of the tries, i don't see, he dent the don't be an all the sing how she wen\n",
"==================================================\n",
"Iteration #: 21\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 54s - loss: 1.3792 \n",
"Seed: id alice. \n",
"id alice. 'what i had a poor of the rabbit headd in a some of the way in a minute of the way in a minute of th\n",
"==================================================\n",
"Iteration #: 22\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 54s - loss: 1.3655 \n",
"Seed: ge lobster\n",
"ge lobsters and the mouse for its alice came the caterpillar with the lize when the mouse for its alice came t\n",
"==================================================\n",
"Iteration #: 23\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 54s - loss: 1.3542 \n",
"Seed: d the smal\n",
"d the small on the bottle so mad the dormouse the constered to herself up at the cook first mad a little of me\n",
"==================================================\n",
"Iteration #: 24\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 55s - loss: 1.3408 \n",
"Seed: cauldron \n",
" cauldron of the best that she had plater the bright all the restance the pload not the rabbit as so she was t\n",
"==================================================\n",
"Iteration #: 25\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 53s - loss: 1.3313 \n",
"Seed: urtle. 'se\n",
"urtle. 'seemed to see how the mock turtle replied to herself in a very sorn the forthouse the mock turtle repl\n",
"==================================================\n",
"Iteration #: 26\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 55s - loss: 1.3187 \n",
"Seed: nk anythin\n",
"nk anything to the jury of the gryphon seemed to the door as she said to the jury of the gryphon seemed to the\n",
"==================================================\n",
"Iteration #: 27\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 56s - loss: 1.3096 \n",
"Seed: rabbit put\n",
"rabbit put it was the march hare the more that she made of the hatter was the march hare the more that she mad\n",
"==================================================\n",
"Iteration #: 28\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 55s - loss: 1.3000 \n",
"Seed: e was a li\n",
"e was a little parch here the parchis to be a word said the garden into have to be sought to be a pleased the \n",
"==================================================\n",
"Iteration #: 29\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 56s - loss: 1.2961 \n",
"Seed: the knave\n",
" the knave of the moral of the only one of the jury all made it was the mouse to the hatter was to alice as it\n",
"==================================================\n",
"Iteration #: 30\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 54s - loss: 1.2828 \n",
"Seed: 'i know s\n",
" 'i know shary sit of the resting so the reason in a long as a long and the moment she found herself the mock \n",
"==================================================\n",
"Iteration #: 31\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 53s - loss: 1.2769 \n",
"Seed: e,' said t\n",
"e,' said the knave of her head of showling to herself in a very much at the roof the conversation the parch at\n",
"==================================================\n",
"Iteration #: 32\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 52s - loss: 1.2714 \n",
"Seed: ers, excep\n",
"ers, except in a very suppeared the most cats even found all the other side of the court, but i gave had for t\n",
"==================================================\n",
"Iteration #: 33\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 54s - loss: 1.2665 \n",
"Seed: allets liv\n",
"allets live an the cook fight to be a little shriek of the ground all the white rabbit rumbring on the book of\n",
"==================================================\n",
"Iteration #: 34\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 57s - loss: 1.2634 \n",
"Seed: for her t\n",
" for her to see who seemed to be a minuter starpped the queen said to alice, as she said to herself, and read \n",
"==================================================\n",
"Iteration #: 35\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 54s - loss: 1.2573 \n",
"Seed: omewhere,'\n",
"omewhere,' said the march hare, the mock turtle and and the mock turtle and and the mock turtle and and the mo\n",
"==================================================\n",
"Iteration #: 36\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 53s - loss: 1.2518 \n",
"Seed: all that,'\n",
"all that,' said the dormouse words, they were the court, and she had a little to see it was a large piese they\n",
"==================================================\n",
"Iteration #: 37\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 54s - loss: 1.2513 \n",
"Seed: t there wa\n",
"t there was no here, to they were it to the jurors all repeated at the table the gryphon large more the queen \n",
"==================================================\n",
"Iteration #: 38\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 53s - loss: 1.2482 \n",
"Seed: so as to p\n",
"so as to put on the were the court, i wonder?' 'i don't know that was a very such a duchess to done, you know.\n",
"==================================================\n",
"Iteration #: 39\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 54s - loss: 1.2445 \n",
"Seed: ! it was t\n",
"! it was the hatter up like the look of the poor little pired out of the words as she had not a moment the hat\n",
"==================================================\n",
"Iteration #: 40\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 54s - loss: 1.2462 \n",
"Seed: upon her \n",
" upon her so such a dain and seen the sorther was to alice, and she her hend and the sharp an it adventurts of\n",
"==================================================\n",
"Iteration #: 41\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 54s - loss: 1.2423 \n",
"Seed: looking ha\n",
"looking hardly on its she was a pealed on she well something what it was a lowd to see the white rabbit say so\n",
"==================================================\n",
"Iteration #: 42\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 55s - loss: 1.2420 \n",
"Seed: , as she w\n",
", as she was surring to herself, and the poor little thing was the duchess something to the court was a large \n",
"==================================================\n",
"Iteration #: 43\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 55s - loss: 1.2423 \n",
"Seed: ,' added t\n",
",' added the hatter which what i shall be nother as she could not remember and have know what in a long out a \n",
"==================================================\n",
"Iteration #: 44\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 54s - loss: 1.2423 \n",
"Seed: red to the\n",
"red to the king. 'it's an a tere to see them little while the caterpillar the white rabbit was that it was all\n",
"==================================================\n",
"Iteration #: 45\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 54s - loss: 1.2419 \n",
"Seed: gin again,\n",
"gin again, and the same thing to say it as it was the damp the dormouse said in a very please your majesty,' s\n",
"==================================================\n",
"Iteration #: 46\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 54s - loss: 1.2416 \n",
"Seed: hind us, a\n",
"hind us, and the mouse replied; alice to herself, 'i'l she was good the door of the great call not poss it was\n",
"==================================================\n",
"Iteration #: 47\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 55s - loss: 1.2486 \n",
"Seed: a growl, a\n",
"a growl, as she had never be tures, the mock turtle. 'be no nothing i should be a hears! i shall replied alice\n",
"==================================================\n",
"Iteration #: 48\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 55s - loss: 1.2501 \n",
"Seed: ils fast i\n",
"ils fast in a very deeply kith a gris trainst hime the mock turtle replied the king and the white rabbit here,\n",
"==================================================\n",
"Iteration #: 49\n",
"Epoch 1/1\n",
"142544/142544 [==============================] - 56s - loss: 1.2482 \n",
"Seed: rowful wil\n",
"rowful will you, won't you, with your plakse the cat again. 'it do you to see it was the dormouse some of the \n"
]
}
],
"source": [
"batch_size = 128\n",
"for i in range(50):\n",
" print(\"=\" * 50)\n",
" print(\"Iteration #: %d\" % (i))\n",
" model.fit(X, y, batch_size=batch_size, nb_epoch=1)\n",
" \n",
" # testing, pick a sequence randomly as seed and use it to generate text from\n",
" # model for the next 100 steps\n",
" test_idx = np.random.randint(len(input_chars))\n",
" test_chars = input_chars[test_idx]\n",
" print(\"Seed: %s\" % (test_chars))\n",
" print(test_chars, end=\"\")\n",
" for j in range(100):\n",
" Xtest = np.zeros((1, SEQLEN, nb_chars))\n",
" for k, ch in enumerate(test_chars):\n",
" Xtest[0, k, char2index[ch]] = 1\n",
" pred = model.predict(Xtest, verbose=0)[0]\n",
" ypred = index2char[np.argmax(pred)]\n",
" print(ypred, end=\"\")\n",
" # move forward with test_chars + ypred\n",
" test_chars = test_chars[1:] + ypred\n",
" print()\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
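The "Create inputs" and "Vectorize inputs" cells in the notebook can be tried without Keras or the book file. The following is a minimal sketch that uses only NumPy and assumes a short toy string in place of the full text; the toy string and the `window` variable name are illustrative and not part of the notebook.

```python
import numpy as np

# Toy stand-in for the notebook's `text` (assumption: the notebook itself
# reads the full book from data/alice_in_wonderland.txt).
text = "down the rabbit-hole"
SEQLEN = 10
STEP = 1

# Vocabulary and lookup table, built the same way as in the notebook.
chars = sorted(set(text))
nb_chars = len(chars)
char2index = {c: i for i, c in enumerate(chars)}

# Sliding window: each window of SEQLEN characters is an input,
# and the character right after the window is its label.
input_chars = []
label_chars = []
for i in range(0, len(text) - SEQLEN, STEP):
    input_chars.append(text[i:i + SEQLEN])
    label_chars.append(text[i + SEQLEN])

# 1-hot encode: X has shape (records, SEQLEN, nb_chars), y has (records, nb_chars).
X = np.zeros((len(input_chars), SEQLEN, nb_chars), dtype=bool)
y = np.zeros((len(input_chars), nb_chars), dtype=bool)
for i, window in enumerate(input_chars):
    for j, ch in enumerate(window):
        X[i, j, char2index[ch]] = True
    y[i, char2index[label_chars[i]]] = True

print(input_chars[0], "->", label_chars[0])
print(X.shape, y.shape)
```

Each position in a window sets exactly one entry along the last axis of `X`, and each label sets exactly one entry in its row of `y`, which is what the shapes described in the "Vectorize inputs" cell imply.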