Created
May 9, 2016 00:47
-
-
Save linanqiu/2759cd8db19745bffe0c4a252fe4dc88 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Verifying Word2Vec GPU" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Found a GPU implementation of `word2vec` on GitHub. Shout out to the guy's username. Love it.\n", | |
"\n", | |
"[https://github.com/whatupbiatch/cuda-word2vec](https://github.com/whatupbiatch/cuda-word2vec)\n", | |
"\n", | |
"Specifically, it only implements to Continuous Bag of Words (CBOW) half of word2vec (the other being the Skip-Gram). However, most CUDA implementations so far have been inaccurate / memory constrained. It'd serve us well to verify the results of this GPU implementation.\n", | |
"\n", | |
"**Spoilers first: The CUDA code gives inaccurate embeddings**." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"But before that, here's why we're doing this.\n", | |
"\n", | |
"On a 8 core CPU with 32G of RAM, `gensim`'s implementation of `word2vec` ran at around 20K words per second. The same dataset with the same parameters ran at 400K words per second. This speed up is very sexy." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We'll train a CPU version of the model and a GPU version of the model using the same parameters, and do the following\n", | |
"\n", | |
"1. Check their cosine distance for every vocabulary and see that it behaves like that of 2 CPU models\n", | |
"2. Check word similarities to see that the model makes sense" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"ename": "ImportError", | |
"evalue": "Matplotlib qt-based backends require an external PyQt4, PyQt5,\nor PySide package to be installed, but it was not found.", | |
"output_type": "error", | |
"traceback": [ | |
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", | |
"\u001b[0;31mImportError\u001b[0m Traceback (most recent call last)", | |
"\u001b[0;32m<ipython-input-1-5160e98e78c6>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mimport\u001b[0m \u001b[0mmatplotlib\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpyplot\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mplt\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mplt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstyle\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0muse\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'ggplot'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mget_ipython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmagic\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mu'matplotlib inline'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mget_ipython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmagic\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mu\"config InlineBackend.figure_format = 'svg'\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m/Users/linanqiu/.virtualenv/default/lib/python2.7/site-packages/matplotlib/pyplot.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 112\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 113\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mmatplotlib\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbackends\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mpylab_setup\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 114\u001b[0;31m \u001b[0m_backend_mod\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnew_figure_manager\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdraw_if_interactive\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0m_show\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpylab_setup\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 115\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 116\u001b[0m \u001b[0m_IP_REGISTERED\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m/Users/linanqiu/.virtualenv/default/lib/python2.7/site-packages/matplotlib/backends/__init__.pyc\u001b[0m in \u001b[0;36mpylab_setup\u001b[0;34m()\u001b[0m\n\u001b[1;32m 30\u001b[0m \u001b[0;31m# imports. 0 means only perform absolute imports.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 31\u001b[0m backend_mod = __import__(backend_name,\n\u001b[0;32m---> 32\u001b[0;31m globals(),locals(),[backend_name],0)\n\u001b[0m\u001b[1;32m 33\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 34\u001b[0m \u001b[0;31m# Things we pull in from all backends\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m/Users/linanqiu/.virtualenv/default/lib/python2.7/site-packages/matplotlib/backends/backend_qt5agg.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 13\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 14\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0;34m.\u001b[0m\u001b[0mbackend_agg\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mFigureCanvasAgg\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 15\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0;34m.\u001b[0m\u001b[0mbackend_qt5\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mQtCore\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 16\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0;34m.\u001b[0m\u001b[0mbackend_qt5\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mQtGui\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 17\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0;34m.\u001b[0m\u001b[0mbackend_qt5\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mFigureManagerQT\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m/Users/linanqiu/.virtualenv/default/lib/python2.7/site-packages/matplotlib/backends/backend_qt5.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 29\u001b[0m \u001b[0mfigureoptions\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 30\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 31\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0;34m.\u001b[0m\u001b[0mqt_compat\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mQtCore\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mQtGui\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mQtWidgets\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0m_getSaveFileName\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0m__version__\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 32\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mmatplotlib\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbackends\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mqt_editor\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformsubplottool\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mUiSubplotTool\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 33\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m/Users/linanqiu/.virtualenv/default/lib/python2.7/site-packages/matplotlib/backends/qt_compat.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 160\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mImportError\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 161\u001b[0m raise ImportError(\n\u001b[0;32m--> 162\u001b[0;31m \u001b[0;34m\"Matplotlib qt-based backends require an external PyQt4, PyQt5,\\n\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 163\u001b[0m \"or PySide package to be installed, but it was not found.\")\n\u001b[1;32m 164\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;31mImportError\u001b[0m: Matplotlib qt-based backends require an external PyQt4, PyQt5,\nor PySide package to be installed, but it was not found." | |
] | |
} | |
], | |
"source": [ | |
"import matplotlib.pyplot as plt\n", | |
"plt.style.use('ggplot')\n", | |
"%matplotlib inline\n", | |
"%config InlineBackend.figure_format = 'svg'\n", | |
"\n", | |
"from lib.w2v.w2v import *" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"cpu_model = model_from_saved(\"./temp/models-ec2/enron\", binary=False)\n", | |
"gpu_model = model_from_saved(\"./temp/models-gpu/enron\", binary=False)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Cosine Distance\n", | |
"\n", | |
"First let's measure the cosine distance between vocabularies." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<matplotlib.figure.Figure at 0x1104e5c90>" | |
] | |
}, | |
"execution_count": 15, | |
"metadata": {}, | |
"output_type": "execute_result" | |
}, | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAEACAYAAAC6d6FnAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFOBJREFUeJzt3W2MHdd93/HvT5IFK5ZqQnVASRQDCQ3VmoUN22rMtLFR\nurUF1igk9Y3koFWEhAgCsK2MIkhCpkDNAkVi50UbuYH0IrEjyo1VEE4jKJAqk1a9qBEkYp1INm1K\noViAqLkNV67ryHlACgr698XOmuP1cvfu3fu45/sBFjwzd2b2HO6985tz5uGmqpAkteeqaVdAkjQd\nBoAkNcoAkKRGGQCS1CgDQJIaZQBIUqMGCoAkO5J8LslLSc4k2ZfkxiQnk5xNciLJjt7yR5K8kuTl\nJHf15t+Z5HT32sPjaJAkaTCD9gAeBp6pqrcD7wReBg4DJ6vqDuC5bpoke4H7gb3AAeCRJOm28yhw\nsKr2AHuSHBhZSyRJm7JhACR5K/D+qvo0QFW9XlWvAXcDx7rFjgH3duV7gCeq6lJVnQfOAfuS3Azc\nUFWnuuUe760jSZqwQXoAtwPfTPKbSf4oya8neQuws6qWumWWgJ1d+RbgQm/9C8CuNeYvdvMlSVMw\nSABcA7wHeKSq3gP8Bd1wz4pafp6Ez5SQpDlyzQDLXAAuVNX/6KY/BxwBLia5qaoudsM7r3avLwK7\ne+vf2m1jsSv35y+u/mVJDBJJ2qSqysZLfa8NA6DbwX8jyR1VdRb4IPD17udB4BPdv092qzwFfDbJ\nv2d5iGcPcKqqKsl3kuwDTgEPAJ8cVUPmQZKjVXV02vUYF9s332zf/Br2wHmQHgDAvwR+K8m1wP8E\nfhK4Gjie5CBwHrgPoKrOJDkOnAFeBw7V5UeOHgIeA65j+aqiZ4eptCRp6wYKgKr6CvAja7z0wSss\n/0vAL60x/w+Bd2ymgpKk8fBO4MlamHYFxmxh2hUYs4VpV2DMFqZdgTFbmHYFZk1m7QthktR2PQcg\nSeMw7H7THoAkNcoAkKRGGQCS1CgDQJIaZQBIUqMMAElqlAEgSY0yACSpUQaAJDXKAJCkRhkAktQo\nA0CSGmUASFKjDABJapQBIEmNMgAkqVEGgCQ1ygCQpEYZAJLUKANAkhp1zbQrIM26JNWfHubLt6VZ\nZABIA1nJgHxPIBgGmmcOAUmbVlwOBGl+2QOQOquHeqTtzh6A9D36R/ce6Wt7MwAkqVEDBUCS80m+\nmuSFJKe6eTcmOZnkbJITSXb0lj+S5JUkLye5qzf/ziSnu9ceHn1zJEmDGrQHUMD+qnp3Vb23m3cY\nOFlVdwDPddMk2QvcD+wFDgCPJFm5UuJR4GBV7QH2JDkwonZIkjZpM0NAqy93uxs41pWPAfd25XuA\nJ6rqUlWdB84B+5LcDNxQVae65R7vrSPNpSS18jPtukibtZkewBeSfDnJT3fzdlbVUldeAnZ25VuA\nC711LwC71pi/2M2XpqK/8x5+B+6JYs2vQS8D/bGq+pMkPwicTPJy/8Wq8ghIc6r/tt3aPV3eMax5\nM1AAVNWfdP9+M8nvAO8FlpLcVFUXu+GdV7vFF4HdvdVvZfnIf7Er9+cvrvX7khztTS5U1cIg9ZSm\na3RhIq0nyX5g/5a3U7X+gXuSHwCurqo/S/IW4ATwb4EPAt+qqk8kOQzsqKrD3Ungz7IcEruALwA/\n3PUSngceAk4BTwOfrKpnV/2+8shJk7B8xL56p10jKK9MX+Z7WuM07H5zkB7ATuB3ugt5rgF+q6pO\nJPkycDzJQeA8cB9AVZ1Jchw4A7wOHKrLKXMIeAy4Dnhm9c5f2l764SDNng17AJNmD0CTMv4ewOXX\nfE9rnIbdb3onsCQ1ygCQpEb5NFA1xcuVpcvsAahB3rwlgQEgSc1yCEiaAL9GUrPIHoA0EQ47afYY\nAJLUKANAkhplAEhSowwASWqUASBJjfIyUG173v0rrc0AUCNm59HM3hOgWeEQkDRx3hOg2WAASFKj\nDABJapQBIEmNMgAkqVEGgCQ1ygCQpEYZAJLUKANAkhplAEhSo3wUhLYln/8jbcwegLYxH7kgrccA\nkKRGGQCS1CgDQJIaNVAAJLk6yQtJfrebvjHJySRnk5xIsqO37JEkryR5Ocldvfl3Jjndvfbw6Jsi\nSdqMQXsAHwXOcPmM2mHgZFXdATzXTZNkL3A/sBc4ADySZOULLx4FDlbVHmBPkgOjaYI0v5LUys+0\n66L2bBgASW4FPgz8Bpe/Tulu4FhXPgbc25XvAZ6oqktVdR44B+xLcjNwQ1Wd6pZ7vLeO1DCvVNL0\nDNID+A/AzwFv9ObtrKqlrrwE7OzKtwAXestdAHatMX+xmy+NjEfT0uaseyNYkn8MvFpVLyTZv9Yy\nVTXyD1ySo73JhapaGOX2tZ3Nznf/SuPS7Y/3b3U7G90J/PeAu5N8GHgz8NeSfAZYSnJTVV3shnde\n7ZZfBHb31r+V5SP/xa7cn794pV9aVUc31QpJakh3ULywMp3kY8NsZ90hoKr6xaraXVW3Ax8B/ltV\nPQA8BTzYLfYg8GRXfgr4SJJrk9wO7AFOVdVF4DtJ9nUnhR/orSMJh7A0eZt9FtDKG/PjwPEkB4Hz\nwH0AVXUmyXGWrxh6HThUVSvrHAIeA64DnqmqZ7dWdWm7cfhKk5XL++fZkKSqyk+ANm35yLm/E91M\neZh1xrctPwPajGH3m94JLEmNMgAkqVEGgCQ1ygCQpEb5jWCaa14yKQ3PHoC2AZ+nIw3DAJCkRhkA\nktQoA0CSGmUASFKjDABJapSXgUozqH95q88F0rjYA5Bmkpe2avwMAElqlAEgSY0yACSpUZ4E1lzx\n2T/S6BgAmkOrv3lL0jAcApKkRhkAktQoA0CSGuU5AGnGrT7x7Z3BGhUDQDPPK3886a3xcAhIc8JH\nI0ijZgBIUqMMAElqlAEgSY0yACSpUesGQJI3J3k+yYtJziT55W7+jUlOJjmb5ESSHb11jiR5JcnL\nSe7qzb8zyenutYfH1yRJ0iDWDYCq+ivgA1X1LuCdwAeSvA84DJysqjuA57ppkuwF7gf2AgeAR5Ks\nXLf2KHCwqvYAe5IcGEeDJEmD2XAIqKr+siteC1wNfBu4GzjWzT8G3NuV7wGeqKpLVXUeOAfsS3Iz\ncENVneqWe7y3jiRpCjYMgCRXJXkRWAK+WFVfB3ZW1VK3yBKwsyvfAlzorX4B2LXG/MVuvqRNSlIr\nP9Oui+bbhncCV9UbwLuSvBX4fJIPrHp95G/EJEd7kwtVtTDK7UvzbeXj5l3BrUqyH9i/1e0M/CiI\nqnotydPAncBSkpuq6mI3vPNqt9gisLu32q0sH/kvduX+/MV1ftfRQeslSa3pDooXVqaTfGyY7Wx0\nFdDbVq7wSXId8CHgBeAp4MFusQeBJ7vyU8BHklyb5HZgD3Cqqi4C30myrzsp/EBvHUnSFGzUA7gZ\nOJbkKpbD4jNV9VySF4DjSQ4C54H7AKrqTJLjwBngdeBQVa30Vw8BjwHXAc9U1bOjbowkaXC5vH+e\nDUnKx93q+88r9ce9Vz8dc63XNlue1W2tv10/K4Lh95veCawZ5hNApXEyACSpUQaAJDXKAJCkRhkA\nktQoA0CSGmUASFKjDABJatTAzwKSNHv6N8x5U5g2yx6ANNe8WU7DswegmeHz7aXJsgegGeMRrTQp\nBoAkNcoAkKRGGQCS1CgDQJIaZQBIUqMMAElqlPcBSNuEdwVrs+wBSNuG91BocwwASWqUQ0CaKh//\nIE2PPQDNAIcupGkwACSpUQaAJDXKAJCkRhkAktQoA0CSGrVhACTZneSLSb6e5GtJHurm35jkZJKz\nSU4k2dFb50iSV5K8nOSu3vw7k5zuXnt4PE2SlKRWfqZdF82uQXoAl4B/VVV/G/hR4J8neTtwGDhZ\nVXcAz3XTJNkL3A/sBQ4AjyRZuS39UeBgVe0B9iQ5MNLWSOp4aa02tmEAVNXFqnqxK/858BKwC7gb\nONYtdgy4tyvfAzxRVZeq6jxwDtiX5Gbghqo61S33eG8dSdKEbeocQJLbgHcDzwM7q2qpe2kJ2NmV\nbwEu9Fa7wHJgrJ6/2M2XJE3BwI+CSHI98NvAR6vqzy6P6kBVjXSsMcnR3uRCVS2MatuSNO+S7Af2\nb3U7AwVAkjexvPP/TFU92c1eSnJTVV3shnde7eYvArt7q9/K8pH/Ylfuz19c6/dV1dGBW6C544lJ\naWu6g+KFlekkHxtmO4NcBRTgU8CZqvrV3ktPAQ925QeBJ3vzP5Lk2iS3A3uAU1V1EfhOkn3dNh/o\nraPmeJJSmrZUrf8hTPI+4L8DX+XyJ/YIcAo4DvwQcB64r6r+tFvnF4GfAl5necjo8938O4HHgOuA\nZ6rqoTV+X/llFtvbcg9g5a0Uhi9vdf152NbWt+vnafsbdr+5YQBMmgGw/RkABoBGa9j9pncCS1Kj\nDABJapQBIEmNMgAkqVEGgCQ1yi+F19h545c0mwwATcjqSxk1Kf0A9pJQ9TkEJG173nWttRkAktQo\nA0CSGmUASFKjPAksNWT1FVmeFG6bASA1xauxdJkBoLHw2n9p9nkOQGPk5YfSLDMAJKlRBoAkNcoA\nkKRGGQCS1CgDQJIaZQBIUqMMAElqlDeCSQ3zuwLaZg9Aapo367XMHoBGxsc/SPPFHoBGzCNKaV4Y\nAJLUKIeAJAGeEG7Rhj2AJJ9OspTkdG/ejUlOJjmb5ESSHb3XjiR5JcnLSe7qzb8zyenutYdH3xRJ\nW+PwXWsGGQL6TeDAqnmHgZNVdQfwXDdNkr3A/cDebp1HkqwcSTwKHKyqPcCeJKu3qTmUpFZ+pl0X\nSZuzYQBU1ZeAb6+afTdwrCsfA+7tyvcAT1TVpao6D5wD9iW5Gbihqk51yz3eW0dzzyNHaR4NexJ4\nZ1UtdeUlYGdXvgW40FvuArBrjfmL3XxJ0pRs+SqgqvLwT5Lm0LBXAS0luamqLnbDO6928xeB3b3l\nbmX5yH+xK/fnL15p40mO9iYXqmphyHpK0raTZD+wf8vbWT6A3/CX3Qb8blW9o5v+FeBbVfWJJIeB\nHVV1uDsJ/FngvSwP8XwB+OGqqiTPAw8Bp4CngU9W1bNr/K7yErT5sXzyd+U9FDYuD7qc25pmHf0M\nzpdh95sb9gCSPAH8feBtSb4B/Bvg48DxJAeB88B9AFV1Jslx4AzwOnCoLifMIeAx4DrgmbV2/pJm\ng/cEtGGgHsAk2QOYfd9/yedsHsVuj21Nv45+HmffsPtNHwWhIXnuX5p3BoAkNcoAkKRG+TA4DcRH\nPUjbjz0AbYLj/tJ2Yg9A0rq8JHT7sgcgaQP2/LYrewCSBmZvYHsxALQmT/pqbf2bxTTvDACtY/Vd\no5K2E88BSFKjDABJapRDQJKGsvo8kSeF548BIGlIniOadw4BSVKjDABJapQBIEmN8hyAvsubv6S2\nGACNW//rHaXB+ZiI+eMQkPBhXxoN30fzxh6ApJGzNzAfDIAGOdav8XMocR44BNQsu+tS6+wBNMKj\nfkmrGQBNsVuuyfN8wOwyACSN2eUDD8NgthgA25jDPpo99kJniQGwjay9w/cDp9lkb2D6Jn4VUJID\nSV5O8kqSX5j079/+Cq/w0Xy4/D5NUv2f6darHRMNgCRXA78GHAD2Aj+e5O2TrMM0Jdk/ou3UWj+j\n2LbWszDtCozZwhR/9/ceuIzjfT2qz992MukewHuBc1V1vqouAf8ZuGfCdZim/aPbVP8o3yP+yViY\ndgXGbGHaFegZSxjs32qttptJnwPYBXyjN30B2DfhOsw0j+Sl1da+iuiKS3s+YWCT7gGMZOeWXPUf\nVw1//JNRbHdSrjSEc/nNfaWje4/01bqNPxvrfLY+NvHqzrhUTW6HkuRHgaNVdaCbPgK8UVWf6C3j\nHk6SNmmYns+kA+Aa4I+Bfwj8b+AU8ONV9dLEKiFJAiZ8DqCqXk/yL4DPA1cDn3LnL0nTMdEegCRp\ndkz1cdBJbkxyMsnZJCeS7FhjmTcneT7Ji0nOJPnladR1GAO2b3eSLyb5epKvJXloGnUdxiDt65b7\ndJKlJKcnXcdhDHKzYpJPdq9/Jcm7J13HYW3UtiR/K8nvJ/mrJD87jTpuxQDt+6fd3+yrSX4vyTun\nUc9hDdC+e7r2vZDkD5P8g3U3WFVT+wF+Bfj5rvwLwMevsNwPdP9eA/wB8L5p1nuU7QNuAt7Vla9n\n+RzJ26dd9xH//d4PvBs4Pe06D9Cmq4FzwG3Am4AXV/89gA8Dz3TlfcAfTLveI2zbDwJ/B/h3wM9O\nu85jaN/fBd7alQ/My99uE+17S6/8Dpbvu7riNqf9hTB3A8e68jHg3rUWqqq/7IrXsvyf8H/HX7WR\n2LB9VXWxql7syn8OvATcMrEabs2gf78vAd+eVKW2aJCbFb/b7qp6HtiRZOdkqzmUDdtWVd+sqi8D\nl6ZRwS0apH2/X1WvdZPPA7dOuI5bMUj7/qI3eT3wf9bb4LQDYGdVLXXlJWDND1GSq5K82C3zxao6\nM6kKbtFA7VuR5DaWj5SfH2+1RmZT7ZsTa92suGuAZeZhRzJI2+bZZtt3EHhmrDUarYHal+TeJC8B\n/xVYd0h57FcBJTnJ8jDHav+6P1FVV7zVu6reAN6V5K3A55Psr6qFkVd2CKNoX7ed64HPAR/tegIz\nYVTtmyODtmH1Ndfz0PZ5qONWDNy+JB8Afgr4sfFVZ+QGal9VPQk8meT9wGeAv3mlZcceAFX1oSu9\n1p0YvKmqLia5GXh1g229luRplscoF0Zb0+GMon1J3gT8NvCfuj/ezBjl329OLAK7e9O7WT7SWm+Z\nW7t5s26Qts2zgdrXnfj9deBAVc3L0CRs8u9XVV9Kck2Sv15V31prmWkPAT0FPNiVHwS+b+eX5G0r\nV5ckuQ74EPDCxGq4NYO0L8CngDNV9asTrNsobNi+OfRlYE+S25JcC9zPcjv7ngJ+Ar57d/uf9obC\nZtkgbVsxj8/T2bB9SX4I+C/AP6uqc1Oo41YM0r6/0e1TSPIegCvt/OlenOZZ7RuBLwBngRPAjm7+\nLcDTXfmdwB+xfMb7q8DPTfts/Ijb9z7gja59L3Q/B6Zd91G1r5t+guU7v/8fy2OYPzntum/Qrn/E\n8tVY54Aj3byfAX6mt8yvda9/BXjPtOs8qraxPNz3DeA1lk/c/y/g+mnXe4Tt+w3gW73P2qlp13nE\n7ft54Gtd274E/Mh62/NGMElq1LSHgCRJU2IASFKjDABJapQBIEmNMgAkqVEGgCQ1ygCQpEYZAJLU\nqP8Pbf/Agzp1BhwAAAAASUVORK5CYII=\n", | |
"text/plain": [ | |
"<matplotlib.figure.Figure at 0x10d7d2ed0>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"<matplotlib.figure.Figure at 0x1104e5c90>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"from scipy import spatial\n", | |
"\n", | |
"vocabs = {}\n", | |
"\n", | |
"for word in cpu_model.vocab:\n", | |
" if word in cpu_model and word in gpu_model:\n", | |
" cosine_sim = 1 - spatial.distance.cosine(cpu_model[word], gpu_model[word])\n", | |
" vocabs[word] = cosine_sim\n", | |
"\n", | |
"import matplotlib.pyplot as plot\n", | |
"\n", | |
"plot.hist(vocabs.values(), bins=100)\n", | |
"plot.figure()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This distribution looks amazingly similar to the CPU model (found at `vector_rotation_default.ipynb` in this same directory.)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Word Similarity\n", | |
"\n", | |
"Now let's make sure that the embeddings make sense" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[(u'hi', 0.6631308794021606),\n", | |
" (u'hey', 0.572208046913147),\n", | |
" (u'dear', 0.5451931953430176),\n", | |
" (u'good_morning', 0.5373660326004028),\n", | |
" (u'happy_new_year', 0.4551734924316406),\n", | |
" (u'congratulations', 0.44265079498291016),\n", | |
" (u'hope', 0.4411052167415619),\n", | |
" (u'thanks', 0.43363627791404724),\n", | |
" (u'happy_birthday', 0.41684263944625854),\n", | |
" (u'doing_well', 0.4160914421081543)]" | |
] | |
}, | |
"execution_count": 5, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"cpu_model.most_similar('hello')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[(u'playoffs_antowain_smith', 1.0),\n", | |
" (u'second_guessing', 1.0),\n", | |
" (u'transmisison', 1.0),\n", | |
" (u'remains_unchanged', 1.0),\n", | |
" (u'seoul_korea', 1.0),\n", | |
" (u'clearly_defined', 1.0),\n", | |
" (u'correspond', 1.0),\n", | |
" (u'yds', 1.0),\n", | |
" (u'steve_leppard', 0.22951051592826843),\n", | |
" (u'tiverton', 0.22951051592826843)]" | |
] | |
}, | |
"execution_count": 13, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"gpu_model.most_similar('hello')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Unfortunately, the embeddings don't seem to make too much sense.\n", | |
"\n", | |
"For an entire afternoon I thought I had insane GPU gains. Unfortunately, that's a little too good to be true." | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 2", | |
"language": "python", | |
"name": "python2" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 2 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython2", | |
"version": "2.7.9" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment