Created May 13, 2017 09:07
tsu-nera/f2a7b3feaf5c841d53ce4e6c20c987cb
Nazokake with gensim
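{
 "cells": [
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "# Nazokake with gensim\n\nA *nazokake* is a Japanese riddle of the form \"I compare X to Y. The link between them... Z.\" This notebook picks Z from the words that appear among the nearest neighbors of both X and Y in a word2vec model trained on text8."
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "Install cython and gensim beforehand:\n\n    conda install cython\n    conda install gensim\n\nThe training corpus is [text8](http://mattmahoney.net/dc/textdata.html). Download it and unzip it into /tmp, where the code below expects to find `/tmp/text8`:\n\n    wget http://mattmahoney.net/dc/text8.zip -P /tmp\n    unzip /tmp/text8.zip -d /tmp\n"
  },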
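  {
   "metadata": {
    "trusted": true,
    "collapsed": false
   },
   "cell_type": "code",
   "source": "from gensim.models import word2vec\n\n# Text8Corpus streams /tmp/text8 as lists of tokens (text8 has no sentence\n# boundaries, so it is split into fixed-size chunks).\ndata = word2vec.Text8Corpus('/tmp/text8')\n# Train 200-dimensional word vectors.\n# gensim >= 4.0 names this parameter vector_size; older releases called it size.\nmodel = word2vec.Word2Vec(data, vector_size=200)",
   "execution_count": 172,
   "outputs": []
  },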
  {
   "metadata": {
    "trusted": true,
    "collapsed": true
   },
   "cell_type": "code",
   "source": "model.save(\"sample.model\")",
   "execution_count": 51,
   "outputs": []
  },
  {
   "metadata": {
    "trusted": true,
    "collapsed": false
   },
   "cell_type": "code",
   "source": "# Words whose vectors are most cosine-similar to the average of 'dog' and 'cat'\nmodel.wv.most_similar(positive=['dog', 'cat'])",
   "execution_count": 170,
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": "[('goat', 0.7966023683547974),\n ('bee', 0.7859479784965515),\n ('pig', 0.7830795645713806),\n ('bird', 0.7660524845123291),\n ('hound', 0.7580216526985168),\n ('panda', 0.7541525363922119),\n ('hamster', 0.7503507137298584),\n ('ass', 0.7488968372344971),\n ('haired', 0.7469390630722046),\n ('rat', 0.7466884851455688)]"
     },
     "metadata": {},
     "execution_count": 170
    }
   ]
  },
  {
   "metadata": {
    "trusted": true,
    "collapsed": false
   },
   "cell_type": "code",
   "source": "wx = 'japanese'\nx = model.wv[wx]",
   "execution_count": 121,
   "outputs": []
  },
  {
   "metadata": {
    "trusted": true,
    "collapsed": false
   },
   "cell_type": "code",
   "source": "wy = 'smart'\ny = model.wv[wy]",
   "execution_count": 165,
   "outputs": []
  },
  {
   "metadata": {
    "trusted": true,
    "collapsed": false
   },
   "cell_type": "code",
   "source": "# The 500 words closest (by cosine similarity) to the vector x.\n# A raw vector is passed rather than the string wx, so gensim does not\n# exclude the query word itself from the results.\nsx = {word for word, score in model.wv.most_similar(positive=[x], topn=500)}",
   "execution_count": 166,
   "outputs": []
  },
  {
   "metadata": {
    "trusted": true,
    "collapsed": false
   },
   "cell_type": "code",
   "source": "# The 500 words closest to the vector y.\nsy = {word for word, score in model.wv.most_similar(positive=[y], topn=500)}",
   "execution_count": 167,
   "outputs": []
  },
  {
   "metadata": {
    "trusted": true,
    "collapsed": false
   },
   "cell_type": "code",
   "source": "# Candidate answers: words near both x and y, excluding the two query words\nsz = (sx & sy) - {wx, wy}",
   "execution_count": 168,
   "outputs": []
  },
  {
   "metadata": {
    "trusted": true,
    "collapsed": false
   },
   "cell_type": "code",
   "source": "# Nazokake formula (originally \"XとかけましてYと解く。その心は・・・Zでございます\"),\n# rendered in English as \"I compare X to Y. The link between them... Z.\"\nfor wz in sz:\n    print(\"I compare \" + wx + \" to \" + wy + \". The link between them... \" + wz + \".\")",
   "execution_count": 169,
   "outputs": [
    {
     "output_type": "stream",
     "text": "I compare japanese to smart. The link between them... capcom.\nI compare japanese to smart. The link between them... starcraft.\nI compare japanese to smart. The link between them... doraemon.\n",
     "name": "stdout"
    }
   ]
  },
  {
   "metadata": {
    "editable": true,
    "deletable": true
   },
   "cell_type": "markdown",
   "source": "## References\n* [青空文庫のデータを使って、遅ればせながらword2vecと戯れてみた - 六本木で働くデータサイエンティストのブログ](http://tjo.hatenablog.com/entry/2014/06/19/233949) (Belatedly playing with word2vec on Aozora Bunko data)\n* [Word2VecをPythonでやってみる | Foolean](https://foolean.net/p/71) (Trying Word2Vec in Python)\n* [python - How to find the closest word to a vector using word2vec - Stack Overflow](http://stackoverflow.com/questions/32759712/how-to-find-the-closest-word-to-a-vector-using-word2vec)\n* [word2vecを使って、日本語wikipediaのデータを学習する - Qiita](http://qiita.com/tsuruchan/items/7d3af5c5e9182230db4e) (Training word2vec on Japanese Wikipedia data)"
  }
 ],
 "metadata": {
  "kernelspec": {
   "name": "conda-env-dlnd-py",
   "display_name": "Python [conda env:dlnd]",
   "language": "python"
  },
  "toc": {
   "threshold": 4,
   "number_sections": false,
   "toc_cell": false,
   "toc_window_display": false,
   "toc_section_display": "block",
   "sideBar": true,
   "navigate_menu": true,
   "moveMenuLeft": true,
   "widenNotebook": false,
   "colors": {
    "hover_highlight": "#DAA520",
    "selected_highlight": "#FFD700",
    "running_highlight": "#FF0000"
   },
   "nav_menu": {
    "width": "254px",
    "height": "39px"
   }
  },
  "language_info": {
   "name": "python",
   "version": "3.6.1",
   "mimetype": "text/x-python",
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "pygments_lexer": "ipython3",
   "nbconvert_exporter": "python",
   "file_extension": ".py"
  },
  "notify_time": "30",
  "gist": {
   "id": "",
   "data": {
    "description": "Nazokake with gensim",
    "public": true
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
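The notebook's punchline search reduces to one idea: take the top-N words by cosine similarity to each of two query vectors and intersect the two sets. The sketch below reproduces that idea with plain NumPy so it can run without downloading text8. It is a minimal illustration under stated assumptions: `most_similar` and `nazokake` are hypothetical helpers (a simplified stand-in for gensim's `KeyedVectors.most_similar` with a single raw vector, not gensim's implementation), the 2-D vocabulary and vectors are made up, and every vector is assumed to be nonzero.

```python
import numpy as np


def most_similar(vocab, vectors, query, topn=10):
    """Return the `topn` (word, cosine similarity) pairs closest to `query`.

    Hypothetical, simplified stand-in for gensim's KeyedVectors.most_similar
    called with one raw vector: every word, including the query word itself,
    is ranked. Assumes no row of `vectors` is all zeros.
    """
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)  # row-normalize
    q = query / np.linalg.norm(query)
    sims = unit @ q                       # cosine similarity to every word
    order = np.argsort(-sims)[:topn]      # indices of the highest similarities
    return [(vocab[i], float(sims[i])) for i in order]


def nazokake(vocab, vectors, wx, wy, topn=500):
    """Hypothetical helper: words found in both top-`topn` lists of wx and wy."""
    index = {w: i for i, w in enumerate(vocab)}
    sx = {w for w, _ in most_similar(vocab, vectors, vectors[index[wx]], topn)}
    sy = {w for w, _ in most_similar(vocab, vectors, vectors[index[wy]], topn)}
    # Each word is its own nearest neighbor, so wx or wy can survive the
    # intersection; remove them explicitly.
    return sorted((sx & sy) - {wx, wy})


# Hypothetical 2-D toy embeddings, not taken from the text8 model.
VOCAB = ["japanese", "smart", "capcom", "doraemon", "banana"]
VECTORS = np.array([
    [1.0, 0.0],    # japanese
    [0.0, 1.0],    # smart
    [1.0, 1.0],    # capcom
    [0.9, 1.1],    # doraemon
    [-1.0, -0.2],  # banana
])

if __name__ == "__main__":
    for wz in nazokake(VOCAB, VECTORS, "japanese", "smart", topn=3):
        print("I compare japanese to smart. The link between them... " + wz + ".")
```

With these toy vectors and `topn=3`, the top-3 list for "japanese" is japanese, capcom, doraemon and the list for "smart" is smart, doraemon, capcom, so `nazokake(VOCAB, VECTORS, "japanese", "smart", topn=3)` returns `['capcom', 'doraemon']`. Sorting the result only makes the output deterministic; the notebook iterates over an unordered set instead.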