@tsu-nera
Created May 13, 2017 09:07
Nazokake with gensim
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# Nazokake with gensim"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Install cython and gensim beforehand.\n\n    conda install cython\n    conda install gensim\n\nUse [text8](http://mattmahoney.net/dc/textdata.html) as the training corpus.\n\n    wget http://mattmahoney.net/dc/text8.zip -P /tmp\n    unzip /tmp/text8.zip -d /tmp\n"
},
{
"metadata": {
"trusted": true,
"collapsed": false
},
"cell_type": "code",
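"source": "from gensim.models import word2vec\n\n# Stream the text8 corpus and train 200-dimensional word vectors.\n# Note: gensim >= 4.0 renamed `size` to `vector_size`.\ndata = word2vec.Text8Corpus('/tmp/text8')\nmodel = word2vec.Word2Vec(data, size=200)",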
"execution_count": 172,
"outputs": []
},
{
"metadata": {
"trusted": true,
"collapsed": true
},
"cell_type": "code",
"source": "model.save(\"sample.model\")",
"execution_count": 51,
"outputs": []
},
{
"metadata": {
"trusted": true,
"collapsed": false
},
"cell_type": "code",
"source": "# Words closest to the combined 'dog' and 'cat' vectors\nmodel.wv.most_similar(positive=['dog','cat'])",
"execution_count": 170,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "[('goat', 0.7966023683547974),\n ('bee', 0.7859479784965515),\n ('pig', 0.7830795645713806),\n ('bird', 0.7660524845123291),\n ('hound', 0.7580216526985168),\n ('panda', 0.7541525363922119),\n ('hamster', 0.7503507137298584),\n ('ass', 0.7488968372344971),\n ('haired', 0.7469390630722046),\n ('rat', 0.7466884851455688)]"
},
"metadata": {},
"execution_count": 170
}
]
},
{
"metadata": {
"trusted": true,
"collapsed": false
},
"cell_type": "code",
"source": "wx = 'japanese'\nx = model.wv[wx]",
"execution_count": 121,
"outputs": []
},
{
"metadata": {
"trusted": true,
"collapsed": false
},
"cell_type": "code",
"source": "wy = 'smart'\ny = model.wv[wy]",
"execution_count": 165,
"outputs": []
},
{
"metadata": {
"trusted": true,
"collapsed": false
},
"cell_type": "code",
"source": "# The 500 words nearest to x; most_similar yields (word, similarity) pairs\nsx = {word for word, _ in model.wv.most_similar(positive=[x], topn=500)}",
"execution_count": 166,
"outputs": []
},
{
"metadata": {
"trusted": true,
"collapsed": false
},
"cell_type": "code",
"source": "# The 500 words nearest to y; most_similar yields (word, similarity) pairs\nsy = {word for word, _ in model.wv.most_similar(positive=[y], topn=500)}",
"execution_count": 167,
"outputs": []
},
{
"metadata": {
"trusted": true,
"collapsed": false
},
"cell_type": "code",
"source": "# Words that are close to both x and y\nsz = sx & sy",
"execution_count": 168,
"outputs": []
},
{
"metadata": {
"trusted": true,
"collapsed": false
},
"cell_type": "code",
"source": "for wz in sz:\n    print(wx + \" is like \" + wy + \". The reason is... \" + wz + \".\")",
"execution_count": 169,
"outputs": [
{
"output_type": "stream",
"text": "japanese is like smart. The reason is... capcom.\njapanese is like smart. The reason is... starcraft.\njapanese is like smart. The reason is... doraemon.\n",
"name": "stdout"
}
]
},
{
"metadata": {
"editable": true,
"deletable": true
},
"cell_type": "markdown",
"source": "## Reference\n* [青空文庫のデータを使って、遅ればせながらword2vecと戯れてみた - 六本木で働くデータサイエンティストのブログ](http://tjo.hatenablog.com/entry/2014/06/19/233949)\n* [Word2VecをPythonでやってみる | Foolean](https://foolean.net/p/71)\n* [python - How to find the closest word to a vector using word2vec - Stack Overflow](http://stackoverflow.com/questions/32759712/how-to-find-the-closest-word-to-a-vector-using-word2vec)\n* [word2vecを使って、日本語wikipediaのデータを学習する - Qiita](http://qiita.com/tsuruchan/items/7d3af5c5e9182230db4e)"
}
],
"metadata": {
"kernelspec": {
"name": "conda-env-dlnd-py",
"display_name": "Python [conda env:dlnd]",
"language": "python"
},
"toc": {
"threshold": 4,
"number_sections": false,
"toc_cell": false,
"toc_window_display": false,
"toc_section_display": "block",
"sideBar": true,
"navigate_menu": true,
"moveMenuLeft": true,
"widenNotebook": false,
"colors": {
"hover_highlight": "#DAA520",
"selected_highlight": "#FFD700",
"running_highlight": "#FF0000"
},
"nav_menu": {
"width": "254px",
"height": "39px"
}
},
"language_info": {
"name": "python",
"version": "3.6.1",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
},
"notify_time": "30",
"gist": {
"id": "",
"data": {
"description": "Nazokake with gensim",
"public": true
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}