tsu-nera/nazokake.ipynb

## nazokake.ipynb
{
  "cells": [
    {
      "metadata": {},
      "cell_type": "markdown",
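      "source": "# Nazokake with gensim\n\nA *nazokake* is a Japanese riddle of the form \"Pose X, and I answer Y. The reason? Z\", where Z is a word that ties X and Y together. This notebook picks Z automatically: it is any word found among the 500 nearest neighbors of both X and Y in a word2vec model."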
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "Install cython and gensim beforehand.\n\n    conda install cython\n    conda install gensim\n\nThe [text8](http://mattmahoney.net/dc/textdata.html) corpus is used as the training data.\n\n    wget http://mattmahoney.net/dc/text8.zip -P /tmp\n    unzip /tmp/text8.zip -d /tmp\n"
    },
    {
      "metadata": {
        "trusted": true,
        "collapsed": false
      },
      "cell_type": "code",
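      "source": "from gensim.models import word2vec\n\n# Stream the text8 corpus (a single file of space-separated words).\ndata = word2vec.Text8Corpus('/tmp/text8')\n# Train 200-dimensional embeddings; gensim >= 4.0 renamed `size` to `vector_size`.\nmodel = word2vec.Word2Vec(data, vector_size=200)",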
      "execution_count": 172,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true,
        "collapsed": true
      },
      "cell_type": "code",
      "source": "model.save(\"sample.model\")",
      "execution_count": 51,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true,
        "collapsed": false
      },
      "cell_type": "code",
      "source": "# Words closest to the average of the 'dog' and 'cat' vectors\nmodel.wv.most_similar(positive=['dog', 'cat'])",
      "execution_count": 170,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": "[('goat', 0.7966023683547974),\n ('bee', 0.7859479784965515),\n ('pig', 0.7830795645713806),\n ('bird', 0.7660524845123291),\n ('hound', 0.7580216526985168),\n ('panda', 0.7541525363922119),\n ('hamster', 0.7503507137298584),\n ('ass', 0.7488968372344971),\n ('haired', 0.7469390630722046),\n ('rat', 0.7466884851455688)]"
          },
          "metadata": {},
          "execution_count": 170
        }
      ]
    },
    {
      "metadata": {
        "trusted": true,
        "collapsed": false
      },
      "cell_type": "code",
      "source": "wx = 'japanese'\nx = model.wv[wx]",
      "execution_count": 121,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true,
        "collapsed": false
      },
      "cell_type": "code",
      "source": "wy = 'smart'\ny = model.wv[wy]",
      "execution_count": 165,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true,
        "collapsed": false
      },
      "cell_type": "code",
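      "source": "# The 500 words nearest to x. Each result is a (word, cosine similarity) pair,\n# and since x is passed as a raw vector, wx itself is not excluded.\nsx = {word for word, _sim in model.wv.most_similar(positive=[x], topn=500)}",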
      "execution_count": 166,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true,
        "collapsed": false
      },
      "cell_type": "code",
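      "source": "# The 500 words nearest to y. Each result is a (word, cosine similarity) pair,\n# and since y is passed as a raw vector, wy itself is not excluded.\nsy = {word for word, _sim in model.wv.most_similar(positive=[y], topn=500)}",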
      "execution_count": 167,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true,
        "collapsed": false
      },
      "cell_type": "code",
      "source": "sz = (sx & sy)",
      "execution_count": 168,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true,
        "collapsed": false
      },
      "cell_type": "code",
      "source": "# Nazokake template: \"Pose <wx>, and I answer <wy>. The reason? ... <wz>.\"\nfor wz in sz:\n    print(wx + \"とかけまして\" + wy + \"と解く。その心は・・・\" + wz + \"でございます\")",
      "execution_count": 169,
      "outputs": [
        {
          "output_type": "stream",
          "text": "japaneseとかけましてsmartと解く。その心は・・・capcomでございます\njapaneseとかけましてsmartと解く。その心は・・・starcraftでございます\njapaneseとかけましてsmartと解く。その心は・・・doraemonでございます\n",
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "editable": true,
        "deletable": true
      },
      "cell_type": "markdown",
      "source": "## Reference\n* [Playing with word2vec, belatedly, using Aozora Bunko data - Blog of a data scientist working in Roppongi](http://tjo.hatenablog.com/entry/2014/06/19/233949)\n* [Trying Word2Vec in Python | Foolean](https://foolean.net/p/71)\n* [python - How to find the closest word to a vector using word2vec - Stack Overflow](http://stackoverflow.com/questions/32759712/how-to-find-the-closest-word-to-a-vector-using-word2vec)\n* [Training word2vec on Japanese Wikipedia data - Qiita](http://qiita.com/tsuruchan/items/7d3af5c5e9182230db4e)"
    }
  ],
  "metadata": {
    "kernelspec": {
      "name": "conda-env-dlnd-py",
      "display_name": "Python [conda env:dlnd]",
      "language": "python"
    },
    "toc": {
      "threshold": 4,
      "number_sections": false,
      "toc_cell": false,
      "toc_window_display": false,
      "toc_section_display": "block",
      "sideBar": true,
      "navigate_menu": true,
      "moveMenuLeft": true,
      "widenNotebook": false,
      "colors": {
        "hover_highlight": "#DAA520",
        "selected_highlight": "#FFD700",
        "running_highlight": "#FF0000"
      },
      "nav_menu": {
        "width": "254px",
        "height": "39px"
      }
    },
    "language_info": {
      "name": "python",
      "version": "3.6.1",
      "mimetype": "text/x-python",
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "pygments_lexer": "ipython3",
      "nbconvert_exporter": "python",
      "file_extension": ".py"
    },
    "notify_time": "30",
    "gist": {
      "id": "",
      "data": {
        "description": "Nazokake with gensim",
        "public": true
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}
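The neighbor-intersection step in the notebook above can be sketched without gensim. That step takes the top-N words nearest each seed vector with `most_similar`, then intersects the two sets. This is a minimal, self-contained sketch under stated assumptions. `TOY_VECTORS` is a hypothetical three-dimensional embedding table with invented values, standing in for `model.wv`. The functions `unit`, `most_similar` and `nazokake_candidates` are illustrative helpers, not gensim's API. The ranking follows the usual word2vec recipe of averaging unit-normalized query vectors and sorting by cosine similarity.

```python
import math

# Hypothetical toy embeddings standing in for model.wv. Real word2vec
# vectors here would have 200 dimensions; these have 3, and the numbers
# are invented purely for illustration.
TOY_VECTORS = {
    "japanese": [0.9, 0.1, 0.3],
    "smart": [0.2, 0.9, 0.3],
    "capcom": [0.6, 0.6, 0.4],
    "doraemon": [0.5, 0.5, 0.5],
    "sushi": [0.95, 0.0, 0.2],
    "clever": [0.1, 0.95, 0.2],
    "rock": [0.0, 0.0, 1.0],
}


def unit(vec):
    """Scale a vector to length 1 so that dot products are cosine similarities."""
    norm = math.sqrt(sum(c * c for c in vec))
    return [c / norm for c in vec]


def most_similar(vectors, positive, topn=10):
    """Rank words by cosine similarity to the mean of the unit-normalized
    queries. Queries given as words are excluded from the result; queries
    given as raw vectors are not (a sketch, not gensim's implementation)."""
    queries = [unit(vectors[q]) if isinstance(q, str) else unit(q) for q in positive]
    mean = unit([sum(col) / len(queries) for col in zip(*queries)])
    excluded = {q for q in positive if isinstance(q, str)}
    scored = [
        (word, sum(a * b for a, b in zip(mean, unit(vec))))
        for word, vec in vectors.items()
        if word not in excluded
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


def nazokake_candidates(vectors, wx, wy, topn):
    """Words among the top-n neighbors of both wx and wy, as in sx & sy."""
    sx = {word for word, _sim in most_similar(vectors, [vectors[wx]], topn)}
    sy = {word for word, _sim in most_similar(vectors, [vectors[wy]], topn)}
    # Raw-vector queries can return wx and wy themselves, so drop them here.
    return (sx & sy) - {wx, wy}


for wz in sorted(nazokake_candidates(TOY_VECTORS, "japanese", "smart", topn=4)):
    print(f"Pose japanese, and I answer smart. The reason? {wz}")
```

With `topn=5`, both seed words enter both neighbor sets, because a raw-vector query does not exclude its own word. The final set difference removes them. The notebook's `sz = (sx & sy)` keeps them, so with real data `japanese` or `smart` could in principle appear as an answer.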