ftnext/アイテムベースの協調フィルタリング.ipynb

## アイテムベースの協調フィルタリング.ipynb

      
## ユーザベースの協調フィルタリング.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "rettypy #8  @ftnext\n",
    "\n",
    "# What I did\n",
    "\n",
    "Implemented a collaborative filtering example in Python.\n",
    "\n",
    "Reference slides: [協調フィルタリングを利用した推薦システム構築](https://www.slideshare.net/masayuki1986/recommendation-ml)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Ratings\n",
    "\n",
    "<table>\n",
    "    <tr>\n",
    "        <th></th>\n",
    "        <th>Item 1</th>\n",
    "        <th>Item 2</th>\n",
    "        <th>Item 3</th>\n",
    "        <th>Item 4</th>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>User A</td>\n",
    "        <td>5</td>\n",
    "        <td>3</td>\n",
    "        <td>5</td>\n",
    "        <td>-</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>User B</td>\n",
    "        <td>2</td>\n",
    "        <td>5</td>\n",
    "        <td>1</td>\n",
    "        <td>5</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>User C</td>\n",
    "        <td>1</td>\n",
    "        <td>4</td>\n",
    "        <td>2</td>\n",
    "        <td>4</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>User D</td>\n",
    "        <td>5</td>\n",
    "        <td>2</td>\n",
    "        <td>-</td>\n",
    "        <td>-</td>\n",
    "    </tr>\n",
    "</table>\n",
    "\n",
    "Example: User A rated Item 1 with 5 stars. A `-` means the user has not rated that item.\n",
    "\n",
    "We want to decide which of Items 3 and 4, neither of which User D has rated yet, to recommend to User D.\n",
    "\n",
    "→ Using the ratings of Items 1 and 2, give more weight to the ratings of users who are similar to User D and less weight to those of users who are not."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 5.  3.  5. nan]\n",
      " [ 2.  5.  1.  5.]\n",
      " [ 1.  4.  2.  4.]\n",
      " [ 5.  2. nan nan]]\n"
     ]
    }
   ],
   "source": [
    "mat = np.array([\n",
    "    [5, 3, 5, np.nan],\n",
    "    [2, 5, 1, 5],\n",
    "    [1, 4, 2, 4],\n",
    "    [5, 2, np.nan, np.nan]\n",
    "])\n",
    "print(mat)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Contents\n",
    "\n",
    "- Find the users similar to User D\n",
    "- Use the similar users' ratings to decide which item to recommend"
   ]
  },
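  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Find the users similar to User D\n",
    "\n",
    "Compute the \"distance\" between users, using each user's pair (rating of Item 1, rating of Item 2).\n",
    "\n",
    "User A is (5, 3) and User B is (2, 5). Treat these pairs as 2D coordinates and compute the Euclidean distance between them.\n",
    "\n",
    "Convert the distance into a score between 0 and 1 so that a smaller distance means a more similar user: $\\text{Score} = \\frac{1}{1 + \\text{distance}}$"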
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build the matrix of pairwise distances between users\n",
    "def similarity_distance_matrix(rating_matrix):\n",
    "    distance_rows = []\n",
    "    for i in range(0, len(rating_matrix)):\n",
    "        row = [round(np.linalg.norm(rating_matrix[i]-rating_matrix[j]), 2) for j in range(0, len(rating_matrix))]\n",
    "        distance_rows.append(row)\n",
    "    dist_mat = np.array(distance_rows)\n",
    "    return dist_mat"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[5. 3.]\n",
      " [2. 5.]\n",
      " [1. 4.]\n",
      " [5. 2.]]\n"
     ]
    }
   ],
   "source": [
    "# Extract the columns used to compute the similarity matrix\n",
    "print(mat[:, :2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "A [5. 3.]\n",
      "B [2. 5.]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "3.605551275463989"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Compute the distance (here, between Users A and B)\n",
    "print('A', mat[:, :2][0])\n",
    "print('B', mat[:, :2][1])\n",
    "np.linalg.norm(mat[:, :2][0]-mat[:, :2][1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[0.   3.61 4.12 1.  ]\n",
      " [3.61 0.   1.41 4.24]\n",
      " [4.12 1.41 0.   4.47]\n",
      " [1.   4.24 4.47 0.  ]]\n"
     ]
    }
   ],
   "source": [
    "# Build the user-to-user distance matrix from the ratings of Items 1 and 2\n",
    "user_distance_mat = similarity_distance_matrix(mat[:, :2])\n",
    "print(user_distance_mat)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build the matrix of similarity scores between users\n",
    "def similarity_score_matrix(rating_matrix):\n",
    "    score_rows = []\n",
    "    for i in range(0, len(rating_matrix)):\n",
    "        row = []\n",
    "        for j in range(0, len(rating_matrix)):\n",
    "            distance = np.linalg.norm(rating_matrix[i]-rating_matrix[j])\n",
    "            score = 1 / (1 + round(distance, 2))\n",
    "            row.append(round(score, 2))\n",
    "        score_rows.append(row)\n",
    "    score_mat = np.array(score_rows)\n",
    "    return score_mat"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1.   0.22 0.2  0.5 ]\n",
      " [0.22 1.   0.41 0.19]\n",
      " [0.2  0.41 1.   0.18]\n",
      " [0.5  0.19 0.18 1.  ]]\n"
     ]
    }
   ],
   "source": [
    "user_similarity_mat = similarity_score_matrix(mat[:, :2])\n",
    "print(user_similarity_mat)"
   ]
  },
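  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use the similar users' ratings to decide which item to recommend\n",
    "\n",
    "We now know how similar each of Users A, B, and C is to User D.\n",
    "\n",
    "Using the ratings that Users A, B, and C gave to Items 3 and 4, decide which of the two to recommend to User D."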
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "def normalized_score(me, item, similarity_score, rating_matrix):\n",
    "    \"\"\"Compute the normalized, similarity-weighted rating of item for user me.\n",
    "    \n",
    "    me: target user, given as that user's row index in rating_matrix\n",
    "    item: column index of the item in rating_matrix\n",
    "    similarity_score: array of similarity scores (the target user's row of the similarity matrix)\n",
    "    rating_matrix: rating matrix (rows: users, columns: items)\n",
    "    \"\"\"\n",
    "    users = len(rating_matrix)\n",
    "    weighted_rating = 0.0\n",
    "    similarity_sum = 0.0\n",
    "    for i in range(0, users):\n",
    "        # Skip the target user's own row\n",
    "        if i == me:\n",
    "            continue\n",
    "        # Skip users who have not rated this item\n",
    "        if np.isnan(rating_matrix[i][item]):\n",
    "            continue\n",
    "        similarity_sum += similarity_score[i]\n",
    "        weighted_rating += similarity_score[i] * rating_matrix[i][item]\n",
    "    \n",
    "    print('Weighted score:', weighted_rating)\n",
    "    print('Sum of similarities:', similarity_sum)\n",
    "    # Divide by the summed similarity of the contributing users so the result does not depend on how many users rated the item\n",
    "    return round(weighted_rating / similarity_sum, 2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "USER_D = 3\n",
    "ITEM3 = 2\n",
    "ITEM4 = 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Weighted score: 3.05\n",
      "Sum of similarities: 0.8699999999999999\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "3.51"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "normalized_score(USER_D, ITEM3, user_similarity_mat[USER_D], mat)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Weighted score: 1.67\n",
      "Sum of similarities: 0.37\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "4.51"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "normalized_score(USER_D, ITEM4, user_similarity_mat[USER_D], mat)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check what happens when User E, who is similar to User D, is added.\n",
    "\n",
    "→ As long as User E's rating of Item 4 is unknown, Item 4 is still the one recommended."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 5.  3.  5. nan]\n",
      " [ 2.  5.  1.  5.]\n",
      " [ 1.  4.  2.  4.]\n",
      " [ 5.  2. nan nan]\n",
      " [ 4.  2.  4. nan]]\n"
     ]
    }
   ],
   "source": [
    "mat2 = np.array([\n",
    "    [5, 3, 5, np.nan],\n",
    "    [2, 5, 1, 5],\n",
    "    [1, 4, 2, 4],\n",
    "    [5, 2, np.nan, np.nan],\n",
    "    [4, 2, 4, np.nan]\n",
    "])\n",
    "print(mat2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1.   0.22 0.2  0.5  0.41]\n",
      " [0.22 1.   0.41 0.19 0.22]\n",
      " [0.2  0.41 1.   0.18 0.22]\n",
      " [0.5  0.19 0.18 1.   0.5 ]\n",
      " [0.41 0.22 0.22 0.5  1.  ]]\n"
     ]
    }
   ],
   "source": [
    "user_similarity_mat2 = similarity_score_matrix(mat2[:, :2])\n",
    "print(user_similarity_mat2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Weighted score: 5.05\n",
      "Sum of similarities: 1.3699999999999999\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "3.69"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "normalized_score(USER_D, ITEM3, user_similarity_mat2[USER_D], mat2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Weighted score: 1.67\n",
      "Sum of similarities: 0.37\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "4.51"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "normalized_score(USER_D, ITEM4, user_similarity_mat2[USER_D], mat2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check how the recommendation changes when User A gives Item 4 a low rating. (User E is left out here.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 5.  3.  5.  1.]\n",
      " [ 2.  5.  1.  5.]\n",
      " [ 1.  4.  2.  4.]\n",
      " [ 5.  2. nan nan]]\n"
     ]
    }
   ],
   "source": [
    "mat3 = np.array([\n",
    "    [5, 3, 5, 1],\n",
    "    [2, 5, 1, 5],\n",
    "    [1, 4, 2, 4],\n",
    "    [5, 2, np.nan, np.nan]\n",
    "])\n",
    "print(mat3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Columns 1 and 2 of mat3 are unchanged from mat,\n",
    "# so similarity_score_matrix(mat3[:, :2]) gives the same result as user_similarity_mat."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Weighted score: 3.05\n",
      "Sum of similarities: 0.8699999999999999\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "3.51"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "normalized_score(USER_D, ITEM3, user_similarity_mat[USER_D], mat3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Weighted score: 2.17\n",
      "Sum of similarities: 0.8699999999999999\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "2.49"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "normalized_score(USER_D, ITEM4, user_similarity_mat[USER_D], mat3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 5.  3.  5.  2.]\n",
      " [ 2.  5.  1.  5.]\n",
      " [ 1.  4.  2.  4.]\n",
      " [ 5.  2. nan nan]]\n"
     ]
    }
   ],
   "source": [
    "mat4 = np.array([\n",
    "    [5, 3, 5, 2],\n",
    "    [2, 5, 1, 5],\n",
    "    [1, 4, 2, 4],\n",
    "    [5, 2, np.nan, np.nan]\n",
    "])\n",
    "print(mat4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Weighted score: 3.05\n",
      "Sum of similarities: 0.8699999999999999\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "3.51"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "normalized_score(USER_D, ITEM3, user_similarity_mat[USER_D], mat4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Weighted score: 2.67\n",
      "Sum of similarities: 0.8699999999999999\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "3.07"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "normalized_score(USER_D, ITEM4, user_similarity_mat[USER_D], mat4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The essence of (user-based) collaborative filtering:\n",
    "# Estimate the target user's rating by giving more weight to the ratings of similar users"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
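
The calculation in this notebook can be cross-checked without NumPy. The following is a minimal sketch, not the notebook's own code: it uses only the standard library, keeps the ratings in a hypothetical `ratings` dict with `None` for unrated items, and introduces the helper names `similarity` and `predict` purely for illustration. Unlike the notebook, it does not round intermediate values.

```python
import math

# Ratings from the table in the notebook; None marks an item the user has not rated.
ratings = {
    "A": [5, 3, 5, None],
    "B": [2, 5, 1, 5],
    "C": [1, 4, 2, 4],
    "D": [5, 2, None, None],
}


def similarity(u, v, items=(0, 1)):
    """Score = 1 / (1 + Euclidean distance) over the given item columns."""
    distance = math.sqrt(sum((u[k] - v[k]) ** 2 for k in items))
    return 1 / (1 + distance)


def predict(target, item, ratings):
    """Similarity-weighted average of the other users' ratings for `item`."""
    weighted_sum = 0.0
    similarity_sum = 0.0
    for user, row in ratings.items():
        # Skip the target user and anyone who has not rated the item.
        if user == target or row[item] is None:
            continue
        s = similarity(ratings[target], row)
        weighted_sum += s * row[item]
        similarity_sum += s
    if similarity_sum == 0:
        return None  # nobody else rated this item
    return weighted_sum / similarity_sum


item3 = predict("D", 2, ratings)  # column index 2 is Item 3
item4 = predict("D", 3, ratings)  # column index 3 is Item 4
print(round(item3, 2), round(item4, 2))
```

Because no intermediate rounding is applied, the Item 3 estimate comes out at about 3.50 rather than the notebook's 3.51, while Item 4 stays at about 4.51. Item 4 is still the recommendation for User D.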

## 協調フィルタリングを手を動かして理解する.ipynb

      