2018/03/10 Pythonもくもく自習室 (Python Mokumoku Study Room) #8 @ Retty office, results presentation: implemented an example of user-based collaborative filtering
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"rettypy #8 @ftnext\n",
"\n",
"# What I did\n",
"\n",
"Implemented a collaborative filtering example in Python.\n",
"\n",
"Reference slides: [協調フィルタリングを利用した推薦システム構築 (Building a Recommender System Using Collaborative Filtering)](https://www.slideshare.net/masayuki1986/recommendation-ml)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Ratings\n",
"\n",
"<table>\n",
" <tr>\n",
" <th></th>\n",
" <th>Item 1</th>\n",
" <th>Item 2</th>\n",
" <th>Item 3</th>\n",
" <th>Item 4</th>\n",
" </tr>\n",
" <tr>\n",
" <td>User A</td>\n",
" <td>5</td>\n",
" <td>3</td>\n",
" <td>5</td>\n",
" <td>-</td>\n",
" </tr>\n",
" <tr>\n",
" <td>User B</td>\n",
" <td>2</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <td>User C</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>2</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <td>User D</td>\n",
" <td>5</td>\n",
" <td>2</td>\n",
" <td>-</td>\n",
" <td>-</td>\n",
" </tr>\n",
"</table>\n",
"\n",
"Example: User A rated Item 1 with 5 stars. A `-` means the user has not rated that item.\n",
"\n",
"We want to decide which of Items 3 and 4, neither of which User D has rated yet, to recommend to User D.\n",
"\n",
"→ Based on the ratings of Items 1 and 2, give more weight to the ratings of users who are similar to User D and less weight to the ratings of users who are not."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 5. 3. 5. nan]\n",
" [ 2. 5. 1. 5.]\n",
" [ 1. 4. 2. 4.]\n",
" [ 5. 2. nan nan]]\n"
]
}
],
"source": [
"mat = np.array([\n",
" [5, 3, 5, np.nan],\n",
" [2, 5, 1, 5],\n",
" [1, 4, 2, 4],\n",
" [5, 2, np.nan, np.nan]\n",
"])\n",
"print(mat)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Contents\n",
"\n",
"- Find the users who are similar to User D\n",
"- Use the similar users' ratings to decide which item to recommend"
]
},
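{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Find the users who are similar to User D\n",
"\n",
"Compute the *distance* between users from each user's (Item 1 rating, Item 2 rating) pair.\n",
"\n",
"For User A the pair is (5, 3), and for User B it is (2, 5). Treat these pairs as 2-D coordinates and compute the Euclidean distance between them.\n",
"\n",
"Convert the distance to a score between 0 and 1 so that a smaller distance means a more similar user: $Score = \\frac{1}{1 + \\text{distance}}$"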
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Build the matrix of pairwise Euclidean distances between users\n",
"def similarity_distance_matrix(rating_matrix):\n",
"    distance_rows = []\n",
"    for i in range(len(rating_matrix)):\n",
"        # Distance from user i to every user j, rounded to 2 decimals\n",
"        row = [round(np.linalg.norm(rating_matrix[i] - rating_matrix[j]), 2)\n",
"               for j in range(len(rating_matrix))]\n",
"        distance_rows.append(row)\n",
"    dist_mat = np.array(distance_rows)\n",
"    return dist_mat"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[5. 3.]\n",
" [2. 5.]\n",
" [1. 4.]\n",
" [5. 2.]]\n"
]
}
],
"source": [
"# Extract the columns used to compute the similarity matrix (Items 1 and 2)\n",
"print(mat[:, :2])"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"A [5. 3.]\n",
"B [2. 5.]\n"
]
},
{
"data": {
"text/plain": [
"3.605551275463989"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Compute the distance (Users A and B as an example)\n",
"print('A', mat[:, :2][0])\n",
"print('B', mat[:, :2][1])\n",
"np.linalg.norm(mat[:, :2][0]-mat[:, :2][1])"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[0. 3.61 4.12 1. ]\n",
" [3.61 0. 1.41 4.24]\n",
" [4.12 1.41 0. 4.47]\n",
" [1. 4.24 4.47 0. ]]\n"
]
}
],
"source": [
"# Build the user-to-user distance matrix from the ratings of Items 1 and 2\n",
"user_distance_mat = similarity_distance_matrix(mat[:, :2])\n",
"print(user_distance_mat)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# 類似度からなる行列を作成する\n",
"def similarity_score_matrix(rating_matrix):\n",
" score_rows = []\n",
" for i in range(0, len(rating_matrix)):\n",
" row = []\n",
" for j in range(0, len(rating_matrix)):\n",
" distance = np.linalg.norm(rating_matrix[i]-rating_matrix[j])\n",
" score = 1 / (1 + round(distance, 2))\n",
" row.append(round(score, 2))\n",
" score_rows.append(row)\n",
" score_mat = np.array(score_rows)\n",
" return score_mat"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1. 0.22 0.2 0.5 ]\n",
" [0.22 1. 0.41 0.19]\n",
" [0.2 0.41 1. 0.18]\n",
" [0.5 0.19 0.18 1. ]]\n"
]
}
],
"source": [
"user_similarity_mat = similarity_score_matrix(mat[:, :2])\n",
"print(user_similarity_mat)"
]
},
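{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use the similar users' ratings to decide which item to recommend\n",
"\n",
"We now know how similar Users A, B, and C each are to User D.\n",
"\n",
"Use the ratings that Users A, B, and C gave to Items 3 and 4 to decide which of the two to recommend to User D."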
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def normalized_score(me, item, similarity_score, rating_matrix):\n",
"    \"\"\"Compute the normalized, similarity-weighted rating of item for user me.\n",
"    \n",
"    me: target user, given as the row index of that user in rating_matrix\n",
"    item: column index of the item in rating_matrix\n",
"    similarity_score: array of similarity scores between me and each user\n",
"    rating_matrix: rating matrix (rows: users, columns: items)\n",
"    \"\"\"\n",
"    users = len(rating_matrix)\n",
"    weighted_rating = 0.0\n",
"    similarity_sum = 0.0\n",
"    for i in range(users):\n",
"        # Skip the target user: their own rating is not used\n",
"        if i == me:\n",
"            continue\n",
"        # Skip users who have not rated this item\n",
"        if np.isnan(rating_matrix[i][item]):\n",
"            continue\n",
"        similarity_sum += similarity_score[i]\n",
"        weighted_rating += similarity_score[i] * rating_matrix[i][item]\n",
"    \n",
"    # Debug output: weighted score (重み付けスコア) and sum of similarities (類似度の合計)\n",
"    print('重み付けスコア:', weighted_rating)\n",
"    print('類似度の合計:', similarity_sum)\n",
"    # Normalize by the total weight of the users who contributed,\n",
"    # so the result does not depend on how many users rated the item\n",
"    return round(weighted_rating / similarity_sum, 2)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"USER_D = 3\n",
"ITEM3 = 2\n",
"ITEM4 = 3"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"重み付けスコア: 3.05\n",
"類似度の合計: 0.8699999999999999\n"
]
},
{
"data": {
"text/plain": [
"3.51"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"normalized_score(USER_D, ITEM3, user_similarity_mat[USER_D], mat)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"重み付けスコア: 1.67\n",
"類似度の合計: 0.37\n"
]
},
{
"data": {
"text/plain": [
"4.51"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"normalized_score(USER_D, ITEM4, user_similarity_mat[USER_D], mat)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check what happens when User E, a user similar to User D, is added.\n",
"\n",
"→ Because User E's rating for Item 4 is unknown, Item 4 is still the one recommended."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 5. 3. 5. nan]\n",
" [ 2. 5. 1. 5.]\n",
" [ 1. 4. 2. 4.]\n",
" [ 5. 2. nan nan]\n",
" [ 4. 2. 4. nan]]\n"
]
}
],
"source": [
"mat2 = np.array([\n",
" [5, 3, 5, np.nan],\n",
" [2, 5, 1, 5],\n",
" [1, 4, 2, 4],\n",
" [5, 2, np.nan, np.nan],\n",
" [4, 2, 4, np.nan]\n",
"])\n",
"print(mat2)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1. 0.22 0.2 0.5 0.41]\n",
" [0.22 1. 0.41 0.19 0.22]\n",
" [0.2 0.41 1. 0.18 0.22]\n",
" [0.5 0.19 0.18 1. 0.5 ]\n",
" [0.41 0.22 0.22 0.5 1. ]]\n"
]
}
],
"source": [
"user_similarity_mat2 = similarity_score_matrix(mat2[:, :2])\n",
"print(user_similarity_mat2)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"重み付けスコア: 5.05\n",
"類似度の合計: 1.3699999999999999\n"
]
},
{
"data": {
"text/plain": [
"3.69"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"normalized_score(USER_D, ITEM3, user_similarity_mat2[USER_D], mat2)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"重み付けスコア: 1.67\n",
"類似度の合計: 0.37\n"
]
},
{
"data": {
"text/plain": [
"4.51"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"normalized_score(USER_D, ITEM4, user_similarity_mat2[USER_D], mat2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check how the recommendation changes when User A gives Item 4 a low rating. (User E is left out here.)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 5. 3. 5. 1.]\n",
" [ 2. 5. 1. 5.]\n",
" [ 1. 4. 2. 4.]\n",
" [ 5. 2. nan nan]]\n"
]
}
],
"source": [
"mat3 = np.array([\n",
" [5, 3, 5, 1],\n",
" [2, 5, 1, 5],\n",
" [1, 4, 2, 4],\n",
" [5, 2, np.nan, np.nan]\n",
"])\n",
"print(mat3)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"# Columns 1 and 2 of mat3 are unchanged from mat, so\n",
"# similarity_score_matrix(mat3[:, :2]) returns the same result as user_similarity_mat."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"重み付けスコア: 3.05\n",
"類似度の合計: 0.8699999999999999\n"
]
},
{
"data": {
"text/plain": [
"3.51"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"normalized_score(USER_D, ITEM3, user_similarity_mat[USER_D], mat3)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"重み付けスコア: 2.17\n",
"類似度の合計: 0.8699999999999999\n"
]
},
{
"data": {
"text/plain": [
"2.49"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"normalized_score(USER_D, ITEM4, user_similarity_mat[USER_D], mat3)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 5. 3. 5. 2.]\n",
" [ 2. 5. 1. 5.]\n",
" [ 1. 4. 2. 4.]\n",
" [ 5. 2. nan nan]]\n"
]
}
],
"source": [
"mat4 = np.array([\n",
" [5, 3, 5, 2],\n",
" [2, 5, 1, 5],\n",
" [1, 4, 2, 4],\n",
" [5, 2, np.nan, np.nan]\n",
"])\n",
"print(mat4)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"重み付けスコア: 3.05\n",
"類似度の合計: 0.8699999999999999\n"
]
},
{
"data": {
"text/plain": [
"3.51"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"normalized_score(USER_D, ITEM3, user_similarity_mat[USER_D], mat4)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"重み付けスコア: 2.67\n",
"類似度の合計: 0.8699999999999999\n"
]
},
{
"data": {
"text/plain": [
"3.07"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"normalized_score(USER_D, ITEM4, user_similarity_mat[USER_D], mat4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The key idea of (user-based) collaborative filtering:\n",
"# estimate the target user's ratings by giving more weight to the ratings of similar users"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
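
The two steps in the notebook above (a similarity score of 1 / (1 + distance) computed from Items 1 and 2, then a similarity-weighted average of the other users' ratings) can also be written with NumPy broadcasting. This is only a minimal sketch, not the author's code: the function names `similarity_scores` and `predict_rating` are hypothetical, and unlike the notebook it does not round intermediate values, so results may differ slightly in the second decimal place.

```python
import numpy as np


def similarity_scores(ratings):
    """Return the user-by-user matrix of 1 / (1 + Euclidean distance)."""
    # Pairwise differences via broadcasting: shape (users, users, items)
    diff = ratings[:, None, :] - ratings[None, :, :]
    distance = np.linalg.norm(diff, axis=2)
    return 1.0 / (1.0 + distance)


def predict_rating(me, item, similarity, ratings):
    """Similarity-weighted average of the other users' ratings for one item."""
    column = ratings[:, item]
    # Use only the other users who actually rated the item
    mask = ~np.isnan(column)
    mask[me] = False
    weights = similarity[me, mask]
    if weights.sum() == 0:
        return np.nan  # no usable ratings (an assumption; the notebook has no such guard)
    return float(np.dot(weights, column[mask]) / weights.sum())


# Same rating matrix as `mat` in the notebook (rows: Users A-D, columns: Items 1-4)
ratings = np.array([
    [5, 3, 5, np.nan],
    [2, 5, 1, 5],
    [1, 4, 2, 4],
    [5, 2, np.nan, np.nan],
])
USER_D = 3
sim = similarity_scores(ratings[:, :2])  # similarity from Items 1 and 2 only
item3 = predict_rating(USER_D, 2, sim, ratings)
item4 = predict_rating(USER_D, 3, sim, ratings)
print(round(item3, 2), round(item4, 2))
```

As in the notebook, Item 4 scores higher than Item 3 for User D, because only Users B and C rated Item 4 and both rated it highly.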
@ftnext
Author

ftnext commented Jul 28, 2018

This Gist originally contained only the "user-based collaborative filtering" notebook, but I found a related notebook and uploaded it as well (Revision 2).
