2018/03/10 Python Mokumoku Study Room #8 @ Retty office, results presentation: implementing an example of user-based collaborative filtering
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"rettypy #8 @ftnext\n",
"\n",
"# What I did\n",
"\n",
"Implemented a collaborative filtering example in Python\n",
"\n",
"Reference slides: [Building a recommender system with collaborative filtering](https://www.slideshare.net/masayuki1986/recommendation-ml)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Ratings\n",
"\n",
"<table>\n",
" <tr>\n",
" <th></th>\n",
" <th>User A</th>\n",
" <th>User B</th>\n",
" <th>User C</th>\n",
" </tr>\n",
" <tr>\n",
" <td>Item 1</td>\n",
" <td>5</td>\n",
" <td>2</td>\n",
" <td>4.5</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Item 2</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>-</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Item 3</td>\n",
" <td>4</td>\n",
" <td>3</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Item 4</td>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>-</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Item 5</td>\n",
" <td>2</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Item 6</td>\n",
" <td>2</td>\n",
" <td>4</td>\n",
" <td>-</td>\n",
" </tr>\n",
"</table>\n",
"\n",
"Example: user A rated item 1 with ☆5. A `-` means the user has not rated that item.\n",
"\n",
"We want to decide which of items 2 and 4, neither of which user C has rated yet, to recommend to user C.\n",
"\n",
"→ For user C, give more weight to the ratings of items that are similar to items 2 and 4, and less weight to the ratings of items that are not."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[5. 2. 4.5]\n",
" [5. 1. nan]\n",
" [4. 3. 4. ]\n",
" [4. 4. nan]\n",
" [2. 5. 1. ]\n",
" [2. 4. nan]]\n"
]
}
],
"source": [
"mat = np.array([\n",
" [5, 2, 4.5],\n",
" [5, 1, np.nan],\n",
" [4, 3, 4],\n",
" [4, 4, np.nan],\n",
" [2, 5, 1],\n",
" [2, 4, np.nan]\n",
"])\n",
"print(mat)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Contents\n",
"\n",
"- Compute the similarity between items\n",
"- Use the item similarities to decide which item to recommend to user C"
]
},
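{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Compute the similarity between items\n",
"\n",
"For each item, take the pair (user A's rating, user B's rating) and compute the \"distance\" between those pairs.\n",
"\n",
"Item 1 gives (5, 2) and item 2 gives (5, 1). Treat these as coordinates in two dimensions and compute the Euclidean distance between them.\n",
"\n",
"Convert the distance into a score between 0 and 1 so that items with a smaller distance count as more similar: $\\mathrm{Score} = \\frac{1}{1 + \\mathrm{distance}}$"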
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"# Build the matrix of pairwise Euclidean distances between the rows (items)\n",
"def similarity_distance_matrix(rating_matrix):\n",
"    distance_rows = []\n",
"    for i in range(0, len(rating_matrix)):\n",
"        row = [round(np.linalg.norm(rating_matrix[i]-rating_matrix[j]), 2) for j in range(0, len(rating_matrix))]\n",
"        distance_rows.append(row)\n",
"    dist_mat = np.array(distance_rows)\n",
"    return dist_mat"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[5. 2.]\n",
" [5. 1.]\n",
" [4. 3.]\n",
" [4. 4.]\n",
" [2. 5.]\n",
" [2. 4.]]\n"
]
}
],
"source": [
"# Extract the columns (users A and B) used to compute the similarity matrix\n",
"print(mat[:, :2])"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 [5. 2.]\n",
"2 [5. 1.]\n"
]
},
{
"data": {
"text/plain": [
"1.0"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Compute one distance (between items 1 and 2, from the ratings by users A and B)\n",
"print('1', mat[:, :2][0])\n",
"print('2', mat[:, :2][1])\n",
"np.linalg.norm(mat[:, :2][0]-mat[:, :2][1])"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[0. 1. 1.41 2.24 4.24 3.61]\n",
" [1. 0. 2.24 3.16 5. 4.24]\n",
" [1.41 2.24 0. 1. 2.83 2.24]\n",
" [2.24 3.16 1. 0. 2.24 2. ]\n",
" [4.24 5. 2.83 2.24 0. 1. ]\n",
" [3.61 4.24 2.24 2. 1. 0. ]]\n"
]
}
],
"source": [
"# Build the matrix of distances between items from the ratings by users A and B\n",
"item_distance_mat = similarity_distance_matrix(mat[:, :2])\n",
"print(item_distance_mat)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"# Build the matrix of similarity scores between the rows (items)\n",
"def similarity_score_matrix(rating_matrix):\n",
"    score_rows = []\n",
"    for i in range(0, len(rating_matrix)):\n",
"        row = []\n",
"        for j in range(0, len(rating_matrix)):\n",
"            distance = np.linalg.norm(rating_matrix[i]-rating_matrix[j])\n",
"            # Score = 1 / (1 + distance), rounded to two decimals\n",
"            score = 1 / (1 + round(distance, 2))\n",
"            row.append(round(score, 2))\n",
"        score_rows.append(row)\n",
"    score_mat = np.array(score_rows)\n",
"    return score_mat"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1. 0.5 0.41 0.31 0.19 0.22]\n",
" [0.5 1. 0.31 0.24 0.17 0.19]\n",
" [0.41 0.31 1. 0.5 0.26 0.31]\n",
" [0.31 0.24 0.5 1. 0.31 0.33]\n",
" [0.19 0.17 0.26 0.31 1. 0.5 ]\n",
" [0.22 0.19 0.31 0.33 0.5 1. ]]\n"
]
}
],
"source": [
"item_similarity_mat = similarity_score_matrix(mat[:, :2])\n",
"print(item_similarity_mat)"
]
},
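{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use the item similarities to decide which item to recommend to user C\n",
"\n",
"User C has rated items 1, 3, and 5 (and has not rated items 2, 4, and 6).\n",
"\n",
"We already know how similar items 2 and 4 each are to items 1, 3, and 5.\n",
"\n",
"Using user C's ratings of items 1, 3, and 5 together with how similar each of them is to items 2 and 4, decide which of the two to recommend to user C."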
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"ITEM2_similarity = item_similarity_mat[1]\n",
"ITEM4_similarity = item_similarity_mat[3]"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 2.25\n",
"2 1.24\n",
"4 0.17\n"
]
}
],
"source": [
"for i in range(0, len(mat)):\n",
"    if np.isnan(mat[i][2]):\n",
"        continue\n",
"    print(i, ITEM2_similarity[i] * mat[i][2])"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"def normalized_score_for_item(item, me, similarity_score, rating_matrix):\n",
"    \"\"\"Compute the normalized, similarity-weighted rating of `item` for user `me`.\n",
"\n",
"    item: target item, given as its row index in rating_matrix\n",
"    me: target user, given as their column index in rating_matrix\n",
"    similarity_score: array of similarities between `item` and every item\n",
"    rating_matrix: rating matrix (rows: items, columns: users)\n",
"    \"\"\"\n",
"    items = len(rating_matrix)\n",
"    weighted_rating = 0.0\n",
"    similarity_sum = 0.0\n",
"    for i in range(0, items):\n",
"        # Skip the target item itself\n",
"        if i == item:\n",
"            continue\n",
"        # Skip items that user `me` has not rated\n",
"        if np.isnan(rating_matrix[i][me]):\n",
"            continue\n",
"        similarity_sum += similarity_score[i]\n",
"        weighted_rating += similarity_score[i] * rating_matrix[i][me]\n",
"\n",
"    print('Weighted score:', weighted_rating)\n",
"    print('Sum of similarities:', similarity_sum)\n",
"    # Divide by the sum of the similarities used so that the result does not depend on how many items were included\n",
"    return round(weighted_rating / similarity_sum, 2)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"USER_C = 2\n",
"ITEM2 = 1\n",
"ITEM4 = 3"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Weighted score: 3.66\n",
"Sum of similarities: 0.9800000000000001\n"
]
},
{
"data": {
"text/plain": [
"3.73"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"normalized_score_for_item(ITEM2, USER_C, item_similarity_mat[ITEM2], mat)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Weighted score: 3.705\n",
"Sum of similarities: 1.12\n"
]
},
{
"data": {
"text/plain": [
"3.31"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"normalized_score_for_item(ITEM4, USER_C, item_similarity_mat[ITEM4], mat)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The heart of (item-based) collaborative filtering:\n",
"# Estimate the target user's rating by giving more weight<br>to the ratings that user gave to similar items"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
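As a cross-check on the notebook above, its two steps (item similarity from the ratings by users A and B, then a similarity-weighted average of user C's ratings) can be sketched with NumPy broadcasting instead of explicit loops. This is a minimal sketch, not the author's code: `item_similarity` and `predict_rating` are hypothetical helper names, and unlike the notebook it does not round intermediate values, so the last digit of a result may differ from the notebook's output.

```python
import numpy as np

# Ratings from the notebook: rows are items 1-6, columns are users A, B, C.
ratings = np.array([
    [5, 2, 4.5],
    [5, 1, np.nan],
    [4, 3, 4],
    [4, 4, np.nan],
    [2, 5, 1],
    [2, 4, np.nan],
])


def item_similarity(features):
    """Return 1 / (1 + Euclidean distance) for every pair of rows."""
    # diff[i, j] holds features[i] - features[j] via broadcasting.
    diff = features[:, np.newaxis, :] - features[np.newaxis, :, :]
    distance = np.sqrt((diff ** 2).sum(axis=-1))
    return 1.0 / (1.0 + distance)


def predict_rating(item, user, similarity, ratings):
    """Similarity-weighted mean of the ratings `user` gave to other items."""
    rated = ~np.isnan(ratings[:, user])  # items this user has rated
    rated[item] = False                  # never use the target item itself
    weights = similarity[item, rated]
    return float(weights @ ratings[rated, user] / weights.sum())


# Similarities are computed from users A and B only (columns 0 and 1).
similarity = item_similarity(ratings[:, :2])
for item in (1, 3):  # items 2 and 4 as 0-based row indices
    score = predict_rating(item, 2, similarity, ratings)
    print(f"item {item + 1}: {score:.2f}")
```

Dropping the intermediate rounding does not change the ranking for this data: item 2 still scores higher than item 4 for user C.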

ftnext commented Jul 28, 2018

This Gist originally consisted of the "user-based collaborative filtering" notebook; a related notebook turned up later, so it was uploaded in addition (Revision 2).
