@mdekstrand
Created August 28, 2019 18:42
LensKit demo
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# LensKit Demo Notebook"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Make sure we have LensKit:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting package metadata (repodata.json): ...working... done\n",
"Solving environment: ...working... done\n",
"\n",
"## Package Plan ##\n",
"\n",
" environment location: E:\\Anaconda\\envs\\lkplay\n",
"\n",
" added / updated specs:\n",
" - lenskit\n",
"\n",
"\n",
"The following packages will be SUPERSEDED by a higher-priority channel:\n",
"\n",
" lenskit mdekstrand/label/dev::lenskit-0.8.0.d~ --> lenskit::lenskit-0.7.0-py37h1aa3f02_0\n",
"\n",
"\n",
"Preparing transaction: ...working... done\n",
"Verifying transaction: ...working... done\n",
"Executing transaction: ...working... done\n",
"\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n",
"\n",
"==> WARNING: A newer version of conda exists. <==\n",
" current version: 4.7.10\n",
" latest version: 4.7.11\n",
"\n",
"Please update conda by running\n",
"\n",
" $ conda update -n base -c defaults conda\n",
"\n",
"\n"
]
}
],
"source": [
"%conda install -c lenskit lenskit"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Import LensKit algorithms and data utilities:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"import lenskit.datasets as ds\n",
"from lenskit.algorithms import Recommender\n",
"from lenskit.algorithms.item_knn import ItemItem"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loading and Fitting the Model\n",
"\n",
"LensKit provides utilities for loading the rating data and transforming it into the `user`, `item`, `rating` layout it expects. The `MovieLens` class works for current MovieLens data sets; the `ML100K` class supports the 100K data set."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>user</th>\n",
" <th>item</th>\n",
" <th>rating</th>\n",
" <th>timestamp</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4.0</td>\n",
" <td>964982703</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>4.0</td>\n",
" <td>964981247</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>6</td>\n",
" <td>4.0</td>\n",
" <td>964982224</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>47</td>\n",
" <td>5.0</td>\n",
" <td>964983815</td>\n",
" </tr>\n",
" <tr>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>50</td>\n",
" <td>5.0</td>\n",
" <td>964982931</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" user item rating timestamp\n",
"0 1 1 4.0 964982703\n",
"1 1 3 4.0 964981247\n",
"2 1 6 4.0 964982224\n",
"3 1 47 5.0 964983815\n",
"4 1 50 5.0 964982931"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = ds.MovieLens('ml-latest-small')\n",
"data.ratings.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Initialize an item-item collaborative filter that uses up to 20 neighbors and requires at least 2 neighbors to score an item:"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"iicf = ItemItem(20, min_nbrs=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Not all algorithms implement the `Recommender` interface. The `adapt` class method turns any algorithm into a recommender: if the algorithm is not already a recommender, it is wrapped in a top-*N* recommender that excludes rated items."
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'TopN/ItemItem(nnbrs=20, msize=None)'"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"algo = Recommender.adapt(iicf)\n",
"str(algo)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we can fit the algorithm to our rating data:"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<lenskit.algorithms.basic.TopN at 0x21331834788>"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"algo.fit(data.ratings)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generating Recommendations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can generate recommendations for users in the data set:"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>item</th>\n",
" <th>score</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>685</td>\n",
" <td>5.030551</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>1105</td>\n",
" <td>5.010529</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>238</td>\n",
" <td>4.996500</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>779</td>\n",
" <td>4.973631</td>\n",
" </tr>\n",
" <tr>\n",
" <td>4</td>\n",
" <td>1564</td>\n",
" <td>4.922808</td>\n",
" </tr>\n",
" <tr>\n",
" <td>5</td>\n",
" <td>2203</td>\n",
" <td>4.744812</td>\n",
" </tr>\n",
" <tr>\n",
" <td>6</td>\n",
" <td>808</td>\n",
" <td>4.739143</td>\n",
" </tr>\n",
" <tr>\n",
" <td>7</td>\n",
" <td>670</td>\n",
" <td>4.633967</td>\n",
" </tr>\n",
" <tr>\n",
" <td>8</td>\n",
" <td>1041</td>\n",
" <td>4.629044</td>\n",
" </tr>\n",
" <tr>\n",
" <td>9</td>\n",
" <td>3673</td>\n",
" <td>4.584904</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" item score\n",
"0 685 5.030551\n",
"1 1105 5.010529\n",
"2 238 4.996500\n",
"3 779 4.973631\n",
"4 1564 4.922808\n",
"5 2203 4.744812\n",
"6 808 4.739143\n",
"7 670 4.633967\n",
"8 1041 4.629044\n",
"9 3673 4.584904"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"algo.recommend(22, 10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ooh, they're going to like movie 685! What on earth is that? Let's make these results more useful - the data set object gives us movie titles."
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>item</th>\n",
" <th>score</th>\n",
" <th>title</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>685</td>\n",
" <td>5.030551</td>\n",
" <td>It's My Party (1996)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>1105</td>\n",
" <td>5.010529</td>\n",
" <td>Children of the Corn IV: The Gathering (1996)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>238</td>\n",
" <td>4.996500</td>\n",
" <td>Far From Home: The Adventures of Yellow Dog (1...</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>779</td>\n",
" <td>4.973631</td>\n",
" <td>'Til There Was You (1997)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>4</td>\n",
" <td>1564</td>\n",
" <td>4.922808</td>\n",
" <td>For Roseanna (Roseanna's Grave) (1997)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>5</td>\n",
" <td>2203</td>\n",
" <td>4.744812</td>\n",
" <td>Shadow of a Doubt (1943)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>6</td>\n",
" <td>808</td>\n",
" <td>4.739143</td>\n",
" <td>Alaska (1996)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>7</td>\n",
" <td>670</td>\n",
" <td>4.633967</td>\n",
" <td>World of Apu, The (Apur Sansar) (1959)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>8</td>\n",
" <td>1041</td>\n",
" <td>4.629044</td>\n",
" <td>Secrets &amp; Lies (1996)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>9</td>\n",
" <td>3673</td>\n",
" <td>4.584904</td>\n",
" <td>Benji the Hunted (1987)</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" item score title\n",
"0 685 5.030551 It's My Party (1996)\n",
"1 1105 5.010529 Children of the Corn IV: The Gathering (1996)\n",
"2 238 4.996500 Far From Home: The Adventures of Yellow Dog (1...\n",
"3 779 4.973631 'Til There Was You (1997)\n",
"4 1564 4.922808 For Roseanna (Roseanna's Grave) (1997)\n",
"5 2203 4.744812 Shadow of a Doubt (1943)\n",
"6 808 4.739143 Alaska (1996)\n",
"7 670 4.633967 World of Apu, The (Apur Sansar) (1959)\n",
"8 1041 4.629044 Secrets & Lies (1996)\n",
"9 3673 4.584904 Benji the Hunted (1987)"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"recs_22 = algo.recommend(22, 10)\n",
"recs_22.join(data.movies['title'], on='item')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's recommend for a new, fresh user. We need to set up some ratings for them - how about Toy Story (1) and The Iron Giant (2761)?"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1 5.0\n",
"2761 4.0\n",
"dtype: float64"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new_rates = pd.Series({1: 5.0, 2761: 4.0})\n",
"new_rates"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use these ratings to recommend:"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>item</th>\n",
" <th>score</th>\n",
" <th>title</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>106642</td>\n",
" <td>5.597255</td>\n",
" <td>Day of the Doctor, The (2013)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>6345</td>\n",
" <td>5.468917</td>\n",
" <td>Chorus Line, A (1985)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>33779</td>\n",
" <td>5.451872</td>\n",
" <td>Eddie Izzard: Dress to Kill (1999)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>6591</td>\n",
" <td>5.358906</td>\n",
" <td>Magdalene Sisters, The (2002)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>4</td>\n",
" <td>9018</td>\n",
" <td>5.354345</td>\n",
" <td>Control Room (2004)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>5</td>\n",
" <td>7614</td>\n",
" <td>5.350405</td>\n",
" <td>Oklahoma! (1955)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>6</td>\n",
" <td>1041</td>\n",
" <td>5.320048</td>\n",
" <td>Secrets &amp; Lies (1996)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>7</td>\n",
" <td>3302</td>\n",
" <td>5.281882</td>\n",
" <td>Beautiful People (1999)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>8</td>\n",
" <td>2959</td>\n",
" <td>5.239546</td>\n",
" <td>Fight Club (1999)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>9</td>\n",
" <td>7983</td>\n",
" <td>5.219090</td>\n",
" <td>Broadway Danny Rose (1984)</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" item score title\n",
"0 106642 5.597255 Day of the Doctor, The (2013)\n",
"1 6345 5.468917 Chorus Line, A (1985)\n",
"2 33779 5.451872 Eddie Izzard: Dress to Kill (1999)\n",
"3 6591 5.358906 Magdalene Sisters, The (2002)\n",
"4 9018 5.354345 Control Room (2004)\n",
"5 7614 5.350405 Oklahoma! (1955)\n",
"6 1041 5.320048 Secrets & Lies (1996)\n",
"7 3302 5.281882 Beautiful People (1999)\n",
"8 2959 5.239546 Fight Club (1999)\n",
"9 7983 5.219090 Broadway Danny Rose (1984)"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new_recs = algo.recommend(-1, 10, ratings=new_rates)\n",
"new_recs.join(data.movies['title'], on='item')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And just to demonstrate that the recommendations change with the ratings, let's replace Toy Story with Die Hard (1036):"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>item</th>\n",
" <th>score</th>\n",
" <th>title</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>670</td>\n",
" <td>5.749457</td>\n",
" <td>World of Apu, The (Apur Sansar) (1959)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>3475</td>\n",
" <td>5.516608</td>\n",
" <td>Place in the Sun, A (1951)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>1545</td>\n",
" <td>5.499457</td>\n",
" <td>Ponette (1996)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>48698</td>\n",
" <td>5.499457</td>\n",
" <td>Deliver Us from Evil (2006)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>4</td>\n",
" <td>3546</td>\n",
" <td>5.499457</td>\n",
" <td>What Ever Happened to Baby Jane? (1962)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>5</td>\n",
" <td>7614</td>\n",
" <td>5.499457</td>\n",
" <td>Oklahoma! (1955)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>6</td>\n",
" <td>26116</td>\n",
" <td>5.499457</td>\n",
" <td>Hush... Hush, Sweet Charlotte (1964)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>7</td>\n",
" <td>5404</td>\n",
" <td>5.499457</td>\n",
" <td>84 Charing Cross Road (1987)</td>\n",
" </tr>\n",
" <tr>\n",
" <td>8</td>\n",
" <td>4077</td>\n",
" <td>5.499457</td>\n",
" <td>With a Friend Like Harry... (Harry, un ami qui...</td>\n",
" </tr>\n",
" <tr>\n",
" <td>9</td>\n",
" <td>1446</td>\n",
" <td>5.498118</td>\n",
" <td>Kolya (Kolja) (1996)</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" item score title\n",
"0 670 5.749457 World of Apu, The (Apur Sansar) (1959)\n",
"1 3475 5.516608 Place in the Sun, A (1951)\n",
"2 1545 5.499457 Ponette (1996)\n",
"3 48698 5.499457 Deliver Us from Evil (2006)\n",
"4 3546 5.499457 What Ever Happened to Baby Jane? (1962)\n",
"5 7614 5.499457 Oklahoma! (1955)\n",
"6 26116 5.499457 Hush... Hush, Sweet Charlotte (1964)\n",
"7 5404 5.499457 84 Charing Cross Road (1987)\n",
"8 4077 5.499457 With a Friend Like Harry... (Harry, un ami qui...\n",
"9 1446 5.498118 Kolya (Kolja) (1996)"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new_recs = algo.recommend(-1, 10, ratings=pd.Series({1036: 5.0, 2761: 4.0}))\n",
"new_recs.join(data.movies['title'], on='item')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
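
To make the notebook's recommendation steps a little less of a black box, here is a rough pure-Python sketch of the idea behind an item-item recommender wrapped in a top-*N* step that skips already-rated items. This is not LensKit's implementation: the toy `ratings` data, the helper names (`item_vectors`, `cosine`, `recommend`), and the choice of user-mean-centered cosine similarity are all assumptions made for illustration. Only the `nnbrs` and `min_nbrs` parameter names echo the notebook.

```python
from math import sqrt

# Toy data, invented for illustration: user -> {item: rating}.
ratings = {
    "a": {1: 5.0, 2: 4.0, 3: 1.0},
    "b": {1: 4.0, 2: 5.0, 4: 2.0},
    "c": {1: 1.0, 3: 5.0, 4: 4.0},
    "d": {2: 2.0, 3: 4.0, 4: 5.0},
}


def item_vectors(ratings):
    """Turn user rows into item vectors of user-mean-centered ratings."""
    vecs = {}
    for user, row in ratings.items():
        mean = sum(row.values()) / len(row)
        for item, rating in row.items():
            vecs.setdefault(item, {})[user] = rating - mean
    return vecs


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(user_ratings, vecs, n=10, nnbrs=20, min_nbrs=1):
    """Score unrated items from the user's own ratings and return the top n.

    Each candidate's score is the similarity-weighted average of the user's
    ratings on its most similar rated items (positive similarities only).
    """
    scores = {}
    for item, vec in vecs.items():
        if item in user_ratings:
            continue  # the top-N step never recommends already-rated items
        sims = [(cosine(vec, vecs[j]), r)
                for j, r in user_ratings.items() if j in vecs]
        nbrs = sorted((s for s in sims if s[0] > 0), reverse=True)[:nnbrs]
        if not nbrs or len(nbrs) < min_nbrs:
            continue  # too few neighbors to produce a score
        scores[item] = sum(s * r for s, r in nbrs) / sum(s for s, _ in nbrs)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


vecs = item_vectors(ratings)
# A "new user" is just a dict of ratings, much like the pd.Series above.
print(recommend({1: 5.0, 3: 2.0}, vecs))
# Changing the ratings changes the recommendations.
print(recommend({3: 5.0}, vecs))
```

With data this small, most items have a single usable neighbor, so the sketch defaults to `min_nbrs=1`; asking for 2 neighbors, as the notebook does on real data, would leave nothing to recommend here.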