@SOVIETIC-BOSS88
Last active March 16, 2020 19:53
FAST AI JOURNEY: COURSE V3. PART 1. LESSON 4. Documenting my fast.ai journey: 20 YEARS OF GAMES PROJECT. COLLABORATIVE FILTERING AND TABULAR MODELS.
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# FAST AI JOURNEY: COURSE V3. PART 1. LESSON 4. \n",
"## Documenting my fast.ai journey: 20 YEARS OF GAMES PROJECT. COLLABORATIVE FILTERING AND TABULAR MODELS."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this new project, we will analyze the '20 Years of Games' dataset, available on Kaggle, using what we have learned about collaborative filtering and tabular data.\n",
"\n",
"fastai notebooks usually start with three magic lines (`%reload_ext autoreload`, `%autoreload 2`, and `%matplotlib inline`); they ensure that any edits you make to libraries are reloaded automatically, and that any charts or images are displayed inline in the notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tabular Models."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from fastai import *\n",
"from fastai.tabular import *"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting the Data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The '20 Years of Games' dataset isn't available on the [fastai dataset page](https://course.fast.ai/datasets) due to copyright restrictions. You can, however, download it from Kaggle. Let's see how to do this with the [Kaggle API](https://github.com/Kaggle/kaggle-api), which will be useful if you want to join a competition or use other Kaggle datasets later on.\n",
"\n",
"First, install the Kaggle API by uncommenting the following line and executing it, or by executing it in your terminal. Depending on your platform, you may need to modify this slightly, either by adding `source activate fastai` or similar, or by prefixing `pip` with a path. Have a look at how `conda install` is called for your platform in the appropriate *Returning to work* section of https://course-v3.fast.ai/. Depending on your environment, you may also need to append \"--user\" to the command."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"#! pip install kaggle --upgrade"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then you need to upload your Kaggle credentials to your instance. Log in to Kaggle and click on your profile picture in the top right corner, then 'My account'. Scroll down until you find a button named 'Create New API Token' and click on it. This will trigger the download of a file named 'kaggle.json'.\n",
"\n",
"Upload this file to the directory this notebook is running in by clicking \"Upload\" on your main Jupyter page, then uncomment and execute the next two commands (or run them in a terminal)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"#! mkdir -p ~/.kaggle/\n",
"#! mv kaggle.json ~/.kaggle/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You're all set to download the data from [20 Years of Games](https://www.kaggle.com/egrinstein/20-years-of-games/version/2)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"#! chmod 600 ~/.kaggle/kaggle.json"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"PosixPath('data/ign')"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"path = Path('data/ign')\n",
"path.mkdir(parents=True, exist_ok=True)\n",
"path"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"#! kaggle datasets download -d egrinstein/20-years-of-games -f ign.csv -p {path}\n",
"#! unzip -q -n {path}/ign.csv.zip -d {path}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Tabular data should be in a Pandas `DataFrame`."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>score_phrase</th>\n",
" <th>title</th>\n",
" <th>url</th>\n",
" <th>platform</th>\n",
" <th>score</th>\n",
" <th>one_hot_score</th>\n",
" <th>genre</th>\n",
" <th>editors_choice</th>\n",
" <th>release_year</th>\n",
" <th>release_month</th>\n",
" <th>release_day</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Amazing</td>\n",
" <td>LittleBigPlanet PS Vita</td>\n",
" <td>/games/littlebigplanet-vita/vita-98907</td>\n",
" <td>PlayStation Vita</td>\n",
" <td>9.0</td>\n",
" <td>1</td>\n",
" <td>Platformer</td>\n",
" <td>Y</td>\n",
" <td>2012</td>\n",
" <td>9</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Amazing</td>\n",
" <td>LittleBigPlanet PS Vita -- Marvel Super Hero E...</td>\n",
" <td>/games/littlebigplanet-ps-vita-marvel-super-he...</td>\n",
" <td>PlayStation Vita</td>\n",
" <td>9.0</td>\n",
" <td>1</td>\n",
" <td>Platformer</td>\n",
" <td>Y</td>\n",
" <td>2012</td>\n",
" <td>9</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Great</td>\n",
" <td>Splice: Tree of Life</td>\n",
" <td>/games/splice/ipad-141070</td>\n",
" <td>iPad</td>\n",
" <td>8.5</td>\n",
" <td>1</td>\n",
" <td>Puzzle</td>\n",
" <td>N</td>\n",
" <td>2012</td>\n",
" <td>9</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Great</td>\n",
" <td>NHL 13</td>\n",
" <td>/games/nhl-13/xbox-360-128182</td>\n",
" <td>Xbox 360</td>\n",
" <td>8.5</td>\n",
" <td>1</td>\n",
" <td>Sports</td>\n",
" <td>N</td>\n",
" <td>2012</td>\n",
" <td>9</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Great</td>\n",
" <td>NHL 13</td>\n",
" <td>/games/nhl-13/ps3-128181</td>\n",
" <td>PlayStation 3</td>\n",
" <td>8.5</td>\n",
" <td>1</td>\n",
" <td>Sports</td>\n",
" <td>N</td>\n",
" <td>2012</td>\n",
" <td>9</td>\n",
" <td>11</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" score_phrase title \\\n",
"0 Amazing LittleBigPlanet PS Vita \n",
"1 Amazing LittleBigPlanet PS Vita -- Marvel Super Hero E... \n",
"2 Great Splice: Tree of Life \n",
"3 Great NHL 13 \n",
"4 Great NHL 13 \n",
"\n",
" url platform score \\\n",
"0 /games/littlebigplanet-vita/vita-98907 PlayStation Vita 9.0 \n",
"1 /games/littlebigplanet-ps-vita-marvel-super-he... PlayStation Vita 9.0 \n",
"2 /games/splice/ipad-141070 iPad 8.5 \n",
"3 /games/nhl-13/xbox-360-128182 Xbox 360 8.5 \n",
"4 /games/nhl-13/ps3-128181 PlayStation 3 8.5 \n",
"\n",
" one_hot_score genre editors_choice release_year release_month \\\n",
"0 1 Platformer Y 2012 9 \n",
"1 1 Platformer Y 2012 9 \n",
"2 1 Puzzle N 2012 9 \n",
"3 1 Sports N 2012 9 \n",
"4 1 Sports N 2012 9 \n",
"\n",
" release_day \n",
"0 12 \n",
"1 12 \n",
"2 12 \n",
"3 11 \n",
"4 11 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv(path/'ign.csv')\n",
"# Binary target: 1 if the review score is 7 or higher, else 0\n",
"df['one_hot_score'] = df['score'].map(lambda x: 0 if x < 7 else 1)\n",
"\n",
"# Select and reorder the columns used below\n",
"cols = ['score_phrase', 'title', 'url', 'platform', 'score', 'one_hot_score', 'genre',\n",
" 'editors_choice', 'release_year', 'release_month', 'release_day']\n",
"\n",
"df = df[cols]\n",
"\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"dep_var = 'one_hot_score'\n",
"\n",
"# 'score_phrase' and 'editors_choice' are deliberately excluded as features;\n",
"# 'score_phrase' is a label for the score itself, so it would leak the target\n",
"cat_names = ['title', 'platform', 'genre',\n",
" 'release_year', 'release_month', 'release_day']\n",
"\n",
"# All features are categorical (no cont_names), so Normalize has no columns to act on\n",
"procs = [FillMissing, Categorify, Normalize]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"test = TabularList.from_df(df.iloc[800:1000].copy(), path=path, cat_names=cat_names)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"data = (TabularList.from_df(df, path=path, cat_names=cat_names, procs=procs)\n",
" .split_by_idx(list(range(800,1000)))\n",
" .label_from_df(cols=dep_var)\n",
" .add_test(test, label=0)\n",
" .databunch())"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<table> <col width='10px'> <col width='10px'> <col width='10px'> <col width='10px'> <col width='10px'> <col width='10px'> <col width='10px'> <tr>\n",
" <th>title</th>\n",
" <th>platform</th>\n",
" <th>genre</th>\n",
" <th>release_year</th>\n",
" <th>release_month</th>\n",
" <th>release_day</th>\n",
" <th>target</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Battlefield: Bad Company 2</th>\n",
" <th>PC</th>\n",
" <th>Shooter</th>\n",
" <th>2010</th>\n",
" <th>3</th>\n",
" <th>2</th>\n",
" <th>1</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Tetris Worlds</th>\n",
" <th>PC</th>\n",
" <th>Puzzle</th>\n",
" <th>2002</th>\n",
" <th>1</th>\n",
" <th>9</th>\n",
" <th>0</th>\n",
" </tr>\n",
" <tr>\n",
" <th>WWE SmackDown vs. Raw 2008</th>\n",
" <th>PlayStation Portable</th>\n",
" <th>Wrestling</th>\n",
" <th>2007</th>\n",
" <th>11</th>\n",
" <th>1</th>\n",
" <th>0</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Mortal Kombat: Shaolin Monks</th>\n",
" <th>PlayStation 2</th>\n",
" <th>Fighting, Action</th>\n",
" <th>2005</th>\n",
" <th>9</th>\n",
" <th>16</th>\n",
" <th>1</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Moon Diver</th>\n",
" <th>PlayStation 3</th>\n",
" <th>Action</th>\n",
" <th>2011</th>\n",
" <th>4</th>\n",
" <th>4</th>\n",
" <th>1</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Magic: The Gathering -- Duels of the Planeswalkers 2013</th>\n",
" <th>iPad</th>\n",
" <th>Card, Battle</th>\n",
" <th>2012</th>\n",
" <th>6</th>\n",
" <th>25</th>\n",
" <th>1</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Chop Chop Runner</th>\n",
" <th>iPhone</th>\n",
" <th>Action</th>\n",
" <th>2010</th>\n",
" <th>4</th>\n",
" <th>7</th>\n",
" <th>0</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Watchmen: The End is Nigh -- Part 2</th>\n",
" <th>PC</th>\n",
" <th>Action</th>\n",
" <th>2009</th>\n",
" <th>8</th>\n",
" <th>26</th>\n",
" <th>0</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Mario Bros.-e</th>\n",
" <th>Game Boy Advance</th>\n",
" <th>Platformer</th>\n",
" <th>2002</th>\n",
" <th>11</th>\n",
" <th>15</th>\n",
" <th>0</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Serious Sam: The Second Encounter</th>\n",
" <th>PC</th>\n",
" <th>Shooter</th>\n",
" <th>2002</th>\n",
" <th>2</th>\n",
" <th>6</th>\n",
" <th>1</th>\n",
" </tr>\n",
"</table>\n"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"data.show_batch(rows=10)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"learn = tabular_learner(data, layers=[200,100], metrics=accuracy)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"TabularModel(\n",
" (embeds): ModuleList(\n",
" (0): Embedding(12442, 50)\n",
" (1): Embedding(60, 31)\n",
" (2): Embedding(113, 50)\n",
" (3): Embedding(23, 12)\n",
" (4): Embedding(13, 7)\n",
" (5): Embedding(32, 17)\n",
" )\n",
" (emb_drop): Dropout(p=0.0)\n",
" (bn_cont): BatchNorm1d(0, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (layers): Sequential(\n",
" (0): Linear(in_features=167, out_features=200, bias=True)\n",
" (1): ReLU(inplace)\n",
" (2): BatchNorm1d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (3): Linear(in_features=200, out_features=100, bias=True)\n",
" (4): ReLU(inplace)\n",
" (5): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (6): Linear(in_features=100, out_features=2, bias=True)\n",
" )\n",
")"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"learn.model"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total time: 00:02\n",
"epoch train_loss valid_loss accuracy\n",
"1 0.530561 0.692435 0.600000 (00:02)\n",
"\n"
]
}
],
"source": [
"learn.fit(1, 1e-2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Inference."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"score_phrase Amazing\n",
"title LittleBigPlanet PS Vita -- Marvel Super Hero E...\n",
"url /games/littlebigplanet-ps-vita-marvel-super-he...\n",
"platform PlayStation Vita\n",
"score 9\n",
"one_hot_score 1\n",
"genre Platformer\n",
"editors_choice Y\n",
"release_year 2012\n",
"release_month 9\n",
"release_day 12\n",
"Name: 1, dtype: object"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"row = df.iloc[1]\n",
"row"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1, tensor(0), tensor([0.9383, 0.0617]))"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"learn.predict(row)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}