@ZhiyaoShu
Last active March 8, 2024 04:40
Music Recommendation System.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "DyAjEw4OHmDb"
},
"source": [
"# **Music Recommendation System**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "FMCaC7Q_tq1m"
},
"source": [
"### **The objective:**\n",
"In an era of information overload, recommendation systems curate personalized experiences for users, which helps platforms retain and grow their user base amid stiff competition. The primary goal of this project is to explore and develop three distinct types of recommendation systems: user similarity-based, model-based collaborative filtering, and cluster-based. Working through each methodology on the same data is meant to clarify how each approach produces accurate and meaningful recommendations.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BVUiyhYTHS1t"
},
"source": [
"## **Data Dictionary**\n",
"\n",
"The core data is the Taste Profile Subset released by The Echo Nest as part of the Million Song Dataset. The dataset consists of two files. The first file contains the song details: song id, title, release, artist name, and year of release. The second file contains the user id, the song id, and the number of times the user played the song.\n",
"\n",
"song_data\n",
"\n",
"song_id - A unique id given to every song\n",
"\n",
"title - Title of the song\n",
"\n",
"release - Name of the released album\n",
"\n",
"artist_name - Name of the artist\n",
"\n",
"year - Year of release\n",
"\n",
"count_data\n",
"\n",
"user_id - A unique id given to the user\n",
"\n",
"song_id - A unique id given to every song\n",
"\n",
"play_count - Number of times the song was played\n",
"\n",
"## **Data Source**\n",
"http://millionsongdataset.com/"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "NRJtXkTrHxMQ"
},
"source": [
"### **Importing Libraries and the Dataset**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "0oHdIcZFQZRB",
"outputId": "30442014-df63-4522-f5e2-e6e18c7ef513"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n"
]
}
],
"source": [
"from google.colab import drive\n",
"drive.mount('/content/drive')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "R4YvKrpzId3K",
"outputId": "1d251bc7-defd-4a0f-d988-997ed7679c61"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: surprise in /usr/local/lib/python3.10/dist-packages (0.1)\n",
"Requirement already satisfied: scikit-surprise in /usr/local/lib/python3.10/dist-packages (from surprise) (1.1.3)\n",
"Requirement already satisfied: joblib>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-surprise->surprise) (1.3.2)\n",
"Requirement already satisfied: numpy>=1.17.3 in /usr/local/lib/python3.10/dist-packages (from scikit-surprise->surprise) (1.23.5)\n",
"Requirement already satisfied: scipy>=1.3.2 in /usr/local/lib/python3.10/dist-packages (from scikit-surprise->surprise) (1.11.3)\n"
]
}
],
"source": [
"# Install the Surprise recommender library (the `surprise` package on PyPI pulls in scikit-surprise)\n",
"!pip install surprise\n",
"\n",
"# Standard library\n",
"import io\n",
"import warnings\n",
"from collections import defaultdict\n",
"from math import sqrt\n",
"\n",
"# Data handling and plotting\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sn\n",
"from scipy.sparse import csr_matrix\n",
"\n",
"# scikit-learn utilities\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.decomposition import TruncatedSVD\n",
"from sklearn.metrics import mean_squared_error\n",
"from sklearn.metrics.pairwise import cosine_similarity\n",
"from sklearn.model_selection import KFold\n",
"from sklearn.neighbors import NearestNeighbors\n",
"from sklearn.preprocessing import LabelEncoder\n",
"\n",
"# Surprise: train_test_split below is Surprise's version, not scikit-learn's\n",
"import surprise as se\n",
"from surprise import Dataset, Reader, KNNBasic, KNNWithMeans, SVD, accuracy\n",
"from surprise.model_selection import GridSearchCV, train_test_split"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bUGKX140wf-S"
},
"source": [
"### **Load the dataset**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "si6ulhIYImck"
},
"outputs": [],
"source": [
"df_songs = pd.read_csv('/content/drive/MyDrive/000/song_data.csv')\n",
"df_counts = pd.read_csv('/content/drive/MyDrive/000/count_data.csv')\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "12TKB2M7XyC6"
},
"source": [
"### **Understanding the data by viewing a few observations**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "GCLzBuYiXlPM",
"outputId": "a82395fa-2220-479f-c08e-3453ca7c8611"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" song_id title \\\n",
"0 SOQMMHC12AB0180CB8 Silent Night \n",
"1 SOVFVAK12A8C1350D9 Tanssi vaan \n",
"2 SOGTUKN12AB017F4F1 No One Could Ever \n",
"3 SOBNYVR12A8C13558C Si Vos Querés \n",
"4 SOHSBXH12A8C13B0DF Tangle Of Aspens \n",
"5 SOZVAPQ12A8C13B63C Symphony No. 1 G minor \"Sinfonie Serieuse\"/All... \n",
"6 SOQVRHI12A6D4FB2D7 We Have Got Love \n",
"7 SOEYRFT12AB018936C 2 Da Beat Ch'yall \n",
"8 SOPMIYT12A6D4F851E Goodbye \n",
"9 SOJCFMH12A8C13B0C2 Mama_ mama can't you see ? \n",
"\n",
" release \\\n",
"0 Monster Ballads X-Mas \n",
"1 Karkuteillä \n",
"2 Butter \n",
"3 De Culo \n",
"4 Rene Ablaze Presents Winter Sessions \n",
"5 Berwald: Symphonies Nos. 1/2/3/4 \n",
"6 Strictly The Best Vol. 34 \n",
"7 Da Bomb \n",
"8 Danny Boy \n",
"9 March to cadence with the US marines \n",
"\n",
" artist_name year \n",
"0 Faster Pussy cat 2003 \n",
"1 Karkkiautomaatti 1995 \n",
"2 Hudson Mohawke 2006 \n",
"3 Yerba Brava 2003 \n",
"4 Der Mystic 0 \n",
"5 David Montgomery 0 \n",
"6 Sasha / Turbulence 0 \n",
"7 Kris Kross 1993 \n",
"8 Joseph Locke 0 \n",
"9 The Sun Harbor's Chorus-Documentary Recordings 0 \n"
]
}
],
"source": [
"# Show the first 10 records of df_songs\n",
"print(df_songs.head(10))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "tV1ed0ApXpu3",
"outputId": "51757a17-1b60-41e0-9c73-c6ba4388f79f"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Unnamed: 0 user_id song_id \\\n",
"0 0 b80344d063b5ccb3212f76538f3d9e43d87dca9e SOAKIMP12A8C130995 \n",
"1 1 b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBBMDR12A8C13253B \n",
"2 2 b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBXHDL12A81C204C0 \n",
"3 3 b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBYHAJ12A6701BF1D \n",
"4 4 b80344d063b5ccb3212f76538f3d9e43d87dca9e SODACBL12A8C13C273 \n",
"5 5 b80344d063b5ccb3212f76538f3d9e43d87dca9e SODDNQT12A6D4F5F7E \n",
"6 6 b80344d063b5ccb3212f76538f3d9e43d87dca9e SODXRTY12AB0180F3B \n",
"7 7 b80344d063b5ccb3212f76538f3d9e43d87dca9e SOFGUAY12AB017B0A8 \n",
"8 8 b80344d063b5ccb3212f76538f3d9e43d87dca9e SOFRQTD12A81C233C0 \n",
"9 9 b80344d063b5ccb3212f76538f3d9e43d87dca9e SOHQWYZ12A6D4FA701 \n",
"\n",
" play_count \n",
"0 1 \n",
"1 2 \n",
"2 1 \n",
"3 1 \n",
"4 1 \n",
"5 5 \n",
"6 1 \n",
"7 1 \n",
"8 1 \n",
"9 1 \n"
]
}
],
"source": [
"# Show the first 10 records of df_counts\n",
"print(df_counts.head(10))\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bvKb5FHcXzcN"
},
"source": [
"### **Check the data types and missing values of each column**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "yyoHc_cnX19J",
"outputId": "ad3b1d72-db47-46f8-e85b-accb7a163b05"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 2000000 entries, 0 to 1999999\n",
"Data columns (total 4 columns):\n",
" # Column Dtype \n",
"--- ------ ----- \n",
" 0 Unnamed: 0 int64 \n",
" 1 user_id object\n",
" 2 song_id object\n",
" 3 play_count int64 \n",
"dtypes: int64(2), object(2)\n",
"memory usage: 61.0+ MB\n",
"None\n"
]
}
],
"source": [
"# Show the column dtypes and memory usage of df_counts\n",
"print(df_counts.info())\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "rz3zDx_LX42y",
"outputId": "ce95326e-7351-49ae-ca32-8083563d0698"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 1000000 entries, 0 to 999999\n",
"Data columns (total 5 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 song_id 1000000 non-null object\n",
" 1 title 999985 non-null object\n",
" 2 release 999995 non-null object\n",
" 3 artist_name 1000000 non-null object\n",
" 4 year 1000000 non-null int64 \n",
"dtypes: int64(1), object(4)\n",
"memory usage: 38.1+ MB\n",
"None\n"
]
}
],
"source": [
"# Show the column dtypes, non-null counts, and memory usage of df_songs\n",
"print(df_songs.info())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "oTeurvID2T9U",
"outputId": "86922d84-5076-44c9-d45a-64529bd338eb"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" user_id song_id play_count \\\n",
"0 b80344d063b5ccb3212f76538f3d9e43d87dca9e SOAKIMP12A8C130995 1 \n",
"1 b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBBMDR12A8C13253B 2 \n",
"2 b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBXHDL12A81C204C0 1 \n",
"3 b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBYHAJ12A6701BF1D 1 \n",
"4 b80344d063b5ccb3212f76538f3d9e43d87dca9e SODACBL12A8C13C273 1 \n",
"5 b80344d063b5ccb3212f76538f3d9e43d87dca9e SODDNQT12A6D4F5F7E 5 \n",
"6 b80344d063b5ccb3212f76538f3d9e43d87dca9e SODXRTY12AB0180F3B 1 \n",
"7 b80344d063b5ccb3212f76538f3d9e43d87dca9e SOFGUAY12AB017B0A8 1 \n",
"8 b80344d063b5ccb3212f76538f3d9e43d87dca9e SOFRQTD12A81C233C0 1 \n",
"9 b80344d063b5ccb3212f76538f3d9e43d87dca9e SOHQWYZ12A6D4FA701 1 \n",
"\n",
" title release \\\n",
"0 The Cove Thicker Than Water \n",
"1 Entre Dos Aguas Flamenco Para Niños \n",
"2 Stronger Graduation \n",
"3 Constellations In Between Dreams \n",
"4 Learn To Fly There Is Nothing Left To Lose \n",
"5 Apuesta Por El Rock 'N' Roll Antología Audiovisual \n",
"6 Paper Gangsta The Fame Monster \n",
"7 Stacked Actors There Is Nothing Left To Lose \n",
"8 Sehr kosmisch Musik von Harmonia \n",
"9 Heaven's gonna burn your eyes Hôtel Costes 7 by Stéphane Pompougnac \n",
"\n",
" artist_name year \n",
"0 Jack Johnson 0 \n",
"1 Paco De Lucia 1976 \n",
"2 Kanye West 2007 \n",
"3 Jack Johnson 2005 \n",
"4 Foo Fighters 1999 \n",
"5 Héroes del Silencio 2007 \n",
"6 Lady GaGa 2008 \n",
"7 Foo Fighters 1999 \n",
"8 Harmonia 0 \n",
"9 Thievery Corporation feat. Emiliana Torrini 2002 \n"
]
}
],
"source": [
"# Drop duplicate song_id rows from df_songs, then left-merge the play counts with the song details on \"song_id\"\n",
"df_songs = df_songs.drop_duplicates(subset='song_id')\n",
"df_merge = df_counts.merge(df_songs, on='song_id', how='left')\n",
"\n",
"# Drop the leftover index column 'Unnamed: 0'\n",
"merged_df = df_merge.drop(columns=['Unnamed: 0'])\n",
"\n",
"# Name the resulting dataframe \"df\"\n",
"df = merged_df\n",
"\n",
"print(df.head(10))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "oxeoOVxh2T9U",
"outputId": "22863df1-a280-46dd-cdf5-e428fc0997f1"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" user_id song_id play_count title \\\n",
"0 54961 153 1 The Cove \n",
"1 54961 413 2 Entre Dos Aguas \n",
"2 54961 736 1 Stronger \n",
"3 54961 750 1 Constellations \n",
"4 54961 1188 1 Learn To Fly \n",
"5 54961 1239 5 Apuesta Por El Rock 'N' Roll \n",
"6 54961 1536 1 Paper Gangsta \n",
"7 54961 2056 1 Stacked Actors \n",
"8 54961 2220 1 Sehr kosmisch \n",
"9 54961 3046 1 Heaven's gonna burn your eyes \n",
"\n",
" release \\\n",
"0 Thicker Than Water \n",
"1 Flamenco Para Niños \n",
"2 Graduation \n",
"3 In Between Dreams \n",
"4 There Is Nothing Left To Lose \n",
"5 Antología Audiovisual \n",
"6 The Fame Monster \n",
"7 There Is Nothing Left To Lose \n",
"8 Musik von Harmonia \n",
"9 Hôtel Costes 7 by Stéphane Pompougnac \n",
"\n",
" artist_name year \n",
"0 Jack Johnson 0 \n",
"1 Paco De Lucia 1976 \n",
"2 Kanye West 2007 \n",
"3 Jack Johnson 2005 \n",
"4 Foo Fighters 1999 \n",
"5 Héroes del Silencio 2007 \n",
"6 Lady GaGa 2008 \n",
"7 Foo Fighters 1999 \n",
"8 Harmonia 0 \n",
"9 Thievery Corporation feat. Emiliana Torrini 2002 \n"
]
}
],
"source": [
"# Apply label encoding to \"user_id\" and \"song_id\"\n",
"# Create a label encoder object\n",
"le = LabelEncoder()\n",
"\n",
"# Label encoding for 'user_id'\n",
"df['user_id'] = le.fit_transform(df['user_id'])\n",
"\n",
"# Refit the same encoder on 'song_id'; fit_transform overwrites the 'user_id' mapping,\n",
"# so afterwards `le` can only decode song ids\n",
"df['song_id'] = le.fit_transform(df['song_id'])\n",
"\n",
"print(df.head(10))"
]
},
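{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch, not part of the original analysis: if the original string ids are needed later (for example, to report recommendations by their real `song_id`), one option is to keep a separate encoder per column. The toy frame `toy` and the names `user_encoder` and `song_encoder` are hypothetical and chosen so that they do not overwrite `df` or `le`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.preprocessing import LabelEncoder\n",
"\n",
"# Hypothetical toy data standing in for the real user and song ids\n",
"toy = pd.DataFrame({'user_id': ['u_b', 'u_a', 'u_b'],\n",
"                    'song_id': ['s_z', 's_y', 's_y']})\n",
"\n",
"# One encoder per column, so each mapping stays available for decoding\n",
"user_encoder = LabelEncoder()\n",
"song_encoder = LabelEncoder()\n",
"toy['user_code'] = user_encoder.fit_transform(toy['user_id'])\n",
"toy['song_code'] = song_encoder.fit_transform(toy['song_id'])\n",
"\n",
"# inverse_transform recovers the original string ids from the integer codes\n",
"print(user_encoder.inverse_transform(toy['user_code']))\n",
"print(song_encoder.inverse_transform(toy['song_code']))"
]
},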
{
"cell_type": "markdown",
"metadata": {
"id": "gcY5LKAQvk9J"
},
"source": [
"A dataset of 2,000,000 rows x 7 columns is large enough to make processing slow and to make training and evaluating models inefficient.\n",
"To address this, we trim the dataset to a more manageable size by keeping only users and songs with enough listening records."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "7GGH9TW0_9uX"
},
"outputs": [],
"source": [
"# Get the column containing the users\n",
"users = df.user_id\n",
"\n",
"# Create a dictionary mapping each user to the number of songs they have listened to\n",
"ratings_count = dict()\n",
"for user in users:\n",
"    # If we have already seen the user, add 1 to their count\n",
"    if user in ratings_count:\n",
"        ratings_count[user] += 1\n",
"    # Otherwise, set their count to 1\n",
"    else:\n",
"        ratings_count[user] = 1"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "-cc6mOK7_9uX"
},
"outputs": [],
"source": [
"# Keep only users who have listened to at least 90 songs\n",
"RATINGS_CUTOFF = 90\n",
"remove_users = []\n",
"for user, num_ratings in ratings_count.items():\n",
"    if num_ratings < RATINGS_CUTOFF:\n",
"        remove_users.append(user)\n",
"df = df.loc[~df.user_id.isin(remove_users)]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "B5BS-Wk5_9uY"
},
"outputs": [],
"source": [
"# Get the column containing the songs\n",
"songs = df.song_id\n",
"\n",
"# Create a dictionary mapping each song to the number of users who have listened to it\n",
"ratings_count = dict()\n",
"for song in songs:\n",
"    # If we have already seen the song, add 1 to its count\n",
"    if song in ratings_count:\n",
"        ratings_count[song] += 1\n",
"    # Otherwise, set its count to 1\n",
"    else:\n",
"        ratings_count[song] = 1"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_nCtGwGO_9uY"
},
"outputs": [],
"source": [
"# Keep only songs that have been listened to by at least 120 users\n",
"RATINGS_CUTOFF = 120\n",
"remove_songs = []\n",
"for song, num_ratings in ratings_count.items():\n",
"    if num_ratings < RATINGS_CUTOFF:\n",
"        remove_songs.append(song)\n",
"df_final = df.loc[~df.song_id.isin(remove_songs)]"
]
},
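{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch, not part of the original analysis: the two-step trimming above (first users, then songs counted on the already filtered data) can also be written without explicit loops using pandas `groupby(...).transform('count')`. The toy frame `toy` and the cutoffs `MIN_SONGS_PER_USER` and `MIN_USERS_PER_SONG` are hypothetical values chosen for illustration; the notebook itself uses 90 and 120."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: one row per (user, song) listening record\n",
"toy = pd.DataFrame({'user_id': [0, 0, 0, 1, 1, 2],\n",
"                    'song_id': [10, 11, 12, 10, 11, 10]})\n",
"MIN_SONGS_PER_USER = 2  # assumed toy cutoff (the notebook uses 90)\n",
"MIN_USERS_PER_SONG = 2  # assumed toy cutoff (the notebook uses 120)\n",
"\n",
"# Step 1: attach each user's record count to every row, then filter users\n",
"songs_per_user = toy.groupby('user_id')['song_id'].transform('count')\n",
"toy_users = toy[songs_per_user >= MIN_SONGS_PER_USER]\n",
"\n",
"# Step 2: count listeners per song on the user-filtered data, then filter songs\n",
"users_per_song = toy_users.groupby('song_id')['user_id'].transform('count')\n",
"toy_final = toy_users[users_per_song >= MIN_USERS_PER_SONG]\n",
"\n",
"print(toy_final)"
]
},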
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "8qaKeoMcGpad"
},
"outputs": [],
"source": [
"# Keep only records with play_count greater than 5\n",
"df = df_final[df_final.play_count > 5]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "aL1JZ00o5JtQ",
"outputId": "ca68c503-08e6-45dd-a768-0782d0d00f44"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(12522, 7)\n",
" user_id song_id play_count title \\\n",
"356 27018 198 20 Rorol \n",
"359 27018 248 7 Auto-Dub \n",
"365 27018 318 11 Hilarious Movie Of The 90s \n",
"386 27018 926 11 One Minute To Midnight \n",
"402 27018 1262 6 Lights & Music \n",
"\n",
" release artist_name year \n",
"356 Identification Parade Octopus Project 2002 \n",
"359 Skream! Skream 2006 \n",
"365 Pause Four Tet 2001 \n",
"386 Justice Justice 0 \n",
"402 Lights & Music Cut Copy 2008 \n"
]
}
],
"source": [
"# Check the shape of the data\n",
"print(df.shape)\n",
"print(df.head())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "uZcr1Eke2T9W"
},
"source": [
"## **Exploratory Data Analysis**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DE_gukSJ2T9W"
},
"source": [
"**Total number of unique user id**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "n5E24_Ec2T9W",
"outputId": "f9531b02-f66a-4ebd-a96c-edde824ac227"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total number of unique user_id: 3156\n"
]
}
],
"source": [
"# Display total number of unique user_id\n",
"unique_users = df_final['user_id'].nunique()\n",
"print(f\"Total number of unique user_id: {unique_users}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "5SlpPkIE2T9W",
"outputId": "73e4a037-b123-4a78-f995-3fcb80711b6e"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total number of unique song_id: 563\n"
]
}
],
"source": [
"# Total number of unique song_id\n",
"unique_songs = df_final['song_id'].nunique()\n",
"print(f\"Total number of unique song_id: {unique_songs}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "eGXPsCjXVpUW"
},
"source": [
"**Total number of unique artists**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "qSVUwb8h2T9X",
"outputId": "b3e59359-5a04-49a1-e154-f5315111857b"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total number of unique artists: 232\n"
]
}
],
"source": [
"# Total number of unique artists\n",
"unique_artists = df_final['artist_name'].nunique()\n",
"print(f\"Total number of unique artists: {unique_artists}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "joFF5zndX1Dk"
},
"source": [
"**Songs played in a year**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "bQp2iVMC2T9Y"
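},
"outputs": [],
"source": [
"# Drop records with a missing year, then records where the year is recorded as 0\n",
"df_final = df.dropna(subset=['year'])\n",
"df_final = df_final[df_final['year'] != 0]\n",
"\n",
"# Number of distinct songs for each release year\n",
"songs_in_a_year = df_final.groupby('year')['song_id'].nunique()"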
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 578
},
"id": "bZCkOiAB2T9Y",
"outputId": "12f8a130-cb5b-430c-9804-7195936019af"
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA9wAAAJOCAYAAABFiQ/hAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAB3rklEQVR4nO3dd3gU5fr/8c+GklBMIAgJkRZ6lypEQZEWUREEBBREiooeOgcRlC7SxC5NRUAFC9bjQUFFFEVAWgCRQxMFKaEnECCQ5P79wS/7ZaXtJjtpvl/XlevKzuw+93PPzuzOvTPzjMvMTAAAAAAAwK8CMrsDAAAAAADkRBTcAAAAAAA4gIIbAAAAAAAHUHADAAAAAOAACm4AAAAAABxAwQ0AAAAAgAMouAEAAAAAcAAFNwAAAAAADqDgBgAAAADAARTcAIAs4fvvv5fL5dJHH32U2V3xSmxsrDp06KAiRYrI5XLppZdeSld7qfl///3313zuH3/8IZfLpblz56YrphOyct9SzZ07Vy6XS3/88UdmdwUAkMNRcAPAP0hqoREUFKR9+/ZdMr9JkyaqXr16JvQs+xk0aJCWLFmi4cOH65133tEdd9xxyXO6d+8ul8t1zb/u3btfNsaCBQvSXcj7S+oPAql/efLkUdmyZdWtWzf9/vvvmd09R4wZM8Yj5/z586tq1aoaMWKE4uPjHYt74sQJFS9eXLfccovM7JL5q1atUkBAgJ544gnH+gAA8I/cmd0BAEDGS0xM1KRJk/Tqq69mdleyre+++05t2rTRkCFDrvic3r17q3nz5u7Hu3fv1qhRo/Too4+qcePG7unlypVTgwYNdObMGeXNm9c9fcGCBfr11181cOBAR3JIi/79+6t+/fo6f/681q9fr9dff12LFi3S5s2bFRERkdndc8SMGTNUsGBBnTp1Sl9//bWeffZZfffdd1qxYoVcLpff4xUqVEgvvfSSOnfurDfeeEOPPvqoe15SUpIee+wxlS5dWmPHjvV7bACAf1FwA8A/UK1atfTGG29o+PDhObZIupKEhAQVKFAg3e0cOnRIhQoVuupzoqKiFBUV5X68du1ajRo1SlFRUerateslzw8KCkp3v5zWuHFjdejQQZLUo0cPVaxYUf3799e8efM0fPjwTO6dMzp06KDrr79ekvTYY4+pffv2+uSTT7Rq1SqP99dXZqazZ88qX758l8zr1KmT5s2bp2HDhqlNmzYKCwuTJL388svauHGjvvzyS+XPnz/Nsb3lr+0FAP6pOKUcAP6BnnrqKSUnJ2vSpElXfd7Vrsd1uVwaM2aM+3Hq6bfbt29X165dFRISoqJFi2rkyJEyM+3du1dt2rRRcHCwwsPD9fzzz182ZnJysp566imFh4erQIECuueee7R3795Lnrd69WrdcccdCgkJUf78+XXbbbdpxYoVHs9J7dNvv/2mBx54QIULF1ajRo2umvPvv/+u++67T6GhocqfP78aNmyoRYsWueennpZvZpo2bZr7dOP0+vs13E2aNNGiRYv0559/umOUKVPmqm3873//U4cOHRQaGqqgoCDVq1dP//nPfzyec/78eY0dO1YVKlRQUFCQihQpokaNGumbb75JU7+bNm0q6cLR+yvZtGmTunfvrrJlyyooKEjh4eHq2bOnjh496n7OsmXL5HK59Omnn17y+gULFsjlcmnlypU+5SpJW7ZsUdOmTZUvXz6VKFFC48ePV0pKSppyTfX3nFNSUvTSSy+pWrVqCgoKUlhYmHr37q3jx497vK5MmTK6++67tWTJEtWrV0/58uXTrFmzrhhn+vTpSkxM1ODBgyVJe/fu1ZgxY9SpUye1atVKkvTVV1+pcePGKlCggK677jrddddd2rJli0c73ix/KW3bCwDg6jjCDQD/QJGRkerWrZveeOMNDRs2zK9HuTt16qQqVapo0qRJWrRokcaPH6/Q0FDNmjVLTZs21eTJkzV//nwNGTJE9e
vX16233urx+meffVYul0tPPvmkDh06pJdeeknNmzdXTEyM+0jgd999p1atWqlu3boaPXq0AgICNGfOHDVt2lQ//vijbrrpJo8277vvPlWoUEETJky47DWxqWJjY3XzzTfr9OnT6t+/v4oUKaJ58+bpnnvu0UcffaR7771Xt956q9555x09+OCDatGihbp16+a3ZXexp59+WnFxcfrrr7/04osvSpIKFix4xedv2bJFt9xyi2644QYNGzZMBQoU0Icffqi2bdvq448/1r333ivpQlE1ceJEPfzww7rpppsUHx+vtWvXav369WrRooXP/dy1a5ckqUiRIld8zjfffKPff/9dPXr0UHh4uLZs2aLXX39dW7Zs0apVq+RyudSkSROVLFlS8+fPd/c11fz581WuXDn30WRvcz148KBuv/12JSUluZ/3+uuvX/aIcnpy7t27t+bOnasePXqof//+2r17t1577TVt2LBBK1asUJ48edyv3bZtm+6//3717t1bjzzyiCpVqnTFOGXKlNHYsWP1xBNPqHv37po+fbpy587tvq7/nXfe0UMPPaTo6GhNnjxZp0+f1owZM9SoUSNt2LDB/QONN8v/Yt5uLwAALxgA4B9jzpw5JsnWrFlju3btsty5c1v//v3d82+77TarVq2a+/Hu3btNks2ZM+eStiTZ6NGj3Y9Hjx5tkuzRRx91T0tKSrISJUqYy+WySZMmuacfP37c8uXLZw899JB72rJly0yS3XDDDRYfH++e/uGHH5oke/nll83MLCUlxSpUqGDR0dGWkpLift7p06ctMjLSWrRocUmf7r//fq+Wz8CBA02S/fjjj+5pJ0+etMjISCtTpowlJyd75N+nTx+v2k21Zs2aKy7P1PyXLVvmnnbXXXdZ6dKlL3nu5d6XZs2aWY0aNezs2bPuaSkpKXbzzTdbhQoV3NNuvPFGu+uuu3zq98X9e+utt+zw4cO2f/9+W7RokZUpU8ZcLpetWbPmin07ffr0Je299957JsmWL1/unjZ8+HALDAy0EydOuKcdOnTIcufO7bGueZtr6vu5evVqj/ZCQkJMku3evfuqOaeuP9u2bbPDhw/b7t27bdasWRYYGGhhYWGWkJBgP/74o0my+fPne7x28eLFl0wvXbq0SbLFixdfNe7Fzp8/b7Vq1bLQ0FCTZLNmzTKzC+tloUKF7JFHHvF4/sGDBy0kJMRjurfL39ftBQBwbZxSDgD/UGXLltWDDz6o119/XQcOHPBbuw8//LD7/1y5cqlevXoyM/Xq1cs9vVChQqpUqdJlR7fu1q2brrvuOvfjDh06qHjx4vryyy8lSTExMdqxY4ceeOABHT16VEeOHNGRI0eUkJCgZs2aafny5ZecMvzYY4951fcvv/xSN910k8dptAULFtSjjz6qP/74Q7/99pt3CyGDHTt2TN999506duyokydPupfJ0aNHFR0drR07drhHpS9UqJC2bNmiHTt2pClWz549VbRoUUVEROiuu+5SQkKC5s2bp3r16l3xNRcfUT579qyOHDmihg0bSpLWr1/vntetWzclJiZ63Brugw8+UFJSkvuad19y/fLLL9WwYUOPMx6KFi2qLl26+JRzpUqVVLRoUUVGRqp3794qX768Fi1apPz582vhwoUKCQlRixYt3H05cuSI6tatq4IFC2rZsmUebUVGRio6Otrr2Llz59brr7+uY8eOqWHDhnrkkUckXThqfeLECd1///0ecXPlyqUGDRp4xPV2+afydnsBAFwbp5QDwD/YiBEj9M4772jSpEl6+eWX/dJmqVKlPB6HhIQoKCjIPejUxdP/fg2pJFWoUMHjscvlUvny5d33TE4tFB966KEr9iEuLk6FCxd2P46MjPSq73/++acaNGhwyfQqVaq452fF26bt3LlTZqaRI0dq5MiRl33OoUOHdMMNN2jcuHFq06aNKlasqOrVq+uOO+7Qgw8+qJo1a3oVa9SoUWrcuLFy5cql66+/Xl
WqVFHu3FffnTh27JjGjh2r999/X4cOHfKYFxcX5/6/cuXKql+/vubPn+/+gWb+/Plq2LChypcv73OuV3o/r3Ya9+V8/PHHCg4OVp48eVSiRAmVK1fOPW/Hjh2Ki4tTsWLFrtiXi3m7Ll6sfv36kqS6deu6T/9O3Q5Sryf/u+DgYPf/3i7/9PQRAHB5FNwA8A9WtmxZde3aVa+//rqGDRt2yfwrDQaWnJx8xTZz5crl1TRJabo+NPXo9XPPPadatWpd9jl/v9Y5vdfsZnWpy2TIkCFXPHqaWrDeeuut2rVrlz7//HN9/fXXevPNN/Xiiy9q5syZHmcnXEmNGjU8bnXmjY4dO+rnn3/WE088oVq1aqlgwYJKSUnRHXfcccnZCN26ddOAAQP0119/KTExUatWrdJrr72Wplz95dZbb73kB6OL+1OsWDHNnz//svOLFi3q8dhf62LqcnjnnXcUHh5+yfyLfwTxZfn7s48AAApuAPjHGzFihN59911Nnjz5knmpR4lPnDjhMf3PP/90rD9/P9XZzLRz5073EdjUo4vBwcE+F37XUrp0aW3btu2S6f/73//c8zOSt6Ofly1bVpKUJ08er5ZJaGioevTooR49eujUqVO69dZbNWbMGK8Kbl8dP35cS5cu1dixYzVq1Cj39Cud0t65c2cNHjxY7733ns6cOaM8efKoU6dO7vm+5Fq6dOnLxrnce5xW5cqV07fffqtbbrklQwvV1O2gWLFiV10Ovi5/AIB/cQ03APzDlStXTl27dtWsWbN08OBBj3nBwcG6/vrrtXz5co/p06dPd6w/b7/9tk6ePOl+/NFHH+nAgQPu2yDVrVtX5cqV09SpU3Xq1KlLXn/48OE0x77zzjv1yy+/eNx+KiEhQa+//rrKlCmjqlWrprnttChQoMBlT/n9u2LFiqlJkyaaNWvWZa/Hv3iZ/P00/oIFC6p8+fJKTExMf4cvI/Xshr+fzZA60vbfXX/99WrVqpXeffddzZ8/X3fccYfH0WVfcr3zzju1atUq/fLLLx7zr3Q0Oi06duyo5ORkPfPMM5fMS0pKuuTHKn+Jjo5WcHCwJkyYoPPnz18yP3U5+Lr8AQD+xRFuAICefvppvfPOO9q2bZuqVavmMe/hhx/WpEmT9PDDD6tevXpavny5tm/f7lhfQkND1ahRI/Xo0UOxsbF66aWXVL58efdgUQEBAXrzzTfVqlUrVatWTT169NANN9ygffv2admyZQoODtYXX3yRptjDhg3Te++9p1atWql///4KDQ3VvHnztHv3bn388ccKCMjY36nr1q2rDz74QIMHD1b9+vVVsGBBtW7d+rLPnTZtmho1aqQaNWrokUceUdmyZRUbG6uVK1fqr7/+0saNGyVJVatWVZMmTVS3bl2FhoZq7dq1+uijj9S3b19HcggODtatt96qKVOm6Pz587rhhhv09ddfX/W+3d26dVOHDh0k6bKFrLe5Dh06VO+8847uuOMODRgwwH1bsNKlS2vTpk1+ye+2225T7969NXHiRMXExKhly5bKkyePduzYoYULF+rll1925+JPwcHBmjFjhh588EHVqVNHnTt3VtGiRbVnzx4tWrRIt9xyi1577bU0LX8AgP9QcAMAVL58eXXt2lXz5s27ZN6oUaN0+PBhffTRR/rwww/VqlUrffXVV1ccJCq9nnrqKW3atEkTJ07UyZMn1axZM02fPl358+d3P6dJkyZauXKlnnnmGb322ms6deqUwsPD1aBBA/Xu3TvNscPCwvTzzz/rySef1KuvvqqzZ8+qZs2a+uKLL3TXXXf5Iz2f/Otf/1JMTIzmzJmjF198UaVLl75iwV21alWtXbtWY8eO1dy5c3X06FEVK1ZMtWvX9jiVuH///vrPf/6jr7/+WomJiSpdurTGjx+vJ554wrE8FixYoH79+mnatGkyM7Vs2VJfffXVFe//3rp1axUuXFgpKSm655570pxr8eLFtWzZMvXr10+TJk1SkS
JF9NhjjykiIsJj1Pz0mjlzpurWratZs2bpqaeeUu7cuVWmTBl17dpVt9xyi9/i/N0DDzygiIgITZo0Sc8995wSExN1ww03qHHjxurRo4f7eb4ufwCA/7gsLSPWAAAAOCQpKUkRERFq3bq1Zs+endndAQAgzbiGGwAAZCmfffaZDh8+rG7dumV2VwAASBeOcAMAgCxh9erV2rRpk5555hldf/31Wr9+fWZ3CQCAdOEINwAAyBJmzJihxx9/XMWKFdPbb7+d2d0BACDdOMINAAAAAIADOMINAAAAAIADKLgBAAAAAHBAjr8Pd0pKivbv36/rrrtOLpcrs7sDAAAAAMgmzEwnT55URESEAgJ8P16d4wvu/fv3q2TJkpndDQAAAABANrV3716VKFHC59fl+IL7uuuuk3RhAQUHB2dybwAAAAAA2UV8fLxKlizprit9leML7tTTyIODgym4AQAAAAA+S+vlyQyaBgAAAACAAyi4AQAAAABwAAU3AAAAAAAOoOAGAAAAAMABmVpwJycna+TIkYqMjFS+fPlUrlw5PfPMMzIz93PMTKNGjVLx4sWVL18+NW/eXDt27MjEXgMAAAAAcG2ZWnBPnjxZM2bM0GuvvaatW7dq8uTJmjJlil599VX3c6ZMmaJXXnlFM2fO1OrVq1WgQAFFR0fr7NmzmdhzAAAAAACuzmUXH07OYHfffbfCwsI0e/Zs97T27dsrX758evfdd2VmioiI0L///W8NGTJEkhQXF6ewsDDNnTtXnTt3vmaM+Ph4hYSEKC4ujtuCAQAAAAC8lt56MlOPcN98881aunSptm/fLknauHGjfvrpJ7Vq1UqStHv3bh08eFDNmzd3vyYkJEQNGjTQypUrM6XPAAAAAAB4I3dmBh82bJji4+NVuXJl5cqVS8nJyXr22WfVpUsXSdLBgwclSWFhYR6vCwsLc8/7u8TERCUmJrofx8fHO9R7AAAAAACuLFOPcH/44YeaP3++FixYoPXr12vevHmaOnWq5s2bl+Y2J06cqJCQEPdfyZIl/dhjAAAAAAC8k6kF9xNPPKFhw4apc+fOqlGjhh588EENGjRIEydOlCSFh4dLkmJjYz1eFxsb6573d8OHD1dcXJz7b+/evc4mAQAAAADAZWRqwX369GkFBHh2IVeuXEpJSZEkRUZGKjw8XEuXLnXPj4+P1+rVqxUVFXXZNgMDAxUcHOzxBwAAAABARsvUa7hbt26tZ599VqVKlVK1atW0YcMGvfDCC+rZs6ckyeVyaeDAgRo/frwqVKigyMhIjRw5UhEREWrbtm1mdh0AAAAAgKvK1IL71Vdf1ciRI/Wvf/1Lhw4dUkREhHr37q1Ro0a5nzN06FAlJCTo0Ucf1YkTJ9SoUSMtXrxYQUFBmdhzAAAAAACuLlPvw50RuA83AAAAACAtsvV9uAEAAAAAyKkouAEAAAAAcAAFNwAAAAAADqDgBgAAAADAARTcAAAAAAA4IFNvCwYAAAAAuNRHq273a3sdGi7za3vwDke4AQAAAABwAAU3AAAAAAAOoOAGAAAAAMABFNwAAAAAADiAghsAAAAAAAdQcAMAAAAA4AAKbgAAAAAAHEDBDQAAAACAAyi4AQAAAABwAAU3AAAAAAAOoOAGAAAAAMABFNwAAAAAADiAghsAAAAAAAdQcAMAAAAA4AAKbgAAAAAAHEDBDQAAAACAAyi4AQAAAABwAAU3AAAAAAAOoOAGAAAAAMABFNwAAAAAADiAghsAAAAAAAdQcAMAAAAA4AAKbgAAAAAAHEDBDQAAAACAAyi4AQAAAABwAAU3AAAAAAAOoOAGAAAAAMABFNwAAAAAADiAghsAAAAAAAdQcAMAAAAA4AAKbgAAAAAAHEDBDQAAAACAAyi4AQAAAABwAAU3AAAAAAAOoOAGAAAAAMABFNwAAAAAADiAghsAAAAAAAdQcAMAAAAA4AAKbgAAAAAAHEDBDQAAAACAAyi4AQAAAABwAAU3AAAAAA
AOyNSCu0yZMnK5XJf89enTR5J09uxZ9enTR0WKFFHBggXVvn17xcbGZmaXAQAAAADwSqYW3GvWrNGBAwfcf998840k6b777pMkDRo0SF988YUWLlyoH374Qfv371e7du0ys8sAAAAAAHgld2YGL1q0qMfjSZMmqVy5crrtttsUFxen2bNna8GCBWratKkkac6cOapSpYpWrVqlhg0bZkaXAQAAAADwSpa5hvvcuXN699131bNnT7lcLq1bt07nz59X8+bN3c+pXLmySpUqpZUrV2ZiTwEAAAAAuLZMPcJ9sc8++0wnTpxQ9+7dJUkHDx5U3rx5VahQIY/nhYWF6eDBg1dsJzExUYmJie7H8fHxTnQXAAAAAICryjJHuGfPnq1WrVopIiIiXe1MnDhRISEh7r+SJUv6qYcAAAAAAHgvSxTcf/75p7799ls9/PDD7mnh4eE6d+6cTpw44fHc2NhYhYeHX7Gt4cOHKy4uzv23d+9ep7oNAAAAAMAVZYmCe86cOSpWrJjuuusu97S6desqT548Wrp0qXvatm3btGfPHkVFRV2xrcDAQAUHB3v8AQAAAACQ0TL9Gu6UlBTNmTNHDz30kHLn/r/uhISEqFevXho8eLBCQ0MVHBysfv36KSoqihHKAQAAAABZXqYX3N9++6327Nmjnj17XjLvxRdfVEBAgNq3b6/ExERFR0dr+vTpmdBLAAAAAAB84zIzy+xOOCk+Pl4hISGKi4vj9HIAAAAA2cJHq273a3sdGi7za3v/FOmtJ7PENdwAAAAAAOQ0FNwAAAAAADiAghsAAAAAAAdQcAMAAAAA4AAKbgAAAAAAHEDBDQAAAACAAyi4AQAAAABwAAU3AAAAAAAOoOAGAAAAAMABFNwAAAAAADiAghsAAAAAAAdQcAMAAAAA4AAKbgAAAAAAHEDBDQAAAACAAyi4AQAAAABwAAU3AAAAAAAOoOAGAAAAAMABFNwAAAAAADiAghsAAAAAAAdQcAMAAAAA4AAKbgAAAAAAHEDBDQAAAACAAyi4AQAAAABwQO7M7gAAAAAAIHMsWd3Qr+1FN1jl1/ayO45wAwAAAADgAApuAAAAAAAcQMENAAAAAIADKLgBAAAAAHAABTcAAAAAAA5glHIAAAAAgGN+XlPXr+3dXH+dX9tzEke4AQAAAABwAAU3AAAAAAAOoOAGAAAAAMABFNwAAAAAADiAghsAAAAAAAdQcAMAAAAA4AAKbgAAAAAAHEDBDQAAAACAAyi4AQAAAABwAAU3AAAAAAAOoOAGAAAAAMABFNwAAAAAADiAghsAAAAAAAdQcAMAAAAA4AAKbgAAAAAAHEDBDQAAAACAAyi4AQAAAABwAAU3AAAAAAAOyPSCe9++feratauKFCmifPnyqUaNGlq7dq17vplp1KhRKl68uPLly6fmzZtrx44dmdhjAAAAAACuLVML7uPHj+uWW25Rnjx59NVXX+m3337T888/r8KFC7ufM2XKFL3yyiuaOXOmVq9erQIFCig6Olpnz57NxJ4DAAAAAHB1uTMz+OTJk1WyZEnNmTPHPS0yMtL9v5nppZde0ogRI9SmTRtJ0ttvv62wsDB99tln6ty5c4b3GQAAAAAAb2TqEe7//Oc/qlevnu677z4VK1ZMtWvX1htvvOGev3v3bh08eFDNmzd3TwsJCVGDBg20cuXKzOgyAAAAAABeydSC+/fff9eMGTNUoUIFLVmyRI8//rj69++vefPmSZIOHjwoSQoLC/N4XVhYmHve3yUmJio+Pt7jDwAAAACAjJapp5SnpKSoXr16mjBhgiSpdu3a+vXXXzVz5kw99NBDaWpz4sSJGjt2rD+7CQAAAACAzzL1CHfx4sVVtWpVj2lVqlTRnj17JEnh4eGSpNjYWI/nxMbGuuf93fDhwxUXF+f+27t3rwM9BwAAAADg6jK14L7lllu0bds2j2nbt29X6dKlJV0YQC08PFxLly51z4+Pj9fq1asVFRV12TYDAw
MVHBzs8QcAAAAAQEbL1FPKBw0apJtvvlkTJkxQx44d9csvv+j111/X66+/LklyuVwaOHCgxo8frwoVKigyMlIjR45URESE2rZtm5ldBwAAAADgqjK14K5fv74+/fRTDR8+XOPGjVNkZKReeukldenSxf2coUOHKiEhQY8++qhOnDihRo0aafHixQoKCsrEngMAAAAAcHWZWnBL0t1336277777ivNdLpfGjRuncePGZWCvAAAAAABIn0y9hhsAAAAAgJyKghsAAAAAAAdQcAMAAAAA4AAKbgAAAAAAHEDBDQAAAACAAyi4AQAAAABwAAU3AAAAAAAOoOAGAAAAAMABFNwAAAAAADiAghsAAAAAAAdQcAMAAAAA4AAKbgAAAAAAHEDBDQAAAACAAyi4AQAAAABwAAU3AAAAAAAOoOAGAAAAAMABFNwAAAAAADiAghsAAAAAAAdQcAMAAAAA4AAKbgAAAAAAHEDBDQAAAACAAyi4AQAAAABwAAU3AAAAAAAOoOAGAAAAAMABFNwAAAAAADggtzdPGjx4sNcNvvDCC2nuDAAAAAAAOYVXBfeGDRs8Hq9fv15JSUmqVKmSJGn79u3KlSuX6tat6/8eAgAAAACQDXlVcC9btsz9/wsvvKDrrrtO8+bNU+HChSVJx48fV48ePdS4cWNnegkAAAAAQDbj8zXczz//vCZOnOgutiWpcOHCGj9+vJ5//nm/dg4AAAAAgOzK54I7Pj5ehw8fvmT64cOHdfLkSb90CgAAAACA7M7ngvvee+9Vjx499Mknn+ivv/7SX3/9pY8//li9evVSu3btnOgjAAAAAADZjlfXcF9s5syZGjJkiB544AGdP3/+QiO5c6tXr1567rnn/N5BAAAAAACyI58L7vz582v69Ol67rnntGvXLklSuXLlVKBAAb93DgAAAACA7MrnU8pTHThwQAcOHFCFChVUoEABmZk/+wUAAAAAQLbmc8F99OhRNWvWTBUrVtSdd96pAwcOSJJ69eqlf//7337vIAAAAAAA2ZHPBfegQYOUJ08e7dmzR/nz53dP79SpkxYvXuzXzgEAAAAAkF35fA33119/rSVLlqhEiRIe0ytUqKA///zTbx0DAAAAACA78/kId0JCgseR7VTHjh1TYGCgXzoFAAAAAEB253PB3bhxY7399tvuxy6XSykpKZoyZYpuv/12v3YOAAAAAIDsyudTyqdMmaJmzZpp7dq1OnfunIYOHaotW7bo2LFjWrFihRN9BAAAAAAg2/H5CHf16tW1fft2NWrUSG3atFFCQoLatWunDRs2qFy5ck70EQAAAACAbMfnI9ySFBISoqefftrffQEAAAAAIMfw+Qh3mTJlNG7cOO3du9eJ/gAAAAAAkCP4XHAPHDhQn3zyiSIjI9WiRQu9//77SkxMdKJvAAAAAABkW2kquGNiYvTLL7+oSpUq6tevn4oXL66+fftq/fr1TvQRAAAAAIBsx+eCO1WdOnX0yiuvaP/+/Ro9erTefPNN1a9fX7Vq1dJbb70lM/NnPwEAAAAAyFbSNGiaJJ0/f16ffvqp5syZo2+++UYNGzZUr1699Ndff+mpp57St99+qwULFvizrwAAAAAAZBs+F9zr16/XnDlz9N577ykgIEDdunXTiy++qMqVK7ufc++996p+/fp+7SgAAAAAANmJz6eU169fXzt27NCMGTO0b98+TZ061aPYlqTIyEh17tz5mm2NGTNGLpfL4+/its6ePas+ffqoSJEiKliwoNq3b6/Y2FhfuwwAAAAAQIbz+Qj377//rtKlS1/1OQUKFNCcOXO8aq9atWr69ttv/69Duf+vS4MGDdKiRYu0cOFChYSEqG/fvmrXrp1WrFjha7cBAAAAAMhQPhfc1yq2fe5A7twKDw+/ZHpcXJxmz56tBQsWqGnTppKkOXPmqEqVKlq1apUaNmzo134AAAAAAOBPPp9SnpycrKlTp+qmm25SeHi4QkNDPf58tWPHDkVERK
hs2bLq0qWL9uzZI0lat26dzp8/r+bNm7ufW7lyZZUqVUorV670OQ4AAAAAABnJ54J77NixeuGFF9SpUyfFxcVp8ODBateunQICAjRmzBif2mrQoIHmzp2rxYsXa8aMGdq9e7caN26skydP6uDBg8qbN68KFSrk8ZqwsDAdPHjwim0mJiYqPj7e4w8AAAAAgIzm8ynl8+fP1xtvvKG77rpLY8aM0f33369y5cqpZs2aWrVqlfr37+91W61atXL/X7NmTTVo0EClS5fWhx9+qHz58vnaNUnSxIkTNXbs2DS9FgAAAAAAf/H5CPfBgwdVo0YNSVLBggUVFxcnSbr77ru1aNGidHWmUKFCqlixonbu3Knw8HCdO3dOJ06c8HhObGzsZa/5TjV8+HDFxcW5//bu3ZuuPgEAAAAAkBY+F9wlSpTQgQMHJEnlypXT119/LUlas2aNAgMD09WZU6dOadeuXSpevLjq1q2rPHnyaOnSpe7527Zt0549exQVFXXFNgIDAxUcHOzxBwAAAABARvP5lPJ7771XS5cuVYMGDdSvXz917dpVs2fP1p49ezRo0CCf2hoyZIhat26t0qVLa//+/Ro9erRy5cql+++/XyEhIerVq5cGDx6s0NBQBQcHq1+/foqKimKEcgAAAABAludzwT1p0iT3/506dXKPGl6hQgW1bt3ap7b++usv3X///Tp69KiKFi2qRo0aadWqVSpatKgk6cUXX1RAQIDat2+vxMRERUdHa/r06b52GQAAAACADOdzwf13UVFRVz3F+2ref//9q84PCgrStGnTNG3atDS1DwAAAABAZvGq4P7Pf/7jdYP33HNPmjsDAAAAAEBO4VXB3bZtW68ac7lcSk5OTk9/AAAAAADIEbwquFNSUpzuBwAAAAAAOYpPtwUzM+3YsUNbtmxRUlKSU30CAAAAACDb87rg3r17t2rWrKnKlSurZs2aKlu2rNasWeNk3wAAAAAAyLa8LrifeOIJJSUl6d1339VHH32kkiVL6rHHHnOybwAAAAAAZFte3xbsp59+0kcffaRGjRpJkho2bKgSJUooISFBBQoUcKyDAAAAAABkR14f4T506JAqVKjgfly8eHHly5dPhw4dcqRjAAAAAABkZ14f4Xa5XDp16pTy5cvnnhYQEKCTJ08qPj7ePS04ONi/PQQAAAAAIBvyuuA2M1WsWPGSabVr13b/z324AQAAAAC4wOuCe9myZU72AwAAAACAHMXrgvu2225zsh8AAAAAAOQoXg+aBgAAAAAAvEfBDQAAAACAAyi4AQAAAABwAAU3AAAAAAAOSHfBHR8fr88++0xbt271R38AAAAAAMgRfC64O3bsqNdee02SdObMGdWrV08dO3ZUzZo19fHHH/u9gwAAAAAAZEc+F9zLly9X48aNJUmffvqpzEwnTpzQK6+8ovHjx/u9gwAAAAAAZEc+F9xxcXEKDQ2VJC1evFjt27dX/vz5ddddd2nHjh1+7yAAAAAAANmRzwV3yZIltXLlSiUkJGjx4sVq2bKlJOn48eMKCgryewcBAAAAAMiOcvv6goEDB6pLly4qWLCgSpUqpSZNmki6cKp5jRo1/N0/AAAAAACyJZ8L7n/961+66aabtHfvXrVo0UIBARcOkpctW5ZruAEAAAAA+P98LrglqV69eqpZs6Z2796tcuXKKXfu3Lrrrrv83TcAAAAAALItn6/hPn36tHr16qX8+fOrWrVq2rNnjySpX79+mjRpkt87CAAAAABAduRzwT18+HBt3LhR33//vccgac2bN9cHH3zg184BAAAAAJBd+XxK+WeffaYPPvhADRs2lMvlck+vVq2adu3a5dfOAQAAAACQXfl8hPvw4cMqVqzYJdMTEhI8CnAAAAAAAP7JfC6469Wrp0WLFrkfpxbZb775pqKiovzXMwAAAAAAsjGfTymfMGGCWrVqpd9++01JSUl6+eWX9dtvv+nnn3/WDz/84EQfAQAAAADIdnw+wt
2oUSPFxMQoKSlJNWrU0Ndff61ixYpp5cqVqlu3rhN9BAAAAAAg20nTfbjLlSunN954w999AQAAAAAgx/Cq4I6Pj/e6weDg4DR3BgAAAACAnMKrgrtQoULXHIHczORyuZScnOyXjgEAAAAAkJ15VXAvW7bM6X4AAAAAAJCjeFVw33bbbe7/9+zZo5IlS15yxNvMtHfvXv/2DgAAAACymDkr7/Jrez2iFl37SciWfB6lPDIyUocPH75k+rFjxxQZGemXTgEAAAAAkN35XHCnXqv9d6dOnVJQUJBfOgUAAAAAQHbn9W3BBg8eLElyuVwaOXKk8ufP756XnJys1atXq1atWn7vIAAAAAAA2ZHXBfeGDRskXTjCvXnzZuXNm9c9L2/evLrxxhs1ZMgQ//cQAAAAAIBsyOuCO3Wk8h49eujll1/mftsAAAAAAFyF1wV3qjlz5jjRDwAAAACZ4Kkfevu9zQm3zfJ7m0B25FXB3a5dO82dO1fBwcFq167dVZ/7ySef+KVjAAAAAABkZ14V3CEhIe6RyUNCQhztEAAAAAAAOYFXBfecOXM0btw4DRkyhFPKAQAAAADwgtf34R47dqxOnTrlZF8AAAAAAMgxvC64zczJfgAAAAAAkKN4XXBLcl/HDQAAAAAArs6ngrtixYoKDQ296l9aTZo0SS6XSwMHDnRPO3v2rPr06aMiRYqoYMGCat++vWJjY9McAwAAAACAjOLTfbjHjh3ryCjla9as0axZs1SzZk2P6YMGDdKiRYu0cOFChYSEqG/fvmrXrp1WrFjh9z4AAAAAAOBPPhXcnTt3VrFixfzagVOnTqlLly564403NH78ePf0uLg4zZ49WwsWLFDTpk0lXRgtvUqVKlq1apUaNmzo134AAAAAAOBPXp9S7tT123369NFdd92l5s2be0xft26dzp8/7zG9cuXKKlWqlFauXOlIXwAAAAAA8Bevj3A7MUr5+++/r/Xr12vNmjWXzDt48KDy5s2rQoUKeUwPCwvTwYMHr9hmYmKiEhMT3Y/j4+P91l8AAAAAALzl9RHulJQUv55OvnfvXg0YMEDz589XUFCQ39qdOHGiQkJC3H8lS5b0W9sAAAAAAHjLp1HK/WndunU6dOiQ6tSpo9y5cyt37tz64Ycf9Morryh37twKCwvTuXPndOLECY/XxcbGKjw8/IrtDh8+XHFxce6/vXv3OpwJAAAAAACX8mnQNH9q1qyZNm/e7DGtR48eqly5sp588kmVLFlSefLk0dKlS9W+fXtJ0rZt27Rnzx5FRUVdsd3AwEAFBgY62ncAAAAAAK4l0wru6667TtWrV/eYVqBAARUpUsQ9vVevXho8eLBCQ0MVHBysfv36KSoqihHKAQAAAABZnlenlNepU0fHjx+XJI0bN06nT592tFOpXnzxRd19991q3769br31VoWHh+uTTz7JkNgAAAAAAKSHV0e4t27dqoSEBBUuXFhjx47VY489pvz58/u9M99//73H46CgIE2bNk3Tpk3zeywAAAAAAJzkVcFdq1Yt9ejRQ40aNZKZaerUqSpYsOBlnztq1Ci/dhAAAAAAgOzIq4J77ty5Gj16tP773//K5XLpq6++Uu7cl77U5XJRcAMAAAAAIC8L7kqVKun999+XJAUEBGjp0qV+vSc3AAAAAAA5jc+jlKekpDjRDwAAAAAAcpQ03RZs165deumll7R161ZJUtWqVTVgwACVK1fOr50DAAAAACC78uq2YBdbsmSJqlatql9++UU1a9ZUzZo1tXr1alWrVk3ffPONE30EAAAAACDb8fkI97BhwzRo0CBNmjTpkulPPvmkWrRo4bfOAQAAAACQXfl8hHvr1q3q1avXJdN79uyp3377zS+dAgAAAAAgu/O54C5atKhiYmIumR4TE8PI5QAAAAAA/H8+n1L+yCOP6NFHH9Xvv/+um2++WZK0YsUKTZ48WYMHD/Z7BwEAAAAAyI58LrhHjhyp6667Ts8//7yGDx
8uSYqIiNCYMWPUv39/v3cQAAAAAIDsyOeC2+VyadCgQRo0aJBOnjwpSbruuuv83jEAAAAAALKzNN2HOxWFNgAAAAAAl+fzoGkAAAAAAODaKLgBAAAAAHAABTcAAAAAAA7wqeA+f/68mjVrph07djjVHwAAAAAAcgSfCu48efJo06ZNTvUFAAAAAIAcw+dTyrt27arZs2c70RcAAAAAAHIMn28LlpSUpLfeekvffvut6tatqwIFCnjMf+GFF/zWOQAAAAAAsiufC+5ff/1VderUkSRt377dY57L5fJPrwAAAAAAyOZ8LriXLVvmRD8AAAAAAMhR0nxbsJ07d2rJkiU6c+aMJMnM/NYpAAAAAACyO58L7qNHj6pZs2aqWLGi7rzzTh04cECS1KtXL/373//2ewcBAAAAAMiOfC64Bw0apDx58mjPnj3Knz+/e3qnTp20ePFiv3YOAAAAAIDsyudruL/++mstWbJEJUqU8JheoUIF/fnnn37rGAAAAAAA2ZnPR7gTEhI8jmynOnbsmAIDA/3SKQAAAAAAsjufC+7GjRvr7bffdj92uVxKSUnRlClTdPvtt/u1cwAAAAAAZFc+n1I+ZcoUNWvWTGvXrtW5c+c0dOhQbdmyRceOHdOKFSuc6CMAAAAAANmOz0e4q1evru3bt6tRo0Zq06aNEhIS1K5dO23YsEHlypVzoo8AAAAAAGQ7Ph/hlqSQkBA9/fTT/u4LAAAAAAA5RpoK7uPHj2v27NnaunWrJKlq1arq0aOHQkND/do5AAAAAACyK59PKV++fLnKlCmjV155RcePH9fx48f1yiuvKDIyUsuXL3eijwAAAAAAZDs+H+Hu06ePOnXqpBkzZihXrlySpOTkZP3rX/9Snz59tHnzZr93EgAAAACA7MbnI9w7d+7Uv//9b3exLUm5cuXS4MGDtXPnTr92DgAAAACA7MrngrtOnTrua7cvtnXrVt14441+6RQAAAAAANmdV6eUb9q0yf1///79NWDAAO3cuVMNGzaUJK1atUrTpk3TpEmTnOklAAAAAADZjFcFd61ateRyuWRm7mlDhw695HkPPPCAOnXq5L/eAQAAAACQTXlVcO/evdvpfgAAAAAAkKN4VXCXLl3a6X4AAAAAAJCj+HxbMEnav3+/fvrpJx06dEgpKSke8/r37++XjgEAAAAAkJ35XHDPnTtXvXv3Vt68eVWkSBG5XC73PJfLRcENAAAAAIDSUHCPHDlSo0aN0vDhwxUQ4PNdxQAAAAAA+EfwuWI+ffq0OnfuTLENAAAAAMBV+Fw19+rVSwsXLnSiLwAAAAAA5Bg+n1I+ceJE3X333Vq8eLFq1KihPHnyeMx/4YUX/NY5AAAAAACyqzQV3EuWLFGlSpUk6ZJB0wAAAAAAQBoK7ueff15vvfWWunfv7kB3AAAAAADIGXy+hjswMFC33HKLX4LPmDFDNWvWVHBwsIKDgxUVFaWvvvrKPf/s2bPq06ePihQpooIFC6p9+/aKjY31S2wAAAAAAJzkc8E9YMAAvfrqq34JXqJECU2aNEnr1q3T2rVr1bRpU7Vp00ZbtmyRJA0aNEhffPGFFi5cqB9++EH79+9Xu3bt/BIbAAAAAAAn+XxK+S+//KLvvvtO//3vf1WtWrVLBk375JNPvG6rdevWHo+fffZZzZgxQ6tWrVKJEiU0e/ZsLViwQE2bNpUkzZkzR1WqVNGqVavUsGFDX7sOAAAAAECG8bngLlSokCNHmZOTk7Vw4UIlJCQoKipK69at0/nz59W8eXP3cypXrqxSpUpp5cqVFNwAAAAAgCzN54J7zpw5fu3A5s2bFRUVpbNnz6pgwYL69NNPVbVqVcXExChv3rwqVKiQx/PDwsJ08ODBK7aXmJioxMRE9+P4+Hi/9hcAAAAAAG/4fA23v1WqVEkxMTFavXq1Hn/8cT300EP67bff0tzexIkTFRIS4v4rWbKkH3sLAAAAAIB3fD7CHRkZedX7bf/+++8+tZ
c3b16VL19eklS3bl2tWbNGL7/8sjp16qRz587pxIkTHke5Y2NjFR4efsX2hg8frsGDB7sfx8fHU3QDAAAAADKczwX3wIEDPR6fP39eGzZs0OLFi/XEE0+ku0MpKSlKTExU3bp1lSdPHi1dulTt27eXJG3btk179uxRVFTUFV8fGBiowMDAdPcDAAAAAID08LngHjBgwGWnT5s2TWvXrvWpreHDh6tVq1YqVaqUTp48qQULFuj777/XkiVLFBISol69emnw4MEKDQ1VcHCw+vXrp6ioKAZMAwAAAABkeX67hrtVq1b6+OOPfXrNoUOH1K1bN1WqVEnNmjXTmjVrtGTJErVo0UKS9OKLL+ruu+9W+/btdeuttyo8PNyn244BAAAAAJBZfD7CfSUfffSRQkNDfXrN7Nmzrzo/KChI06ZN07Rp09LTNQAAAAAAMpzPBXft2rU9Bk0zMx08eFCHDx/W9OnT/do5AAAAAPDWKz/f5/c2+9+80O9t4p/D54K7bdu2Ho8DAgJUtGhRNWnSRJUrV/ZXvwAAAAAAyNZ8LrhHjx7tRD8AAAAAAMhR/DZoGgAAAAAA+D9eH+EOCAjwuHb7clwul5KSktLdKQAAAAAAsjuvC+5PP/30ivNWrlypV155RSkpKX7pFAAAAAAA2Z3XBXebNm0umbZt2zYNGzZMX3zxhbp06aJx48b5tXMAAAAAAGRXabqGe//+/XrkkUdUo0YNJSUlKSYmRvPmzVPp0qX93T8AAAAAALIlnwruuLg4Pfnkkypfvry2bNmipUuX6osvvlD16tWd6h8AAAAAANmS16eUT5kyRZMnT1Z4eLjee++9y55iDgAAAAAALvC64B42bJjy5cun8uXLa968eZo3b95ln/fJJ5/4rXMAAAAAAGRXXhfc3bp1u+ZtwQAAAAAAwAVeF9xz5851sBsAAAAAAOQsaRqlHAAAAAAAXB0FNwAAAAAADqDgBgAAAADAARTcAAAAAAA4gIIbAAAAAAAHUHADAAAAAOAACm4AAAAAABxAwQ0AAAAAgAMouAEAAAAAcAAFNwAAAAAADqDgBgAAAADAARTcAAAAAAA4gIIbAAAAAAAHUHADAAAAAOAACm4AAAAAABxAwQ0AAAAAgAMouAEAAAAAcAAFNwAAAAAADqDgBgAAAADAARTcAAAAAAA4gIIbAAAAAAAHUHADAAAAAOAACm4AAAAAABxAwQ0AAAAAgAMouAEAAAAAcAAFNwAAAAAADqDgBgAAAADAARTcAAAAAAA4IHdmdwAAAADA5Q34boBf23u56ct+bQ/A1XGEGwAAAAAAB1BwAwAAAADgAApuAAAAAAAcQMENAAAAAIADKLgBAAAAAHAAo5QDAAAgR+nw+VC/tvdRmyl+bQ/AP0emHuGeOHGi6tevr+uuu07FihVT27ZttW3bNo/nnD17Vn369FGRIkVUsGBBtW/fXrGxsZnUYwAAAAAAvJOpBfcPP/ygPn36aNWqVfrmm290/vx5tWzZUgkJCe7nDBo0SF988YUWLlyoH374Qfv371e7du0ysdcAAAAAAFxbpp5SvnjxYo/Hc+fOVbFixbRu3TrdeuutiouL0+zZs7VgwQI1bdpUkjRnzhxVqVJFq1atUsOGDTOj2wAAAAAAXFOWGjQtLi5OkhQaGipJWrdunc6fP6/mzZu7n1O5cmWVKlVKK1euzJQ+AgAAAADgjSwzaFpKSooGDhyoW265RdWrV5ckHTx4UHnz5lWhQoU8nhsWFqaDBw9etp3ExEQlJia6H8fHxzvWZwAAAAAAriTLHOHu06ePfv31V73//vvpamfixIkKCQlx/5UsWdJPPQQAAAAAwHtZouDu27ev/vvf/2rZsmUqUaKEe3p4eLjOnTunEydOeDw/NjZW4eHhl21r+PDhiouLc//t3bvXya4DAAAAAHBZmVpwm5n69u2rTz/9VN99950iIyM95tetW1d58uTR0qVL3dO2bdumPXv2KCoq6rJtBgYGKj
g42OMPAAAAAICMlqnXcPfp00cLFizQ559/ruuuu859XXZISIjy5cunkJAQ9erVS4MHD1ZoaKiCg4PVr18/RUVFMUI5AAAAACBLy9SCe8aMGZKkJk2aeEyfM2eOunfvLkl68cUXFRAQoPbt2ysxMVHR0dGaPn16BvcUAAAAAADfZGrBbWbXfE5QUJCmTZumadOmZUCPAAAAAADwjywxaBoAAAAAADkNBTcAAAAAAA6g4AYAAAAAwAEU3AAAAAAAOICCGwAAAAAAB1BwAwAAAADgAApuAAAAAAAcQMENAAAAAIADKLgBAAAAAHAABTcAAAAAAA6g4AYAAAAAwAEU3AAAAAAAOICCGwAAAAAAB1BwAwAAAADgAApuAAAAAAAcQMENAAAAAIADKLgBAAAAAHBA7szuAAAAAAAA6bFlfQO/t1mtzup0t8ERbgAAAAAAHEDBDQAAAACAAyi4AQAAAABwAAU3AAAAAAAOoOAGAAAAAMABjFIOAAAA+KjHV4P93uacVi/4vU0AmYsj3AAAAAAAOICCGwAAAAAAB1BwAwAAAADgAApuAAAAAAAcQMENAAAAAIADKLgBAAAAAHAABTcAAAAAAA6g4AYAAAAAwAEU3AAAAAAAOICCGwAAAAAAB1BwAwAAAADggNyZ3QEAAAAAOd+EH7v5tb2nGr/t1/YAJ3CEGwAAAAAAB1BwAwAAAADgAApuAAAAAAAcQMENAAAAAIADKLgBAAAAAHAABTcAAAAAAA6g4AYAAAAAwAEU3AAAAAAAOICCGwAAAAAAB1BwAwAAAADgAApuAAAAAAAcQMENAAAAAIADMrXgXr58uVq3bq2IiAi5XC599tlnHvPNTKNGjVLx4sWVL18+NW/eXDt27MiczgIAAAAA4INMLbgTEhJ04403atq0aZedP2XKFL3yyiuaOXOmVq9erQIFCig6Olpnz57N4J4CAAAAAOCb3JkZvFWrVmrVqtVl55mZXnrpJY0YMUJt2rSRJL399tsKCwvTZ599ps6dO2dkVwEAAAAA8EmWvYZ79+7dOnjwoJo3b+6eFhISogYNGmjlypWZ2DMAAAAAAK4tU49wX83BgwclSWFhYR7Tw8LC3PMuJzExUYmJie7H8fHxznQQAAAAAICryLJHuNNq4sSJCgkJcf+VLFkys7sEAAAAAPgHyrIFd3h4uCQpNjbWY3psbKx73uUMHz5ccXFx7r+9e/c62k8AAAAAAC4nyxbckZGRCg8P19KlS93T4uPjtXr1akVFRV3xdYGBgQoODvb4AwAAAAAgo2XqNdynTp3Szp073Y93796tmJgYhYaGqlSpUho4cKDGjx+vChUqKDIyUiNHjlRERITatm2beZ0GAAAAAMALmVpwr127Vrfffrv78eDBgyVJDz30kObOnauhQ4cqISFBjz76qE6cOKFGjRpp8eLFCgoKyqwuAwAAAADglUwtuJs0aSIzu+J8l8ulcePGady4cRnYKwAAAAAA0i/LXsMNAAAAAEB2RsENAAAAAIADKLgBAAAAAHAABTcAAAAAAA6g4AYAAAAAwAEU3AAAAAAAOCBTbwsGAACAf45WHw7ze5tfdZzk9zYBwF84wg0AAAAAgAMouAEAAAAAcAAFNwAAAAAADqDgBgAAAADAAQyaBgBANlBrzFi/txkzZrTf2wQAAP+HI9wAAAAAADiAghsAAAAAAAdQcAMAAAAA4AAKbgAAAAAAHEDBDQAAAACAAyi4AQAAAABwAAU3AAAAAAAOoOAGAAAAAMABFNwAAAAAADiAghsAAAAAAAdQcAMAAAAA4IDcmd0BAEDWEtX/Gb+3ufKVkX5vE864ccIYv7a38alL27txqn9jSNLGIf5v85/m9nef8mt7y7pO8Gt7AJAdcYQbAAAAAAAHUHADAAAAAOAACm4AAAAAABxAwQ0AAAAAgAMouAEAAAAAcACjlAMAgByp1suj/dpezICxfm0PAJDzcYQbAAAAAAAHUHADAAAAAOAACm4AAA
AAABxAwQ0AAAAAgAMouAEAAAAAcACjlAMAcqz6T47ze5trJo/ye5vAtUS9NcKv7a3sOd6v7QEALo8j3AAAAAAAOICCGwAAAAAAB1BwAwAAAADgAApuAAAAAAAcQMENAAAAAIAD/lGjlDct2sHvbX53+CO/t/lPc/fNQ/ze5n9/nur3Nr1x153+H7140Zf+H2XZGy07+jfu1x8ysjMAAAD+WTjCDQAAAACAAyi4AQAAAABwAAU3AAAAAAAOoOAGAAAAAMABFNwAAAAAADjgHzVKeUZpWfpBv7b39Z/vXDKtVZVH/RpDkr7a+vol0+6q09+vMRatf8Wv7fni7qZP+bW9/343wa/t+aJV2zF+be+rz/zbni+ad3nGr+19O3/kJdNu7+7fGMvmXhrj1kf8G0OSlr9xaZxGj/s3zk8zLo2RURr827+5rH4+83KpM8K/o/qvH8+o/tlFXT9vQ+v8vI0DADJXtjjCPW3aNJUpU0ZBQUFq0KCBfvnll8zuEgAAAAAAV5XlC+4PPvhAgwcP1ujRo7V+/XrdeOONio6O1qFDhzK7awAAAAAAXFGWL7hfeOEFPfLII+rRo4eqVq2qmTNnKn/+/Hrrrbcyu2sAAAAAAFxRli64z507p3Xr1ql58+buaQEBAWrevLlWrlyZiT0DAAAAAODqsvSgaUeOHFFycrLCwsI8poeFhel///vfZV+TmJioxMRE9+O4uDhJUnx8vJJSzvu9j/Hx8ZdMS0o553yMZP/GuFKc836Oc9kYSYmXeWbWj3PZGOczJpckP8e5fIyzjsfIqDhJ57JfjIyKk5m5JCc6n4u/Y2RUnEzN5azzny/+jpFRcS4b44zzMSQpKQPiZEiM0xn0ne/nOJeLcS6DcklMcD6XxISM2a886+c4l4+RMfv7Z/wc53IxTickOR5DkhIyIE7CqWTHY5zyc4zUOKmxzCxNbbgsra/MAPv379cNN9ygn3/+WVFRUe7pQ4cO1Q8//KDVq1df8poxY8Zo7NixGdlNAAAAAEAOtnfvXpUoUcLn12XpI9zXX3+9cuXKpdjYWI/psbGxCg8Pv+xrhg8frsGDB7sfp6Sk6NixYypSpIhcLpdXcePj41WyZEnt3btXwcHBaU8gk2NkVJycEiOj4pBL1oxDLlkzDrlkvRgZFYdcsmYccsmacXJKjIyKQy5ZM05WzcXMdPLkSUVERKQpXpYuuPPmzau6detq6dKlatu2raQLBfTSpUvVt2/fy74mMDBQgYGBHtMKFSqUpvjBwcGOvtkZFSOj4uSUGBkVh1yyZhxyyZpxyCXrxcioOOSSNeOQS9aMk1NiZFQccsmacbJiLiEhIWmOk6ULbkkaPHiwHnroIdWrV0833XSTXnrpJSUkJKhHjx6Z3TUAAAAAAK4oyxfcnTp10uHDhzVq1CgdPHhQtWrV0uLFiy8ZSA0AAAAAgKwkyxfcktS3b98rnkLuhMDAQI0ePfqSU9OzW4yMipNTYmRUHHLJmnHIJWvGIZesFyOj4pBL1oxDLlkzTk6JkVFxyCVrxslJuVwsS49SDgAAAABAdhWQ2R0AAAAAACAnouAGAAAAAMABFNwAAAAAADiAghsAAAAAAAdQcANZEGMZIidISUnJ7C4A2UZGfe6zXcJJGbEeZ9Q67HQuGbXNJyYmZkicnMTf7w0Fdxpl1EaSkwqv7P4lnxEfWHFxcZIkl8vleKxUGbGOsb34LjvvtBw7dkySFBDg7FfMnj17tGnTJknZ//PlYqzHvsnu7/2pU6ckOf+5nxHbZUZvk06uX39vOydtl05ITk6W5OxyOnLkiKQL63BqPCfs2rVLx48fd3Sb3L17txYuXOje73PKtm3b1KJFC+3cudOxGAkJCTp37pyOHz8uKXv/IPL39cpfuVBwe+nQoUP69ddftWLFCpmZYxvhsWPHtHv3bv3++++SnP0CdvLDSpIOHz6sX3/9VStXrpR04QPS3xvh1q1b9fbbb/u1zcvZsGGDBgwYoEOHDj
kWY8uWLercubMWLlzoWAxJ2rt3rz799FNNmzZNp0+flsvl8vuHVup6vH37dknOrceHDh3S5s2b9csvvzgWJ3XZJCUl+b3ti504cUJ//vmn/ve//0m6kIu/t5eDBw/q559/1n/+8x9JzmyTv/76q1q0aKE333zTr+3+3ZYtW1SmTBk99thjkpwpIv744w+99dZbGjdunHbt2uXYjiTrsW8yYj2WpP/973967rnnlJCQ4Pe2U8XExOjBBx/Url27HIshZcx2mRHbZEbti23btk2jR49W9+7d9eabb+p///uf39fl2NhY93ekU3bv3q2ZM2dq8ODB+uabb9wFq79t375dQ4YMUfv27TV+/Hjt3r3bkRhly5bVo48+KknKlSuXI/uxGzduVIUKFfTpp5/6ve1UmzZt0k033aQNGzbo8OHDkpwpUmNiYtSwYUP99NNPjv0Q9ttvv6ljx45q0qSJoqOjtWrVKke2/Yu3yddee02bN2/2+/7r1q1b1a9fP7Vt21ZPPfWU1q1b579cDNe0ceNGq1Spkt14441WunRpq1q1qi1atMji4uL8Hic1Rrly5Sw6Otr+/PNPv8bYunWrPfLIIxYfH29mZklJSX5tP9WmTZusTp06VrlyZStRooR16tTJr+2npKRYXFycBQcHm8vlspdeesljnj/FxMRYrly5bMiQIZfthz/8+uuvFhISYoMGDbJdu3Y5EsPswjpWunRpq1evnl133XVWrVo1O3v2rN/aT41Rq1Ytq1atmlWpUsUaNWpkv/76qyUmJvo1TkxMjFWoUMEiIyMtLCzM6tSpYz/++KMlJCT4Lcavv/5qd955px0/ftzMzM6fP++3ti+2efNma9SokVWoUMHKly9vXbp08XuMTZs2WbVq1axGjRpWqFAhu+WWW/weY8uWLVaoUCEbPHiw/f77735vP9WGDRusQIEC1qhRI6tSpYp98803ZubfbWXTpk12ww032K233mphYWF2ww032F9//eW39lOxHvsmI9bjlJQUO3XqlEVGRprL5bLhw4f7/fPL7MJ7nzt3bke/W8wyZrvMiG0yo/bFtmzZYiEhIda+fXu7+eabrUGDBlaiRAn79ttvzcw/Of32229WqlQp69ixo/3666/pbu9yNm3aZBEREdaqVSurUKGCVapUySZPnmzJycl+/6wsUqSIPfTQQ9a2bVtr2LChPfvss5aSkuLXOJ9++qkVK1bMGjZsaI8++qh7enJyst9ixMTEWIECBezJJ5/0W5t/t2fPHitVqpT9+9//9pie+hnjr3xiYmIsX758Nn78eOvYsaPVrVvXL+1ebMuWLVa4cGEbOHCgPffcc3bfffdZy5Yt7cyZM3597zdv3myFCxe2nj17Wps2beyOO+6wwoUL2+LFi/0WY+vWrRYcHGwPPfSQtW/f3lq0aGGBgYH29ttv+6V9Cu5r+PPPP61UqVI2ZswY27Fjh+3bt89atGhhxYoVs6lTp9qRI0f8Emfv3r0WERFhw4YNs++//94WLlxodevWtVKlStm3337rl8J4586ddsMNN1hQUJC1b9/esaI7dQMcNmyY/fLLL/b2229b2bJlbcuWLe7n+GtDbNeunfXo0cPy5MljkyZN8kubF9u0aZMVKFDAhg8f7p529uxZj53h9H44nj592u655x7r27evmV1YNtu2bbPvv//erzt5qevyM888Y4cOHbK9e/daiRIl/PqB9fvvv1vx4sVtxIgRtnbtWluxYoXVqVPHKlWqZAsXLrQzZ874Jc6BAwesbNmy9tRTT9nGjRttzZo11rx5cytevLi9+eab7nU7PX7//Xf3DnfdunXdxYq/t5etW7dakSJFbOjQofbNN9/Ym2++aTVq1LBXXnnFbzF+++03K1KkiD311FO2detW+/HHHy0sLMx++uknv8U4d+6cdenSxXr37m1mF9bjtWvX2scff2yHDh3y23sfExNj+fPnt9GjR1tCQoKVKVPGBgwY4Je2U/31119Wvn
x5e+aZZ9zberly5Wz+/Pl+jcN67JuMWI8v9vjjj9sjjzxi+fPnt379+l3yI0h6vsc2b95s+fPntxEjRrinxcfH26FDh9Lc5uVkxHaZEdtkRu2LJSUlWdeuXT1+LNqwYYP16tXLcuXKZf/973/NLH3f+/v27bObb77ZbrzxRrvpppusV69etnnz5nT3/WJ//PGHVahQwZ566ik7d+6cmZkNGzbMypcv77fPYjOzXbt2WenSpe3pp592T+vVq5f179/fzPz7496XX35pFStWtEmTJlmNGjXc67SZ2cmTJ9Pd/tatWy137tw2btw4M7vwHi9dutRmzZplK1as8NsPru+//741adLEHePpp5+2zp07W7t27Wzp0qV+ibFhwwbLmzevDRs2zMzMvvvuOytdurS9//77fmnfzOzMmTN277332uOPP+6eNnv2bOvSpYudO3fODh8+7Jc4p06dsujoaI8fJtetW2eFCxe2wMBA+/DDD80s/fvi//rXv6xt27bux7GxsTZixAjLlSuXTZ8+3czS95lPwX0NH3/8sTVp0sROnjzp3kH57LPPLCgoyCpVqmRvvvmmmaW/gPzuu++satWqtn//fve0pKQka9WqlRUvXtxWrlxpZmlfoeLj461Lly7WoUMHe+mll6xhw4bWpk0bvxfdhw8ftnr16nn8cnf06FFr0qSJffPNN7Z48WK/HFFNXd533nmnvfLKK/b666+by+WyF154wcwuvEfp/QLet2+fuVwu69Chg3vaoEGDrFmzZta4cWOPD/v0bOgnTpywWrVq2bJly8zMrFWrVla9enUrWLCgRUZG2oIFC+zUqVNpbj/VBx98YDfffLOdOHHCPa1FixY2Y8YMGzdunG3cuDHdR9XefPNNu+eeezy+ZF9++WVzuVxWunRp++6778ws/R+Ma9eutfLly9v//vc/j+k9evSwUqVK2YIFC9K1TSYkJFj//v2tffv29sEHH1jDhg2tZs2afi9W4uLirE2bNtanTx/3tLNnz1r79u3twQcf9EuMo0ePWsOGDT22yfPnz1vTpk3tgw8+sDlz5tiBAwfSHefMmTNWv359+/jjj83MrFmzZlazZk0rWLCglSpVyp599lmLjY1NV4zt27eby+Xy2LmbOXOmXX/99bZ69ep0tX2xJUuWWJ06dTx2sFq3bm3jx4+3vn372pdffpnuXMxYj32RUeux2f99PnXt2tVeeOEF+/bbby1Pnjzu2G+++abt3bs3ze3HxsZaSEiI3X777e5pjz32mEVFRVnlypXtrrvuchfe6d23cHq7zKhtMqP2xc6dO2e33Xabu1BJdejQIXv88cctKCjIvT+WVkuXLrXo6GiLiYmxuXPnWp06dfxadCclJdnLL79sHTt2tAMHDriX18GDB61UqVK2adMmv8WZOXOm9ezZ044dO+Ze9n379rWmTZvabbfdZl27drUVK1b4Jd7evXvt/vvvtyNHjtgLL7xgNWvWtMGDB1uPHj1s5syZ7h8W0iI5OdnGjh1rLpfLfvvtNzMza9q0qd14440WEhJi5cqVs2bNmtnGjRvTncdzzz1nbdq0MTOzqKgoi46Oth49elj79u3N5XLZ7NmzzSzt6/LRo0etXr16Huvw4cOHrXbt2n77PDa78Llfs2ZNe+2119zTnnrqKStVqpTdeOONVqZMGZszZ46ZpW+7PHLkiFWtWtU++ugjj7bat29vTZo0sbx589qqVavSnsj/165dO+vVq9cl0ydMmGAul8sWLVrkEd9XFNzXMGXKFIuIiPCY9vXXX1vPnj3tnnvusfDwcL8UQx9++KEVKlTIXYxefGSzWbNmVqVKlXR/kUyYMMHeeecdS0pKsnfeeceRovvkyZM2YcIEW7NmjXvauHHjLCgoyKpUqWJly5a1ChUq2L59+8ws7Stual8nTZrkPoLy2muvWUBAgFWvXt3q1atnBw8eTGc2Zg0aNLBq1arZt99+a40aNbLbbrvNRowYYUOHDrXIyE
iLiopKd4yDBw9a/fr1bf369TZkyBC744477JdffrF9+/bZgw8+aKVKlXIXqulZB1544QULDQ11n343depUy5Mnj7Vq1cqqVatmxYoVs4ULF6YrzpAhQ6xSpUoe0z799FMbOnSoNWrUyKpWrZrm/l/su+++s+uvv959+v3FPxTcf//9Vrx48XTvsM6aNcsWLFhgZmY//fSTI8VKbGys9ejRwx0ndUf/jTfesNtuu81SUlI8diLSmsvUqVNt+fLl7sfPPPOM5c2b1+rXr28VKlSwsLAw905RWmOcOXPGWrRoYZ988ok9/fTTFh0dbVu2bLGEhAQbPny4Va9e3d566y2PPH21atUq9y/NqTZu3GhVq1a1qVOnmpl/3pd58+bZdddd594xTd1WunbtarfccouVL1/epkyZku5Yy5Ytc3w9fv3113PMevz88887vh5f/NoFCxa4d1r/+9//Wt68ed2n5qb3cq/77rvP6tSpY2+++aY1aNDAmjdvbi+88IJNmzbNatSoYVWqVHHvW6Qnl7Nnzzq6XWbUNplR+2JmZn369LGoqCg7duyYx/Q9e/ZY+/bt7c4770zXaexnzpyxn3/+2f34rbfechfdFxfD6Xnf586day+//LLHtNjYWCtUqJD7x31/2LVrl8cp8WPHjrWgoCCbMGGCjRo1yjp16mRly5b1y6UMCQkJVrNmTduwYYMlJCTY66+/bkWKFDGXy+VebulZ1w4ePGiPPvqoBQYGWvXq1a1du3YWExNj586ds08++cRatmxp9913X7qPps+fP9/CwsLszTfftDvvvNOOHj3qnvfss89a7ty5032ZwS+//OL+P3WZfPLJJxYUFGTff/99utpOlZKSYvfff7/VqFHDPvroIxsyZIjlz5/f5s6da4sWLbIJEyZYQECAx2d2Whw6dMiioqJs/Pjx7rMzfv/9d4uIiLCPP/7Y7rjjDuvSpYslJSWla5sZM2aMlSxZ8pL65Ny5c/bYY49ZlSpV0vWjLgX3ZVx8dO63336zMmXK2KBBgyw2NtbWrFljBQoUsOeff97MzMqWLWuzZs1KU5wjR464d6ZOnjxpJUuW9DhCkFp079u3z8qWLWtTpkxJU4zLrSCJiYn29ttvX1J0nzlzJk1fJEeOHHH/Sn769Gn39Pfee8/Cw8Ptk08+sT///NOOHj1qNWrUsM6dO6cpxt9Pt5s9e7a1bNnS/fimm26yXLlyWb9+/Xxu/+I4qRucmdltt91mLpfL7r33Xo/4P//8sxUvXtzj+nFfYlzcVqNGjaxx48bWvXt3+/zzzz2eGx0dbc2aNUtDJp7vy5EjR6x8+fIWHh5urVu3tjx58tjXX3/tXs86depkNWrU8PkUsItjfPvtt1a9enV78cUX7cyZM/brr79a/vz57ZVXXrG9e/da6dKl3b8SpkdKSopVqVLF4/Sfi8+cqFKlSrrWgYvjmF34wlq+fPklxcrp06ft999/T3MBefbsWVu3bt0l8WbNmmUNGzb0mOYvixYtstKlS9vnn3/u/qJv0qSJx9G2tGrXrp3VqVPHevToYe+++67HvB49eljt2rXT1f7Fy/ni5dK/f3+/7nCbmdWtW9dCQ0MtOjra8ubNa19//bV73qBBgywyMvKSHXJfJScnW9WqVR1Zjy+34+nUenz69Glbu3at+7G/1+PLfSY5tR5f3M/PP//cateu7V4ut99+u+XKlcs6d+6c5nwu/uHhgQcesFy5clmbNm08vg/27dtnpUuXvuQaT1+l9tHJ7fLi98aJbTK1zd9++81Kly7tyL7Y333wwQdWq1Yte/755y+5rGPu3LkWERFhe/bsSVeMv68/lzvSPXbsWL8cUU2NdebMGatcubLHmQeff/6533I5e/as3Xnnne7T7s3MfvzxRytWrJjH52danDt3zs6fP28tW7a0H3/80cwu7LMEBwdbhQoV3Kewp1fqmQz16tVzH+lO9eKLL1p4eHi6Ty3/448/rHXr1la3bl2PU8
vNLvwoUqFCBfvggw/SFcPs0nVs9+7dVrduXRs5cqRHzPRYunSpdezY0dq2bWvly5f32AYTExOtWrVqNnr06HTHGThwoNWsWdMeeOABmzJlihUsWNBdLz333HNWrVq1NP3YcvEyWL16td1yyy3Wt29f9z5t6vxvv/3WIiIibMOGDWnOgVHK/+a3335Tz549tW/fPklSyZIlNXjwYH3yySeqWbOmmjVrpkceeUSDBw9WcnKyAgMD3bfa8MWWLVvUoEED/fTTT5KkvHnzasiQIVqxYoWee+4597SUlBQVKVJEJUqU0MGDB9MUI3WU8NSRCZOTk5U3b17df//9evzxxxUbG6sHH3xQR48e1cCBA9WxY0efRrL9ey5BQUHueRUrVtSiRYt07733qlSpUgoNDVWtWrV8vsXW32MkJyfLzFSyZEn3KKW9evXS3r17NXjwYL3++usaO3asTzEujrN69Wr3tO+//16PP/642rdvr6JFi7pHRLzxxhsVEhLiHmEyrblI0qRJk3Tw4EHNmzfP/T6dO3dOkhQdHZ2mkVhT46xYsUKSVKRIEa1evVpTp05Vs2bNdPfdd6tp06bu9/rOO++Uy+Vy36ImLblUrVpVLVq00Msvv6zKlSurQYMG6tmzp/r166fQ0FCdPXtW+/fv9zmXw4cPa926ddq0aZNOnjwpl8ulKVOmKCYmRgMGDJAkBQYGupfZjTfe6POtNi6OcfFtepKTk5UrVy41atRIkydPVv78+XXbbbfp0KFDGjp0qLp166azZ8/6FGft2rXauHGjAgICVKdOHUkXts+L3+fU0VddLpcGDRqkdu3apSmXv4+wXL58eX355Ze65557VLhwYUlSgwYNvG77cjFOnjwpSXrxxRd19uxZzZ071/25mLq9REdHKzAw0Odt/+LldfFyvnjE4N69e6tgwYKaM2eOR8y05BIfHy9JWrt2rd555x117dpVN998s26++WadOXNGktSiRQsFBgb6PHr16dOnlZKS4s4jICBAU6ZM0fr16/22HqfGOH/+vMf0pKQkv67HqXHOnDmjfPnyqW7duu44/lqPU2Nc7vvIX+vxxXHOnj3r7ruZKTIyUmFhYQoICFCvXr20fft2Pffcc/riiy/0yCOPuN8nX2JcPKry/PnzNWzYMN1///0qWrSoe3pYWJjKlCnj3q7Sm8uLL76o06dP+227vPi9z507t8c8f22TqTFS+1WmTBkNGTJEH330kV/3xf744w+98cYbmj17tpYsWSJJ6tixoxo1aqRZs2bp3Xff9Wi3fv36yp8/v0/vzeVipL43qevDQw89pP79+2vDhg16+eWX1alTJ40dO/aS5etNjMWLF1/2OQEBAQoICHDHfuqpp9S7d2+f3pur5RIYGKgvvvhCd911l3s9CA0NVVhYmEJDQ9MU4+uvv5Yk5cmTR7lz51bt2rW1c+dOdevWTcuXL9cXX3yhAQMG6JNPPtG///1vr2NcKZeiRYtq7Nixevnll1WuXDlJ//celS9fXoULF1bevHnTFaN06dJq1qyZ/vzzT8XExGj37t3ukbALFiyoQoUKKTAwMM25pC6ziz/LpAvb0B133KHp06fr0KFDPo++fbl1rGnTpvrggw/05ptvKnfu3LrhhhvcMZOSkhQcHKzixYunOc5XX30l6cJnWKdOnXTixAl99dVXGjlypF577TVJUkhIiPLly+fTenzixAlJnreXu+mmm9S6dWv9/PPPmjp1qvbt2+deRpUrV1aBAgXSd8eKNJfqOdCmTZssNDTUunfv7nE6x5kzZ2zfvn22ZMkSj+sE4uPjrXnz5vbee++Zmfe/4MfExFhwcLAFBQVZVFSU+yjDvn37rE+fPla3bl0bO3asx2vatm3rHjXRmzh/j3HxdbsXt3H+/Hl7++237eabb7brr7/eChQo4NO1ENeKczkPPPCAe6CY9OZy8uRJu/vuu61BgwYWFhZmMTExlpycbJMmTbLQ0FCfruO+Vi4XH3lKSU
mxhIQEa9mypU/X21wpRlxcnD3//PNWpEgRa9asmccpS48//rjdd999du7cuXSvY6mmTJniPvKUqk+fPhYdHe31ddwXx2jYsKH7aN/x48dt3bp19t5777lHdTW7cLpW48aN3aPXemvTpk1WpUoVq1GjhrlcLve6c/z4cZs6dapVrFjRHnnkEY/XdO7c2R555BGvR2O9XIyLX5f6f0pKii1fvtxuueUWy507txUoUMCn6xQvF+dyvzK/9957dtNNN5mZ2fDhwy1//vxeXzd4pRhXWw4PPfSQ9evXL13LKzk52RITE+2jjz6yyMhIq1Wrlsfo9P3797fo6GifBuzxdnmdP3/eoqOjrXnz5l63fbUYF1/S8/bbb1vNmjU9XjNw4EBr1KiRT2cDbd682Zo3b25NmjSxihUr2vTp0+2vv/6ypKQke/755618+fLpXo//HmPGjBm2e/du9/zUIwDpXY+vFSe1r+lZj68V43J8XY+vFScpKcmaNm1qFStWtLCwMPfZKB9++KGFhYV5fdnS5WJs377dPf/is8LMLqzP99xzjz333HNm5v2+xdVy+fjjj61MmTLp3i69fe/Ts03+Pca0adPcR5z279/vt32x1NG1GzZsaOXKlbOCBQta9+7d3Ue1e/XqZdWrV7eBAwfazp077fDhwzZ06FCrWLGi1/sWl4vx8MMPXzJeT6rZs2dbnjx5LCQkxOujad7EMLvwnVm0aFFbsWKFPfPMMxYUFORx+Z+/czG7MFBb/fr1vR5A60oxUo8qP/PMM+ZyuSwyMtK9PR4/ftymT59+yR1efI3Ts2fPq27TAwYMsBYtWnh91sblYvTo0cO9PzZ16lQLDw+3mjVr2qpVq2zz5s02atQoK1OmjE9nHfjyvuzdu9dq1aplY8aM8ekI95WW18Ux7r33Xhs8eLAdOHDAzpw5Y6NGjbJSpUr5dDnB5eI89NBDHvvjfz/rpGfPnta+fXuvBxn+7bffLDIy0n2k38zzzKNRo0ZZgwYNrHXr1hYTE2M7duywYcOGWenSpTml3B+OHTtmderUcY8UbXZhZLwrLdwzZ87Y8OHDLSIiwv744w+v46QO0z98+HD74osvrGzZsu7TY8wuXCM0dOhQK1eunDVv3twmTZpkPXv2tIIFC9rWrVvTHCN1FNeLPwxTv5ROnTpljRo1ssKFC/s0aIe3cVIlJyfbiBEjrHjx4h47G2mNcf78eTt8+LDddNNNVrVqVY9Tc8+ePevT6Z5Xi3O5gThSUlJs5MiRVqpUqWvuBHoTw+zCOjhjxgwLCwuzqlWrWrdu3ez++++3QoUK+e19ST39b+3atVaiRAnr1auXLVy40Pr162dFihTxejCVa63Hf3f+/HkbPny4lSpVyqfBhnbu3GlhYWH25JNP2h9//GHTpk0zl8vlvn7yyJEjNmPGDCtevLjVrl3bHn/8cevSpYvlz5/f62ugLhcjICDgkn6mrtNnzpyxu+66y0JDQ326zsrbOGYXirxmzZrZqFGjLG/evB7rtr9imF14X0aMGGHFihW7ZOAuX2K4XC73DsLp06dt8eLFVr58eStZsqQ1b97c2rVrZ4UKFfLp9Ehvc0ldp9evX28BAQHunW5/xdi7d6+FhoZaq1at7LXXXrPevXtbaGioT7ls377dihYtagMHDrSFCxfamDFj3JepbNy40c6dO2czZsywiIiINK/HV4rRvn17j+tFU3ey0roeexvHLO3rsS8xzNK2Hl8rTupnZteuXa1+/fqX9N3b6zi9iXGxpKQkGzFihEVERPhUQFwpTrt27dyn+y9evNgqVKiQ5u3S2/cl9bMyLdvk1fK43Huf1n2xkydPWlRUlPuSjQMHDthXX31loaGh1qxZM3eBP3bsWGvcuLF7lP/w8HBbv359umPccccdtnPnTvdzk5OTLSkpyfr372+FCxf2epv0JcbJkyetdu3a1qRJEwsKCvK4DMSfccwujCr/xBNPWOHChb1ev64Wo2XLlrZ//3
47f/68Pf744+5rlFP3Y30pHq8WJzo6+pLt7s8//7QhQ4ZYaGio1/tIV4vRvHlzd6H67rvv2h133GEul8uqVatm5cuX93r9ulacy70vycnJ1rJlS2vSpInXBaq3McaPH2/169e3YsWKWdOmTS0iIsJvubRo0eKSXGJiYmzAgAEWEhLi9T7ynj17rFatWlahQgWrXr26x8HNi5fHnDlzrFWrVuZyuax69epWunRpn3K5HAru/+/PP/+0qKgoO3LkiCUnJ1v79u3tlltusQIFCti//vUvj6H616xZY126dPHpg9fsQpGTO3du92ieKSkpVrVqVY9RsM0uFF7ffvuttWzZ0po2bWr33HOP1x9Y3sZIlbqjEhQUZDExMX7PJdWPP/5oPXr0sGLFinm9zLyNsXPnTq8LeH/k8sMPP1jXrl2tSJEifs/l3Llz9scff1jv3r3tgQcesN69e3vcTs1fceLi4mzWrFlWoUIFq1Klit1+++1ef5F4E+PiIwyrVq2ye+65x6f3PtWIESPs7rvv9pjWqlUr++mnn+ynn35y/xr9+++/20MPPWT33XefdevWzacfKK4UY8WKFfbzzz97/KBy7tw5mzRpkuXNm9fna3l8iTNz5kxzuVwWEhLi006RLzGWLVtmDzzwgBUvXtyn9+Va70nqkYjExEQbM2aMDRgwwIYNG+b1D4ZpySUlJcX27t1rHTt29OnXdG9jfPPNN1a7dm2rXbu23X333T6PJjxgwIBLxq3o3r27BQUFWbt27dzXCu7atcu6d++epvX4SjHy5ctnHTp08FiPkpKS0rwe+xJn1qxZaVqPfYnx/fffp2k9vlacdu3a2c6dO+3QoUM+FXK+5nJxIf/dd99Zhw4d0vRZea11LHUf4uzZszZu3Lg0bZe+vC9mlqZt8loxLj4iu3bt2jTti5ldKNTr1KlzyW2Stm3bZtdff73H50JsbKx99dVX9tNPP/n0g/G1YrRt29bj4MQvv/xiLpfLp6POvsQ4duyYlS5d2kJDQ33a3/M1zqpVq+xf//qX3XjjjT7F8eU9SQ9fcvn555+tZ8+eVrlyZZ8+K68Vo3Xr1u5pKSkptm7dOtuxY4fPdwzwJZfUH6f37Nlj27Zt81uMe+65xz1t0aJFNnnyZJs5c6bPA+VdK869997r/mHlxIkT9s4771jt2rW9fl9SUlJs8uTJduedd9rXX39to0ePtsqVK1+x6Da7cF33li1b/HIHDAru/2/jxo0WERFhW7dutXbt2ll0dLT95z//sddee81uv/12u/POOz2+GOfMmWM7duzwKcbw4cNt0KBBZvZ/vwC/++67FhkZ6R7F73KnQvlyqwNvYvzdhAkTfB6Yw5c4p0+fts8//9z69u3rU/HoTQx/DPrgay4fffSR9erVy6ejQml9733Nz9c4J0+etAMHDvg06mZa1rEpU6ZcMgCJNwYMGGCtWrVyn62QejpZ/fr1LSwszFq2bGk//PCDx2t8HTjjajHCw8MtOjraI8acOXP8nktqnNTRQ1evXm1RUVFpKuy8jbFq1SobPny4z4WwN++JP+4l6uv7Ynbpqbn+yOXiUa/j4+N9jmFm1qFDB/cAL6mnw40fP95atmxpFStWtKeeeuqS1/i6Hl8tRqVKlTx+IDNL+3rsbRyztK/H3sY4c+ZMmtfja8WpWLGiDR8+3MzSN+CbL7msWLHCBg4c6NP3pK9x0sPbGBd/b/m6vfiaR1r2xcwunNl3ww03eOxsp+5rbdy40QoUKGBjxozxuV1fYzzzzDMer/n75V/+jjFx4sQ0bSu+xlmxYsUlp7T7I0bqPbLTw9dcli1b5vNAaRmxfnkb5+Jc0vJZ5k0MfwyM5msup0+f9nl7OXDggM2dO9fMLvyQllp0X/xepOf2cldDwW0Xvhz27Nlj1atXt+nTp1vHjh09vvC+//57q1q1qvs+j2l1uR2o7du3W0REhI0fP97MLn96jC8biC8x0rMT4UscswsrsC/XbqYlRl
r5GicxMdHnnYi0vi++5pcRyyyt63FazJgxwwoUKGAdOnSwLl26WJ48eeyTTz6xU6dO2cqVK61x48Y2bNgwj2s2fc3P2xjpXW7exjG7sKPp6xeJrzGSk5N9HpHelxgpKSnu9z8ty87XXNLiWjEaNWrkXr/SY9CgQVa8eHH3tX8HDhywwoUL2zfffGMzZsywfPnyXXLkzNdldq0Y+fPnT9e9o9MS59SpU2laj72JcfE1jmlZj33NJa18yeXvt09zOpesuI5l1HpsduE2cyVKlLAvvvjCPS11+Y8fP94aNGhgR48eTdf2722M1HU4LZ+V3sT4+x1e0sKbON5eq52eGEePHk3393FGLLOslEt643gTI/UMYbP03frRmzj+unvL/v37L1t0f/bZZ365neHFKLgvMnDgQHO5XJY3b95LTulp06aNdezY0S9x/v7hPWHCBCtatGiafnXMzBgZFYdcsmacjMrl1VdftUmTJlmHDh3sscce85jXvXt3a9y4cboLomvFaNSokV/OpvAmTlqLB19ipPeLJCstr+zw3v/555928803W2BgoN1xxx2WP39+9wBpR44csRtuuOGy1/NmtRjkkjVj5KRcnIqxf/9+W716tS1evNj9+bd792677777rHHjxrZkyRKP58+cOdOqVKni9SCiOSkGuZBLTs7FzDwO0uzbt89ddI8ePdpdC158e2B/8O5+AznMtm3bNHfuXP3111+68cYbdfvtt6tu3bp68cUXFR8frzlz5mjp0qWqUKGCQkJCJEn58+dXpUqV0hWnefPmqlWrlgICApSSkuIebr5Zs2Z655139NNPP6ly5cru2xBllRjkQi6ZkUuTJk1Ur1499e3bV5I0cOBA5cuXT9KFW06k3vKiWrVqHv1wIkb16tW9jpHeON5KTwzz8vYZ2WF5ZfX3vmXLlqpZs6aWLFmiadOmKSUlRV27dlWXLl0kSXv27FH+/Pnd3zVZJQa5kEtOiCFJmzZt0j333KPAwEDFxsYqPDxcY8aMUfv27TV06FCNHTtWI0aM0LFjx9S5c2edP39ev//+u4oVK+ZxG7d/QgxyIZecmkvx4sU1atQoRUdHKzQ01H3ruoiICPft8caNG6dChQppzZo1ioiI8Dofr/i1fM8GtmzZYoUKFbL77rvPHnvsMStZsqTVrl3bpk+fbmZmR48etS5dulju3Lmtb9++NnnyZBs0aJCFhob6dL3b5eLUqVPHZsyY4X7Oxb+2dO3a1SIjI9Odi79jkAu5ZGYu06ZNcz9n3LhxVqBAAVu+fLn9/PPPNnr0aAsNDfX6eseMiJGTcmF5+SeXWrVq2cyZM93P+fvR8qFDh1qtWrW8PhUzI2KQC7nkhBhmZocOHbLKlSvbU089Zbt27bJ9+/ZZp06drGLFijZ27Fg7e/asxcTE2GOPPWa5c+e2G2+80Ro2bGiFCxf2ejCmnBKDXMglp+dSpUoVGz16tPtygYtPTX/wwQctODg4TWNoeOMfVXCfPHnSoqOjbejQoe5pf/31lxUpUsSKFStmEyZMcE+fMmWKRUdHW61atezuu+/2aaTFq8UJCwuzZ5991j099fTRZcuWWY0aNbweZCIjYpALuWSFXFIHyUhOTrZOnTpZQECAVaxY0WrVquX1dpkRMXJSLiwv53JJtXz5cuvXr59dd911Xu9EZEQMciGXnBAj1ZYtW6xMmTKXjKD+5JNPWrVq1Wzq1KmWkpLiHrfhmWeesZkzZ/o0EFtOiUEu5PJPyKVGjRo2ZcoUj1PT33zzTStUqFC6b/11Nf+ogjshIcHq169vCxYscD82M7vvvvusWbNmFhUVZf/97389np+WAbKuFefmm2+2L7/80uM18fHxPt0OICNikAu5ZIVcoqKiPOIsX77cNm/e7Ndc/BEjJ+XC8vJ/Ln/fXn766Sd7/PHHfbrbQUbEIBdyyQkxUsXExFiJEiXcd9C4eH+uf//+Vrp0aZ/v0pJTY2RUHHLJmnH+SblERkZ6xD
h48KDPtzHz1T+m4E5JSbHY2FiLiIiw5557zj197969VrVqVZs3b57VrFnTHn74YY/XZMU45EIu/7RcevXq5XPbGRkjJ+XC8nIuzsXbi5n5dOeGjIhBLuSSE2L8Xf369e322293Pz579qz7/3r16l1y3+9/coyMikMuWTPOPzEXf49GfiU5vuD++4J87bXXzOVyWc+ePW3EiBFWsGBB9wiYCxcutDJlyngMbZ+V4pALuZALuWTnGORyIY4vI9FnRAxyIZecEMPswq3o4uPjLS4uzj1t/fr1VqxYMbv//vvd01LbHTx4sLVu3fofGYNcyIVcMk6OLri3bdtmU6dO9bieNDk52ebOnWv169e3O+64wyZPnuye9+qrr1rt2rV9PlKXEXHIhVzIhVyycwxyIRdy+WfmklHLa8uWLdayZUurXbu2RURE2LvvvmtmF46Ov/fee3b99ddbhw4d7Ny5c+4f17p27WqdO3e28+fPexUvp8QgF3IhF99ySa8cW3Dv2LHDQkNDzeVy2fDhwy8Z1fLMmTMepxeYmfXt29c6dOhgZ86c8foNyIg45EIu5EIu2TkGuZALufwzc8mo5bVlyxYrUqSIDRo0yObPn2+DBw+2PHnyuAdBSkhIsP/85z9WokQJq1y5srVt29Y6duxoBQoUsM2bN/+jYpALuZCLb7n4Q44suE+dOmU9e/a07t2727Rp08zlctkTTzzh8UF/8Yf41q1bbeDAgXbdddfZpk2bslQcciEXciGX7ByDXMiFXP6ZuWTU8jp69Ki1bNnS+vfv7zG9SZMm1q9fP49p8fHxNnToUHv44Yetb9++Xt8CKKfEIBdyIRffcvGX3P69q3fWEBAQoLp166pIkSLq1KmTrr/+enXu3FmSNHToUF1//fVyuVySpJMnT+qbb77Rhg0btHz5ctWoUSNLxSEXciEXcsnOMciFXMjln5lLRi2v8+fP68SJE+rQoYMkKSUlRQEBAYqMjNSxY8ckSXbhAJOuu+46TZ482eN5/6QY5EIu5OJbLn6TkdV9Rjp16pTH4/fff99cLpcNGTLEjhw5YmYXBvGIjY218+fP27Fjx7JsHHIhF3LJWjFyUi4sL3Ihl6wVIyflklHLa/v27e7/z507Z2ZmI0aMsAcffNDjeRcPquTrtZs5JUZGxSEXcnE6Tkbl4g85tuBOlZSU5F647733nvuUpn379tmgQYOsbdu2Pt9nO7PikEvWjEMuWTMOuWS9GOSSNWOQS9aMkZNyyajldfGdBp5++mmLjo52P54wYYI9//zzPo98nlNjZFQccsmaccglY+X4gtvswq8ZqW/G+++/b3ny5LFKlSpZ7ty53RfWZ5c45JI145BL1oxDLlkvRkbFIZesGYdcsmacnBIjNY7ZhR3vVq1amZnZyJEjzeVyWUxMDDEyIQ65ZM045JJx/hEFt9mFNyL1zWjatKmFhob6NChHVopDLlkzDrlkzTjkkvViZFQccsmaccgla8bJKTFSi/rRo0fbo48+as8995wFBgbaunXriJFJccgla8Yhl4zzjym4zS6c0jRo0CBzuVy2cePGbB2HXLJmHHLJmnHIJevFyKg45JI145BL1oyTU2KYmY0fP95cLpeFhITYmjVriJEF4pBL1oxDLs7LhGHaMle1atW0fv161axZM9vHIZesGYdcsmYccsl6MTIqDrlkzTjkkjXj5JQY0dHRkqSff/5Z9erVI0YWiEMuWTMOuTjPZWaW2Z3ISGbmvg1Fdo9DLlkzDrlkzTjkkvViZFQccsmaccgla8bJKTEkKSEhQQUKFCBGFopDLlkzDrk46x9XcAMAAAAAkBH+caeUAwAAAACQESi4AQAAAABwAAU3AAAAAAAOoOAGAAAAAMABFNwAAAAAADiAghsAAAAAAAdQcAMAAAAA4AAKbgAAcgAzU/PmzRUdHX3JvOnTp6tQoUL666+/MqFnAAD8c1FwAwCQA7hcLs2ZM0erV6/WrFmz3NN3796toUOH6t
VXX1WJEiX8GvP8+fN+bQ8AgJyGghsAgByiZMmSevnllzVkyBDt3r1bZqZevXqpZcuWql27tlq1aqWCBQsqLCxMDz74oI4cOeJ+7eLFi9WoUSMVKlRIRYoU0d13361du3a55//xxx9yuVz64IMPdNtttykoKEjz58/PjDQBAMg2XGZmmd0JAADgP23btlVcXJzatWunZ555Rlu2bFG1atX08MMPq1u3bjpz5oyefPJJJSUl6bvvvpMkffzxx3K5XKpZs6ZOnTqlUaNG6Y8//lBMTIwCAgL0xx9/KDIyUmXKlNHzzz+v2rVrKygoSMWLF8/kbAEAyLoouAEAyGEOHTqkatWq6dixY/r444/166+/6scff9SSJUvcz/nrr79UsmRJbdu2TRUrVrykjSNHjqho0aLavHmzqlev7i64X3rpJQ0YMCAj0wEAINvilHIAAHKYYsWKqXfv3qpSpYratm2rjRs3atmyZSpYsKD7r3LlypLkPm18x44duv/++1W2bFkFBwerTJkykqQ9e/Z4tF2vXr0MzQUAgOwsd2Z3AAAA+F/u3LmVO/eFr/lTp06pdevWmjx58iXPSz0lvHXr1ipdurTeeOMNRUREKCUlRdWrV9e5c+c8nl+gQAHnOw8AQA5BwQ0AQA5Xp04dffzxxypTpoy7CL/Y0aNHtW3bNr3xxhtq3LixJOmnn37K6G4CAJDjcEo5AAA5XJ8+fXTs2DHdf//9WrNmjXbt2qUlS5aoR48eSk5OVuHChVWkSBG9/vrr2rlzp7777jsNHjw4s7sNAEC2R8ENAEAOFxERoRUrVig5OVktW7ZUjRo1NHDgQBUqVEgBAQEKCAjQ+++/r3Xr1ql69eoaNGiQnnvuuczuNgAA2R6jlAMAAAAA4ACOcAMAAAAA4AAKbgAAAAAAHEDBDQAAAACAAyi4AQAAAABwAAU3AAAAAAAOoOAGAAAAAMABFNwAAAAAADiAghsAAAAAAAdQcAMAAAAA4AAKbgAAAAAAHEDBDQAAAACAAyi4AQAAAABwwP8DjQRmYtkRq5kAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Create a barplot plot with y label as \"number of titles played\" and x -axis year\n",
"plt.figure(figsize = (10, 6))\n",
"sn.barplot(x=songs_in_a_year.index, y=songs_in_a_year.values, palette=\"viridis\")\n",
"\n",
"\n",
"plt.title(\"Number of Titles Played Per Year\")\n",
"plt.ylabel(\"Number of Titles Played\")\n",
"plt.xlabel(\"Year\")\n",
"plt.xticks(rotation=45)\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9VThYg7voGIz"
},
"source": [
"## **Building various models**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Ituk9wA4Idib"
},
"source": [
"### Popularity-Based Recommendation Systems"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "462hsbxaI1ED"
},
"source": [
"Let's take the count and sum of play counts of the songs and build the popularity recommendation systems based on the sum of play counts."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "UXhBZlDE-jEu",
"outputId": "0ad06a6e-f92f-4217-b54a-9c99ed1041de"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"song_id\n",
"0 3.439394\n",
"2 2.358209\n",
"4 2.131579\n",
"5 5.607692\n",
"6 1.920455\n",
" ... \n",
"9993 1.943396\n",
"9994 4.396947\n",
"9996 1.833333\n",
"9997 2.731092\n",
"9998 1.526316\n",
"Name: play_count, Length: 8227, dtype: float64\n",
"song_id\n",
"0 66\n",
"2 67\n",
"4 190\n",
"5 130\n",
"6 88\n",
" ... \n",
"9993 106\n",
"9994 131\n",
"9996 138\n",
"9997 119\n",
"9998 76\n",
"Name: play_count, Length: 8227, dtype: int64\n"
]
}
],
"source": [
"# Calculating average play_count\n",
"play_count = df_final.groupby(\"song_id\")['play_count'].mean()\n",
"\n",
"# Calculating the frequency a song is played\n",
"play_freq = df_final.groupby(\"song_id\")['play_count'].size()\n",
"\n",
"print(play_count)\n",
"print(play_freq)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "v2XYdXvWdyys",
"outputId": "6d408dd3-96c2-4228-e560-626022621824"
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" <div id=\"df-6ad493f9-e157-41c8-b1a7-77d93b972ea3\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>average_play</th>\n",
" <th>song_frequency</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>18.158228</td>\n",
" <td>158</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>16.891892</td>\n",
" <td>148</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>16.037500</td>\n",
" <td>80</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>15.976000</td>\n",
" <td>125</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>15.171756</td>\n",
" <td>262</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-6ad493f9-e157-41c8-b1a7-77d93b972ea3')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-6ad493f9-e157-41c8-b1a7-77d93b972ea3 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-6ad493f9-e157-41c8-b1a7-77d93b972ea3');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
"<div id=\"df-68e671e1-b2c8-4a50-9991-84d27e77df98\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-68e671e1-b2c8-4a50-9991-84d27e77df98')\"\n",
" title=\"Suggest charts\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" --bg-color: #E8F0FE;\n",
" --fill-color: #1967D2;\n",
" --hover-bg-color: #E2EBFA;\n",
" --hover-fill-color: #174EA6;\n",
" --disabled-fill-color: #AAA;\n",
" --disabled-bg-color: #DDD;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" --bg-color: #3B4455;\n",
" --fill-color: #D2E3FC;\n",
" --hover-bg-color: #434B5C;\n",
" --hover-fill-color: #FFFFFF;\n",
" --disabled-bg-color: #3B4455;\n",
" --disabled-fill-color: #666;\n",
" }\n",
"\n",
" .colab-df-quickchart {\n",
" background-color: var(--bg-color);\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: var(--fill-color);\n",
" height: 32px;\n",
" padding: 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: var(--hover-bg-color);\n",
" box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: var(--button-hover-fill-color);\n",
" }\n",
"\n",
" .colab-df-quickchart-complete:disabled,\n",
" .colab-df-quickchart-complete:disabled:hover {\n",
" background-color: var(--disabled-bg-color);\n",
" fill: var(--disabled-fill-color);\n",
" box-shadow: none;\n",
" }\n",
"\n",
" .colab-df-spinner {\n",
" border: 2px solid var(--fill-color);\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" animation:\n",
" spin 1s steps(1) infinite;\n",
" }\n",
"\n",
" @keyframes spin {\n",
" 0% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" border-left-color: var(--fill-color);\n",
" }\n",
" 20% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 30% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 40% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 60% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 80% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" 90% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const quickchartButtonEl =\n",
" document.querySelector('#' + key + ' button');\n",
" quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
" quickchartButtonEl.classList.add('colab-df-spinner');\n",
" try {\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" } catch (error) {\n",
" console.error('Error during call to suggestCharts:', error);\n",
" }\n",
" quickchartButtonEl.classList.remove('colab-df-spinner');\n",
" quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
" }\n",
" (() => {\n",
" let quickchartButtonEl =\n",
" document.querySelector('#df-68e671e1-b2c8-4a50-9991-84d27e77df98 button');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
" })();\n",
" </script>\n",
"</div>\n",
"\n",
" </div>\n",
" </div>\n"
],
"text/plain": [
" average_play song_frequency\n",
"0 18.158228 158\n",
"1 16.891892 148\n",
"2 16.037500 80\n",
"3 15.976000 125\n",
"4 15.171756 262"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Making a dataframe with the average_count and play_freq\n",
"combine_df = pd.DataFrame({\n",
" 'average_play': play_count,\n",
" 'song_frequency': play_freq,\n",
"}).sort_values(by='average_play', ascending=False).reset_index(drop=True)\n",
"\n",
"combine_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WnCT-A7RK_5g"
},
"source": [
"Now, let's create a function to find the top n songs for a recommendation based on the average play count of song. We can also add a threshold for a minimum number of playcounts for a song to be considered for recommendation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "QiT9FV3GNCrb"
},
"outputs": [],
"source": [
"# Build the function to find top n songs\n",
"# Filter out songs with play counts no less than the 15 (threshold = 15)\n",
"def recomendation(df,n,threshold=15):\n",
" recom_songs = play_count[play_count>=threshold]\n",
" top_songs=recom_songs.sort_values(ascending=False).head(n)\n",
"\n",
" return top_songs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "GpZt_BeXgz4F",
"outputId": "022c0521-cbdf-4469-d1c4-131e0ef6da00"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"song_id\n",
"32 18.158228\n",
"1990 16.891892\n",
"7839 16.037500\n",
"9859 15.976000\n",
"3859 15.171756\n",
"Name: play_count, dtype: float64\n"
]
}
],
"source": [
"# Recommend top 10 songs using the function defined above\n",
"top_10 = recomendation(df, 10)\n",
"print(top_10)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gf13HrPPJeWT"
},
"source": [
"### User Similarity-Based Collaborative Filtering"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZhFa_4aHHchr"
},
"source": [
"Below is the function to calculate precision@k and recall@k, RMSE, and F1_Score@k to evaluate the model performance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Rxn-GahOTsnm"
},
"outputs": [],
"source": [
"# The function to calulate the RMSE, precision@k, recall@k, and F_1 score\n",
"def precision_recall_at_k(model, k = 30, threshold = 1.5):\n",
" \"\"\"Return precision and recall at k metrics for each user\"\"\"\n",
"\n",
" # First map the predictions to each user.\n",
" user_est_true = defaultdict(list)\n",
"\n",
" # Making predictions on the test data\n",
" predictions=model.test(testset)\n",
"\n",
" for uid, _, true_r, est, _ in predictions:\n",
" user_est_true[uid].append((est, true_r))\n",
"\n",
" precisions = dict()\n",
" recalls = dict()\n",
" for uid, user_ratings in user_est_true.items():\n",
"\n",
" # Sort user ratings by estimated value\n",
" user_ratings.sort(key = lambda x : x[0], reverse = True)\n",
"\n",
" # Number of relevant items\n",
" n_rel = sum((true_r >= threshold) for (_, true_r) in user_ratings)\n",
"\n",
" # Number of recommended items in top k\n",
" n_rec_k = sum((est >= threshold) for (est, _) in user_ratings[ : k])\n",
"\n",
" # Number of relevant and recommended items in top k\n",
" n_rel_and_rec_k = sum(((true_r >= threshold) and (est >= threshold))\n",
" for (est, true_r) in user_ratings[ : k])\n",
"\n",
" # Precision@K: Proportion of recommended items that are relevant\n",
" # When n_rec_k is 0, Precision is undefined. We here set Precision to 0 when n_rec_k is 0\n",
"\n",
" precisions[uid] = n_rel_and_rec_k / n_rec_k if n_rec_k != 0 else 0\n",
"\n",
" # Recall@K: Proportion of relevant items that are recommended\n",
" # When n_rel is 0, Recall is undefined. We here set Recall to 0 when n_rel is 0\n",
"\n",
" recalls[uid] = n_rel_and_rec_k / n_rel if n_rel != 0 else 0\n",
"\n",
" # Mean of all the predicted precisions are calculated\n",
" precision = round((sum(prec for prec in precisions.values()) / len(precisions)), 3)\n",
"\n",
" # Mean of all the predicted recalls are calculated\n",
" recall = round((sum(rec for rec in recalls.values()) / len(recalls)), 3)\n",
"\n",
" accuracy.rmse(predictions)\n",
"\n",
" # Command to print the overall precision\n",
" print('Precision: ', precision)\n",
"\n",
" # Command to print the overall recall\n",
" print('Recall: ', recall)\n",
"\n",
" # Formula to compute the F-1 score\n",
" print('F_1 score: ', round((2 * precision * recall) / (precision + recall), 3))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "rGfYDiOCpe4X"
},
"outputs": [],
"source": [
"\n",
"# Instantiating Reader scale with rating scale (0, 5)\n",
"reader = Reader(rating_scale=(0,5))\n",
"\n",
"# Loading the dataset\n",
"data = Dataset.load_from_df(df[[\"user_id\",\"song_id\",\"play_count\"]], reader)\n",
"\n",
"# Splitting the data into train and test dataset\n",
"trainset, testset = train_test_split(data, test_size=0.4, random_state=42)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "vO3FL7iape8A",
"outputId": "08987b7b-c4b1-405a-bd96-73a898f2f409",
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"RMSE: 6.7604\n",
"Precision: 0.44\n",
"Recall: 0.643\n",
"F_1 score: 0.522\n"
]
}
],
"source": [
"# Build the default user-user-similarity model\n",
"sim_options = {\n",
" 'name': 'cosine',\n",
" 'user_based': True # Compute similarities between users\n",
"}\n",
"\n",
"# KNN algorithm is used to find desired similar items\n",
"sim_user_user = KNNBasic(sim_options=sim_options)\n",
"\n",
"# Train the algorithm on the trainset, and predict play_count for the testset\n",
"sim_user_user.fit(trainset)\n",
"predictions = sim_user_user.test(testset)\n",
"\n",
"# Let us compute precision@k, recall@k, and f_1 score with k = 30\n",
"precision_recall_at_k(sim_user_user, k=30)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Sxd23bZ9pe_x",
"outputId": "e3b484c7-3907-471e-d2c9-cf06d41cfd66"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Predicted play_count for user 27018 with listened song 198: 2.32530514540775\n"
]
}
],
"source": [
"# Predicting play_count for a sample user with a listened song\n",
"\n",
"sample_user = df['user_id'].iloc[0]\n",
"sample_song_id_listened = df[df['user_id'] == sample_user]['song_id'].iloc[0]\n",
"\n",
"predicted_play_counts_listened = sim_user_user.predict(sample_user, sample_song_id_listened)\n",
"print(f\"Predicted play_count for user {sample_user} with listened song {sample_song_id_listened}: {predicted_play_counts_listened.est}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "PbFcBj1PpfEV",
"outputId": "7a9e2029-f6d5-4759-93c4-cecf399e9f48"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Predicted play_count for user 27018 with not-listened song 352: 4.75\n"
]
}
],
"source": [
"# Predicting play_count for a sample user with a song not-listened by the user\n",
"sample_song_id_not_listened = df[df['user_id'] != sample_user]['song_id'].iloc[0]\n",
"prediction = sim_user_user.predict(sample_user, sample_song_id_not_listened)\n",
"print(f\"Predicted play_count for user {sample_user} with not-listened song {sample_song_id_not_listened}: {prediction.est}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Lt1QBiylsIOm"
},
"source": [
"Now, I try to tune the model and see if we can improve the model performance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "T3diJPL7-tVw",
"outputId": "d1c3661c-2703-413f-e060-69a705238d35"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Best RMSE score: 5.257865307327809\n",
"Best parameters: {'k': 50, 'sim_options': {'name': 'cosine', 'min_support': 1, 'user_based': True}}\n"
]
}
],
"source": [
"# Setting up parameter grid to tune the hyperparameters\n",
"param_grid = {\n",
" 'k':[10,30,50],\n",
" 'sim_options':{\n",
" 'name': ['cosine','msd','pearson'],\n",
" 'min_support':[1,5],\n",
" 'user_based':[True]\n",
" }\n",
"}\n",
"\n",
"# Performing 3-fold cross-validation to tune the hyperparameters\n",
"gs = GridSearchCV(KNNBasic, param_grid, measures=['rmse','mae'],cv=3)\n",
"\n",
"# Fitting the data\n",
"gs.fit(data)\n",
"\n",
"# Extract the best RMSE score and parameters.\n",
"print('Best RMSE score: ', gs.best_score['rmse'])\n",
"print('Best parameters: ', gs.best_params['rmse'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "PujRJA8X_JEJ",
"outputId": "70d847e9-7810-4fa6-c391-154464727741"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n"
]
},
{
"data": {
"text/plain": [
"<surprise.prediction_algorithms.knns.KNNBasic at 0x7f5408d07430>"
]
},
"execution_count": 138,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from surprise import KNNBasic\n",
"best_params = gs.best_params['rmse']\n",
"k = best_params['k']\n",
"sim_options = best_params['sim_options']\n",
"\n",
"model = KNNBasic(k=k, sim_options=sim_options)\n",
"\n",
"# Train the best model found in above gridsearch\n",
"trainset = data.build_full_trainset()\n",
"model.fit(trainset)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "HXO2Ztjhq1bN",
"outputId": "381879e4-1e4b-471e-8500-59d560f6a250"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"User 27018 would rate the 198 as : 2.51\n"
]
}
],
"source": [
"# Predict the play count for a song that is not listened to by the user (with user_id 27018)\n",
"user_id = '27018'\n",
"\n",
"not_listened_songs = df[(df['user_id'] != user_id)]['song_id'].unique()\n",
"\n",
"song_not_listened = not_listened_songs[0]\n",
"\n",
"prediction = model.predict(user_id, song_not_listened)\n",
"\n",
"print(f'User {user_id} would rate the {song_not_listened} as : {prediction.est:.2f}')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "U3ESobDynVNI"
},
"source": [
"Below I implemented a function where the input parameters are:\n",
"\n",
"- data: A **song** dataset\n",
"- user_id: A user-id **against which we want the recommendations**\n",
"- top_n: The **number of songs we want to recommend**\n",
"- algo: The algorithm we want to use **for predicting the play_count**\n",
"- The output of the function is a **set of top_n items** recommended for the given user_id based on the given algorithm"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "vW9V1Tk65HlY"
},
"outputs": [],
"source": [
"def get_recommendations(data, user_id, top_n, algo):\n",
"\n",
" # Creating an empty list to store the recommended product ids\n",
" all_songs = data['song_id'].unique()\n",
"\n",
" # Creating an user item interactions matrix\n",
" listened_songs = data[data['user_id']==user_id]['song_id'].unique()\n",
"\n",
" # Extracting those business ids which the user_id has not visited yet\n",
" listened_songs = set(data[data['user_id'] == user_id]['song_id'])\n",
" all_songs = set(data['song_id'])\n",
"\n",
" # Looping through each of the business ids which user_id has not interacted yet\n",
" not_listened_songs = all_songs - listened_songs\n",
"\n",
" # Predicting the ratings for those non visited restaurant ids by this user\n",
" predictions = [algo.predict(user_id, song_id) for song_id in not_listened_songs]\n",
" sorted_predictions = [algo.predict(user_id, song_id) for song_id in not_listened_songs]\n",
" top_recommendations = [(pred.iid,pred.est) for pred in sorted_predictions[:top_n]]\n",
"\n",
" return top_recommendations"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "qWbR85mI5Hrk",
"outputId": "dfbef352-d304-4d2e-b21e-4a98fd7d0056"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Recommended Songs: [(2048, 2.0), (6148, 1.517029070291005), (21, 3.08), (22, 1.560053445618945), (6175, 2.8), (8224, 1.1791884952890748), (4134, 2.6832539838589686), (2091, 2.8), (6189, 2.7), (6191, 1.729859340901254)]\n"
]
}
],
"source": [
"# Make top 10 recommendations for any user_id with a similarity-based recommendation engine\n",
"top_n = 10\n",
"top_recommendations = get_recommendations(df, 6958, top_n, model)\n",
"print(\"Recommended Songs:\", top_recommendations)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QgbzJKk7Tsnr"
},
"source": [
"### Item Item Similarity-based collaborative filtering recommendation systems"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "W5RMcdzjTsns",
"outputId": "625a441b-5369-4bb3-f60d-7944676a3bc1",
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"RMSE: 12.4757\n",
"Item-item collaborative filtering model rmse:12.475676611567863\n"
]
}
],
"source": [
"# Apply the item-item similarity collaborative filtering model with random_state = 1 and evaluate the model performance\n",
"reader = Reader(rating_scale=(df['play_count'].min(),df['play_count'].max()))\n",
"data = Dataset.load_from_df(df[['user_id', 'song_id', 'play_count']], reader)\n",
"\n",
"trainset, testset = train_test_split(data, test_size=0.25, random_state=1)\n",
"\n",
"sim_options = {\n",
" \"name\":\"cosine\",\n",
" \"user_base\": False,\n",
"}\n",
"algo = KNNWithMeans (sim_option=sim_options)\n",
"algo.fit(trainset)\n",
"\n",
"test_pred = algo.test(testset)\n",
"rmse = accuracy.rmse(test_pred)\n",
"\n",
"print(f\"Item-item collaborative filtering model rmse:{rmse}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "5yILOxXRTsns",
"outputId": "cea1677b-3c88-4a34-8440-34b876337917"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Predicted play count for user 6958 and song 1671: 12.052177616867214\n"
]
}
],
"source": [
"# Predicting play count for a sample user_id 6958 and song (with song_id 1671) heard by the user\n",
"user_id = 6958\n",
"song_id = 1671\n",
"prediction = algo.predict(user_id, song_id)\n",
"print(f\"Predicted play count for user {user_id} and song {song_id}: {prediction.est}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "f5bcZ3HgTsnt",
"outputId": "0eaefb85-891e-45f1-8d95-44901579d911"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the msd similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Computing the pearson similarity matrix...\n",
"Done computing similarity matrix.\n",
"Best RMSE score: 11.862561113768047\n",
"Best parameters: {'k': 50, 'min_k': 10, 'sim_options': {'name': 'cosine', 'user_based': False}}\n"
]
}
],
"source": [
"# Apply grid search for enhancing model performance\n",
"param_grid = {\n",
" 'k': [10,30,50],\n",
" 'min_k': [1, 5, 10],\n",
" 'sim_options': {\n",
" 'name': ['msd', 'cosine', 'pearson'],\n",
" 'user_based': [False]\n",
" }\n",
"}\n",
"\n",
"# Performing 3-fold cross-validation to tune the hyperparameters\n",
"gs = GridSearchCV(KNNWithMeans, param_grid, measures=['rmse','mae'], cv=3)\n",
"\n",
"# Fitting the data\n",
"gs.fit(data)\n",
"\n",
"# Find the best RMSE score\n",
"print('Best RMSE score: ', gs.best_score['rmse'])\n",
"print('Best parameters: ', gs.best_params['rmse'])\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "dSeiM1qeTsnt",
"outputId": "89f16dc6-26df-4baf-9e80-150619f65afd"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n"
]
},
{
"data": {
"text/plain": [
"<surprise.prediction_algorithms.knns.KNNWithMeans at 0x7f5408e23970>"
]
},
"execution_count": 176,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Apply the best modle found in the grid search\n",
"best_algo = KNNWithMeans(k=best_params['k'],\n",
" sim_options=best_params['sim_options'])\n",
"\n",
"# Train the best model on the full dataset\n",
"trainset = data.build_full_trainset()\n",
"best_algo.fit(trainset)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "gIBRRvdoTsnt",
"outputId": "924aa5a8-f2d8-443f-abbc-eda614171ab7"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Estimated play_count for user 6958 on song 1671: 11.927327902890912\n"
]
}
],
"source": [
"# Predict the play_count by a user(user_id 6958) for the song (song_id 1671)\n",
"# Assuming user_id is '6958' and song_id is '1671'\n",
"user_id = '6958'\n",
"song_id = '1671'\n",
"\n",
"# Make a prediction\n",
"prediction = best_algo.predict(user_id, song_id)\n",
"\n",
"# The estimated play_count\n",
"print(\"Estimated play_count for user {} on song {}: {}\".format(user_id, song_id, prediction.est))\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "7gewfmTATsnv",
"outputId": "b95691c4-fc6f-4a00-be1d-245448e65275"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['Rorol', 'Auto-Dub', 'Hilarious Movie Of The 90s', 'One Minute To Midnight', 'Lights & Music', 'Neon Knights', 'Hold The Ladder', \"Ghosts 'n' Stuff (Original Instrumental Mix)\", 'Feel The Love', \"She's Good For Business\"]\n"
]
}
],
"source": [
"def ranking_songs(user_id, all_songs, best_algo, top_n=10):\n",
" # List to store predictions\n",
" predictions = []\n",
"\n",
" # Generate predictions for each song\n",
" for song_id in all_songs:\n",
" prediction = best_algo.predict(user_id, song_id)\n",
" predictions.append((song_id, prediction.est))\n",
"\n",
" # Sort the predictions in descending order of estimated play counts\n",
" predictions.sort(key=lambda x: x[1], reverse=True)\n",
"\n",
" # Return the top N song IDs\n",
" return [song_id for song_id, _ in predictions[:top_n]]\n",
"\n",
"all_songs = df['title'].unique()\n",
"top_songs = ranking_songs(user_id, all_songs, best_algo, top_n=10)\n",
"\n",
"print(top_songs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rKgJpSA9vOOL"
},
"source": [
"### Model Based Collaborative Filtering - Matrix Factorization"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "hJynidJCw-ti"
},
"source": [
"Model-based Collaborative Filtering is a **personalized recommendation system**, the recommendations are based on the past behavior of the user and it is not dependent on any additional information. We use **latent features** to find recommendations for each user."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "07-2PT5Ssjqm",
"outputId": "d76b9d6f-8f01-44b2-c51e-5f4f5faaa690"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Shape of latent matrix: (76353, 20)\n"
]
}
],
"source": [
"n_latent_features = 20\n",
"svd = TruncatedSVD(n_components=n_latent_features)\n",
"user_song_interactions = df_counts.pivot(index='user_id', columns='song_id', values='play_count').fillna(0)\n",
"user_song_interactions_matrix = csr_matrix(user_song_interactions.values)\n",
"\n",
"\n",
"latent_matrix = svd.fit_transform(user_song_interactions_matrix)\n",
"\n",
"print(f\"Shape of latent matrix: {latent_matrix.shape}\")\n",
"\n",
"# Build baseline model using svd\n",
"\n",
"user_song_interactions_matrix = csr_matrix(user_song_interactions.values)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "yWIhfdxXsjqm",
"outputId": "a3eb8322-7828-4f9e-a0b1-33d929ef3947"
},
"outputs": [
{
"data": {
"text/plain": [
"<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7c594b027b80>"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Making prediction for user (with user_id 6958) to song (with song_id 1671), take r_ui = 2\n",
"reader = Reader(rating_scale=(0,5))\n",
"data = Dataset.load_from_df(df_counts[['user_id', 'song_id', 'play_count']], reader)\n",
"\n",
"# Split the data into training and testing sets\n",
"trainset = data.build_full_trainset()\n",
"\n",
"trainset = data.build_full_trainset()\n",
"svd = SVD()\n",
"svd.fit(trainset)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "APm-uMSvcAMf",
"outputId": "9550acf4-67e2-4495-9683-370db03ccbbd"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Predicted rating: 3.0454845\n"
]
}
],
"source": [
"user_id = 6958\n",
"song_id = 1671\n",
"r_ui = 2 # This is the true rating\n",
"\n",
"# Making prediction\n",
"prediction = svd.predict(user_id, song_id, r_ui=r_ui)\n",
"predicted_rating = prediction.est\n",
"print(f'Predicted rating: {predicted_rating}')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "23tnRUJJxWTR"
},
"source": [
"#### Improving matrix factorization based recommendation system by tuning its hyperparameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "4bM81V_hvtwv",
"outputId": "67eacbbc-4ff9-4e70-bd00-29beb3c9e49e"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"6.860128766339876\n",
"{'n_epochs': 5, 'lr_all': 0.005, 'reg_all': 0.4}\n"
]
}
],
"source": [
"# Set the parameter space to tune\n",
"param_grid = {\n",
" 'n_epochs': [5, 10],\n",
" 'lr_all': [0.005, 0.01],\n",
" 'reg_all': [0.4]\n",
"}\n",
"\n",
"gs = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3)\n",
"gs.fit(data)\n",
"\n",
"# Best RMSE score\n",
"print(gs.best_score['rmse'])\n",
"\n",
"# Combination of parameters that gave the best RMSE score\n",
"print(gs.best_params['rmse'])\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "TA_7xe-nnhuu"
},
"outputs": [],
"source": [
"# Building the optimized SVD model using optimal hyperparameters\n",
"optimal_params = gs.best_params['rmse']\n",
"optimized_svd = SVD(n_epochs=optimal_params['n_epochs'],\n",
" lr_all=optimal_params['lr_all'],\n",
" reg_all=optimal_params['reg_all'])\n",
"optimized_svd.fit(trainset)\n",
"trainset = data.build_full_trainset()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "1LGeE2EB_n90"
},
"outputs": [],
"source": [
"# Making prediction\n",
"all_items = df['song_id'].unique().tolist()\n",
"predictions = [optimized_svd.predict(user_id, iid) for iid in all_items]\n",
"\n",
"# Getting top 10 recommendations for random user_id using \"svd_optimized\" algorithm\n",
"def get_top_n(predictions, n=10, user_id=6958):\n",
" top_n = {}\n",
" for prediction in predictions:\n",
" uid, iid, est = prediction.uid, prediction.iid, prediction.est\n",
" if uid == user_id:\n",
" top_n.setdefault(uid, []).append((iid, est))\n",
"\n",
" for uid, user_ratings in top_n.items():\n",
" user_ratings.sort(key=lambda x: x[1], reverse=True)\n",
" top_n[uid] = user_ratings[:n]\n",
"\n",
" return top_n\n",
"\n",
"top_n_recommendations = get_top_n(predictions, n=10, user_id=6958)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "6ngiGSJU818M",
"outputId": "03d57719-8e00-4af4-e054-e2b3276ce39b"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Top 10 song recommendations for user 6958, ranked by predicted rating:\n",
"Rank 1: Song ID 198, Predicted Rating: 3.05\n",
"Rank 2: Song ID 248, Predicted Rating: 3.05\n",
"Rank 3: Song ID 318, Predicted Rating: 3.05\n",
"Rank 4: Song ID 926, Predicted Rating: 3.05\n",
"Rank 5: Song ID 1262, Predicted Rating: 3.05\n",
"Rank 6: Song ID 1472, Predicted Rating: 3.05\n",
"Rank 7: Song ID 1571, Predicted Rating: 3.05\n",
"Rank 8: Song ID 1811, Predicted Rating: 3.05\n",
"Rank 9: Song ID 1936, Predicted Rating: 3.05\n",
"Rank 10: Song ID 2615, Predicted Rating: 3.05\n"
]
}
],
"source": [
"# Ranking songs based on above recommendations\n",
"print(f\"Top 10 song recommendations for user {user_id}, ranked by predicted rating:\")\n",
"for rank, (song_id, predicted_rating) in enumerate(top_n_recommendations.get(user_id, []), start=1):\n",
" print(f'Rank {rank}: Song ID {song_id}, Predicted Rating: {predicted_rating:.2f}')\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "57b31de5"
},
"source": [
"### Cluster Based Recommendation System"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9Xv2AZCszCdN"
},
"source": [
"In **clustering-based recommendation systems**, we explore the **similarities and differences** in people's tastes in songs based on how they rate different songs. We cluster similar users together and recommend songs to a user based on play_counts from other users in the same cluster."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "0c4b20e4",
"outputId": "73daf989-7cc3-4caf-f966-ffafdd13d558"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.10/dist-packages/sklearn/cluster/_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning\n",
" warnings.warn(\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Empty DataFrame\n",
"Columns: [user_id, song_id, play_count, title, release, artist_name, year]\n",
"Index: []\n"
]
}
],
"source": [
"# Make baseline clustering model\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.cluster import KMeans\n",
"\n",
"user_item_matrix = df.pivot(index='user_id', columns='song_id', values='play_count').fillna(0)\n",
"scaler = StandardScaler()\n",
"user_item_matrix_scaled = scaler.fit_transform(user_item_matrix)\n",
"k = 5\n",
"kmeans = KMeans(n_clusters=k, random_state=42)\n",
"\n",
"# Fit the model\n",
"user_clusters = kmeans.fit_predict(user_item_matrix_scaled)\n",
"\n",
"# Add cluster labels to the user_item_matrix\n",
"user_item_matrix['cluster'] = user_clusters\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "dab1aaed"
},
"outputs": [],
"source": [
"def recommend_songs(user_id, user_item_matrix, top_n=10):\n",
" # Find the user's cluster\n",
" user_cluster = user_item_matrix.loc[user_id, 'cluster']\n",
"\n",
" # Filter the matrix to include only users from the same cluster\n",
" cluster_data = user_item_matrix[user_item_matrix['cluster'] == user_cluster]\n",
"\n",
" # Calculate the mean play_count for each song in the cluster\n",
" song_recommendations = cluster_data.drop('cluster', axis=1).mean().sort_values(ascending=False)\n",
"\n",
" # Get the top N song recommendations\n",
" return song_recommendations.head(top_n).index.tolist()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "6uiG7e30X2UF",
"outputId": "2b58da98-70fd-4dbd-fbff-f463fea7f406"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Recommended Songs for User 19161: [317, 614, 7416, 6246, 352, 1664, 2220, 5531, 7913, 5645]\n"
]
}
],
"source": [
"user_id = df['user_id'].sample(n=1).iloc[0]\n",
"recommended_songs = recommend_songs(user_id, user_item_matrix, top_n=10)\n",
"print(f\"Recommended Songs for User {user_id}: {recommended_songs}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "c2fd66f5"
},
"source": [
"#### Improving clustering-based recommendation system by tuning its hyper-parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "efe7d8e6",
"outputId": "6d7f3ccb-dce6-47d9-b09e-37e9c2666cdc"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Best parameters: {'n_clusters': 3, 'init': 'k-means++', 'n_init': 10}\n",
"Best silhouette score: 0.7560382053573051\n"
]
}
],
"source": [
"from sklearn.cluster import KMeans\n",
"from sklearn.metrics import silhouette_score\n",
"from sklearn.model_selection import KFold\n",
"\n",
"# Set the parameter space to tune\n",
"param_grid = {\n",
" 'n_clusters': [3, 5, 7, 10],\n",
" 'init': ['k-means++', 'random'],\n",
" 'n_init': [10, 15, 20]\n",
"}\n",
"\n",
"# Iterate over parameter combinations\n",
"for n_clusters in param_grid['n_clusters']:\n",
" for init in param_grid['init']:\n",
" for n_init in param_grid['n_init']:\n",
" # Configure KMeans with the current set of parameters\n",
" kmeans = KMeans(n_clusters=n_clusters, init=init, n_init=n_init, random_state=42)\n",
"\n",
" # Fit the model\n",
" user_clusters = kmeans.fit_predict(user_item_matrix_scaled)\n",
"\n",
" # Evaluate the model using silhouette score\n",
" silhouette_avg = silhouette_score(user_item_matrix_scaled, user_clusters)\n",
"\n",
" # Update best parameters and score if current score is better\n",
" if silhouette_avg > best_score:\n",
" best_score = silhouette_avg\n",
" best_params = {'n_clusters': n_clusters, 'init': init, 'n_init': n_init}\n",
"\n",
"print(f'Best parameters: {best_params}')\n",
"print(f'Best silhouette score: {best_score}')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "5a7a8a30",
"outputId": "993786e4-8c20-497c-8b98-77ae6d290990"
},
"outputs": [
{
"data": {
"text/plain": [
"<surprise.prediction_algorithms.co_clustering.CoClustering at 0x7f85de0af1c0>"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from surprise import CoClustering, Dataset, Reader\n",
"from surprise.model_selection import cross_validate\n",
"\n",
"# Train the tuned Coclustering algorithm\n",
"coclustering = CoClustering(n_cltr_u=3, n_cltr_i=3, n_epochs=20)\n",
"reader = Reader(rating_scale=(0, 5))\n",
"data = Dataset.load_from_df(df[['user_id', 'song_id', 'play_count']], reader)\n",
"trainset, testset = train_test_split(data, test_size=0.25, random_state=42)\n",
"coclustering.fit(trainset)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "6ba5b26b",
"outputId": "479e8628-a86c-429d-b248-360620260bb5"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Predicted rating for user 11799 and item 8094: 4.33994942815162\n"
]
}
],
"source": [
"# Using co_clustering_optimized model to recommend for the random user and any song\n",
"user_id = df['user_id'].sample(n=1).iloc[0]\n",
"song_id = df['song_id'].sample(n=1).iloc[0]\n",
"\n",
"predicted_rating = coclustering.predict(user_id, song_id).est\n",
"print(f'Predicted rating for user {user_id} and item {song_id}: {predicted_rating}')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5U56oSNsR-F2"
},
"source": [
"### Content Based Recommendation Systems"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "UX826CsjR-F3"
},
"outputs": [],
"source": [
"# Concatenate the \"title\", \"release\", \"artist_name\" columns to create a different column named \"text\"\n",
"df['text'] = df.apply(lambda row: ' '.join(row[['title', 'release', 'artist_name']]), axis=1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 247
},
"id": "WdXw4U-wR-F4",
"outputId": "85b3bf80-946f-4bbe-d86a-fd7b11bd5d78"
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" <div id=\"df-b0785a5e-abf9-4c31-a467-2062819831b0\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>user_id</th>\n",
" <th>song_id</th>\n",
" <th>play_count</th>\n",
" <th>release</th>\n",
" <th>artist_name</th>\n",
" <th>year</th>\n",
" <th>text</th>\n",
" </tr>\n",
" <tr>\n",
" <th>title</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Rorol</th>\n",
" <td>27018</td>\n",
" <td>198</td>\n",
" <td>20</td>\n",
" <td>Identification Parade</td>\n",
" <td>Octopus Project</td>\n",
" <td>2002</td>\n",
" <td>Rorol Identification Parade Octopus Project</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Auto-Dub</th>\n",
" <td>27018</td>\n",
" <td>248</td>\n",
" <td>7</td>\n",
" <td>Skream!</td>\n",
" <td>Skream</td>\n",
" <td>2006</td>\n",
" <td>Auto-Dub Skream! Skream</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Hilarious Movie Of The 90s</th>\n",
" <td>27018</td>\n",
" <td>318</td>\n",
" <td>11</td>\n",
" <td>Pause</td>\n",
" <td>Four Tet</td>\n",
" <td>2001</td>\n",
" <td>Hilarious Movie Of The 90s Pause Four Tet</td>\n",
" </tr>\n",
" <tr>\n",
" <th>One Minute To Midnight</th>\n",
" <td>27018</td>\n",
" <td>926</td>\n",
" <td>11</td>\n",
" <td>Justice</td>\n",
" <td>Justice</td>\n",
" <td>0</td>\n",
" <td>One Minute To Midnight Justice Justice</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Lights &amp; Music</th>\n",
" <td>27018</td>\n",
" <td>1262</td>\n",
" <td>6</td>\n",
" <td>Lights &amp; Music</td>\n",
" <td>Cut Copy</td>\n",
" <td>2008</td>\n",
" <td>Lights &amp; Music Lights &amp; Music Cut Copy</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-b0785a5e-abf9-4c31-a467-2062819831b0')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-b0785a5e-abf9-4c31-a467-2062819831b0 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-b0785a5e-abf9-4c31-a467-2062819831b0');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
"<div id=\"df-bca7af86-ca9f-4cf2-a6ac-840425c6678b\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-bca7af86-ca9f-4cf2-a6ac-840425c6678b')\"\n",
" title=\"Suggest charts\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" --bg-color: #E8F0FE;\n",
" --fill-color: #1967D2;\n",
" --hover-bg-color: #E2EBFA;\n",
" --hover-fill-color: #174EA6;\n",
" --disabled-fill-color: #AAA;\n",
" --disabled-bg-color: #DDD;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" --bg-color: #3B4455;\n",
" --fill-color: #D2E3FC;\n",
" --hover-bg-color: #434B5C;\n",
" --hover-fill-color: #FFFFFF;\n",
" --disabled-bg-color: #3B4455;\n",
" --disabled-fill-color: #666;\n",
" }\n",
"\n",
" .colab-df-quickchart {\n",
" background-color: var(--bg-color);\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: var(--fill-color);\n",
" height: 32px;\n",
" padding: 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: var(--hover-bg-color);\n",
" box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: var(--button-hover-fill-color);\n",
" }\n",
"\n",
" .colab-df-quickchart-complete:disabled,\n",
" .colab-df-quickchart-complete:disabled:hover {\n",
" background-color: var(--disabled-bg-color);\n",
" fill: var(--disabled-fill-color);\n",
" box-shadow: none;\n",
" }\n",
"\n",
" .colab-df-spinner {\n",
" border: 2px solid var(--fill-color);\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" animation:\n",
" spin 1s steps(1) infinite;\n",
" }\n",
"\n",
" @keyframes spin {\n",
" 0% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" border-left-color: var(--fill-color);\n",
" }\n",
" 20% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 30% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 40% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 60% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 80% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" 90% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const quickchartButtonEl =\n",
" document.querySelector('#' + key + ' button');\n",
" quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
" quickchartButtonEl.classList.add('colab-df-spinner');\n",
" try {\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" } catch (error) {\n",
" console.error('Error during call to suggestCharts:', error);\n",
" }\n",
" quickchartButtonEl.classList.remove('colab-df-spinner');\n",
" quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
" }\n",
" (() => {\n",
" let quickchartButtonEl =\n",
" document.querySelector('#df-bca7af86-ca9f-4cf2-a6ac-840425c6678b button');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
" })();\n",
" </script>\n",
"</div>\n",
"\n",
" </div>\n",
" </div>\n"
],
"text/plain": [
" user_id song_id play_count \\\n",
"title \n",
"Rorol 27018 198 20 \n",
"Auto-Dub 27018 248 7 \n",
"Hilarious Movie Of The 90s 27018 318 11 \n",
"One Minute To Midnight 27018 926 11 \n",
"Lights & Music 27018 1262 6 \n",
"\n",
" release artist_name year \\\n",
"title \n",
"Rorol Identification Parade Octopus Project 2002 \n",
"Auto-Dub Skream! Skream 2006 \n",
"Hilarious Movie Of The 90s Pause Four Tet 2001 \n",
"One Minute To Midnight Justice Justice 0 \n",
"Lights & Music Lights & Music Cut Copy 2008 \n",
"\n",
" text \n",
"title \n",
"Rorol Rorol Identification Parade Octopus Project \n",
"Auto-Dub Auto-Dub Skream! Skream \n",
"Hilarious Movie Of The 90s Hilarious Movie Of The 90s Pause Four Tet \n",
"One Minute To Midnight One Minute To Midnight Justice Justice \n",
"Lights & Music Lights & Music Lights & Music Cut Copy "
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Select the columns 'user_id', 'song_id', 'play_count', 'title', 'text' from df data\n",
"selected_columns = df[['user_id', 'song_id', 'play_count', 'title', 'text']]\n",
"\n",
"# Drop the duplicates from the title column\n",
"df_small = df.drop_duplicates(subset='title')\n",
"\n",
"# Set the title column as the index\n",
"df_small = df_small.set_index('title')\n",
"\n",
"# See the first 5 records of the df_small dataset\n",
"df_small.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "qDcYHwZTR-F5"
},
"outputs": [],
"source": [
"# Create the series of indices from the data\n",
"df_small_index = df_small.index"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "9UINF3Nwvwfr",
"outputId": "e0ab4436-deba-43cf-bd7f-096235347cb8"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (3.8.1)\n",
"Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk) (8.1.7)\n",
"Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk) (1.3.2)\n",
"Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages (from nltk) (2023.6.3)\n",
"Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from nltk) (4.66.1)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[nltk_data] Downloading package punkt to /root/nltk_data...\n",
"[nltk_data] Package punkt is already up-to-date!\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: wordnet in /usr/local/lib/python3.10/dist-packages (0.0.1b2)\n",
"Requirement already satisfied: colorama==0.3.9 in /usr/local/lib/python3.10/dist-packages (from wordnet) (0.3.9)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[nltk_data] Downloading package wordnet to /root/nltk_data...\n",
"[nltk_data] Package wordnet is already up-to-date!\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: stopwords in /usr/local/lib/python3.10/dist-packages (1.0.0)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[nltk_data] Downloading package stopwords to /root/nltk_data...\n",
"[nltk_data] Package stopwords is already up-to-date!\n"
]
}
],
"source": [
"# Importing necessary packages to work with text data\n",
"import nltk\n",
"\n",
"# Download punkt library\n",
"!pip install nltk\n",
"nltk.download('punkt')\n",
"!pip install wordnet\n",
"nltk.download('wordnet')\n",
"!pip install stopwords\n",
"nltk.download('stopwords')\n",
"\n",
"import re\n",
"from nltk.tokenize import word_tokenize\n",
"from nltk.stem import WordNetLemmatizer\n",
"from nltk.corpus import stopwords\n",
"from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer\n",
"from nltk.stem import PorterStemmer\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Jt2vitlnhoEg"
},
"source": [
"Create a **function to pre-process the text data:**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "j5QSSeUvR-F6",
"outputId": "7de7f96c-a077-424d-9f60-344f332743a1"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" user_id song_id play_count \\\n",
"title \n",
"Rorol 27018 198 20 \n",
"Auto-Dub 27018 248 7 \n",
"Hilarious Movie Of The 90s 27018 318 11 \n",
"One Minute To Midnight 27018 926 11 \n",
"Lights & Music 27018 1262 6 \n",
"\n",
" release artist_name year \\\n",
"title \n",
"Rorol Identification Parade Octopus Project 2002 \n",
"Auto-Dub Skream! Skream 2006 \n",
"Hilarious Movie Of The 90s Pause Four Tet 2001 \n",
"One Minute To Midnight Justice Justice 0 \n",
"Lights & Music Lights & Music Cut Copy 2008 \n",
"\n",
" text \\\n",
"title \n",
"Rorol Rorol Identification Parade Octopus Project \n",
"Auto-Dub Auto-Dub Skream! Skream \n",
"Hilarious Movie Of The 90s Hilarious Movie Of The 90s Pause Four Tet \n",
"One Minute To Midnight One Minute To Midnight Justice Justice \n",
"Lights & Music Lights & Music Lights & Music Cut Copy \n",
"\n",
" processed_text \n",
"title \n",
"Rorol rorol identif parad octopu project \n",
"Auto-Dub auto-dub skream ! skream \n",
"Hilarious Movie Of The 90s hilari movi of the 90 paus four tet \n",
"One Minute To Midnight one minut to midnight justic justic \n",
"Lights & Music light & music light & music cut copi \n"
]
}
],
"source": [
"from nltk.sem.drt import Tokens\n",
"# Create a function to tokenize the text\n",
"def preprocess_text(text):\n",
" # Tokenize the text\n",
" tokens = word_tokenize(text)\n",
"\n",
" # Remove stopwords and stem the words\n",
" stop_words = set(stopwords.words('english'))\n",
" stemmer = PorterStemmer()\n",
"\n",
" processed_tokens = [stemmer.stem(word) for word in tokens if word not in stop_words]\n",
"\n",
" # Return the processed text\n",
" return ' '.join(processed_tokens)\n",
"\n",
"df_small['processed_text'] = df_small['text'].apply(preprocess_text)\n",
"\n",
"# See the first 5 records of the df_small dataset\n",
"print(df_small.head())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "RI_onIGdR-F6",
"outputId": "001ed1c8-8876-4d6e-9c28-ed71e7f2db5a"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Feature names: ['08' '09' '15' ... 'zapotec' 'zero' 'éxito']\n"
]
}
],
"source": [
"# Create tfidf vectorizer\n",
"tfidf_vectorizer = TfidfVectorizer(stop_words='english')\n",
"# Fit_transfrom the above vectorizer on the text column and then convert the output into an array\n",
"tfidf_matrix = tfidf_vectorizer.fit_transform(df_small['processed_text'])\n",
"\n",
"# Get the feature names\n",
"feature_names = tfidf_vectorizer.get_feature_names_out()\n",
"\n",
"# Convert the TF-IDF matrix to an array and print it\n",
"tfidf_array = tfidf_matrix.toarray()\n",
"\n",
"print(\"Feature names:\", feature_names)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Beak6ODRR-F7",
"outputId": "7261b94a-7254-4997-ab52-2994aefb3620"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1. 0. 0. ... 0. 0. 0. ]\n",
" [0. 1. 0. ... 0. 0. 0. ]\n",
" [0. 0. 1. ... 0. 0. 0. ]\n",
" ...\n",
" [0. 0. 0. ... 1. 0. 0. ]\n",
" [0. 0. 0. ... 0. 1. 0.46289441]\n",
" [0. 0. 0. ... 0. 0.46289441 1. ]]\n"
]
}
],
"source": [
"# Compute the cosine similarity for the tfidf above output\n",
"cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)\n",
"print(cosine_sim)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3Jjo3UHKhoEh"
},
"source": [
" Finally, Create a function to find most similar songs to recommend for a given song."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "upANOISkR-F8"
},
"outputs": [],
"source": [
"# Function that takes in song title as input and returns the top 10 recommended songs\n",
"def get_similar_songs(title, cosine_sim, df, top_n=10):\n",
" \"\"\"\n",
" Get the most similar songs for a given song title based on cosine similarity.\n",
"\n",
" Args:\n",
" title (str): The title of the song.\n",
" cosine_sim (numpy.ndarray): The cosine similarity matrix.\n",
" df (pd.DataFrame): DataFrame containing the song titles.\n",
" top_n (int): Number of top similar songs to return.\n",
"\n",
" Returns:\n",
" list: A list of top_n similar song titles.\n",
" \"\"\"\n",
" # Check if the song is in the DataFrame\n",
" if title not in df.index:\n",
" raise ValueError(f\"Song '{title}' not found in the dataset.\")\n",
"\n",
" # Get the index of the song\n",
" idx = df.index.get_loc(title)\n",
"\n",
" # Get the cosine similarity scores for this song with all songs\n",
" sim_scores = list(enumerate(cosine_sim[idx]))\n",
"\n",
" # Sort the songs based on the similarity scores\n",
" sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)\n",
"\n",
" # Get the scores of the top_n most similar songs (excluding itself)\n",
" sim_scores = sim_scores[1:top_n+1]\n",
"\n",
" # Get the song indices\n",
" song_indices = [i[0] for i in sim_scores]\n",
"\n",
" # Return the top_n most similar songs\n",
" return df.iloc[song_indices].index.tolist()\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "o4EINBmkR-F8"
},
"source": [
"Recommending 10 songs similar to Learn to Fly"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ohEK5dkVR-F8",
"outputId": "a18cff82-cf9b-448a-c08b-03bba49b49a9"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Songs similar to 'Feel The Love':\n",
"Out There On The Ice\n",
"Far Away\n",
"Strangers In The Wind\n",
"Hearts On Fire\n",
"Lights & Music\n"
]
}
],
"source": [
"# Make the recommendation for the song with title 'Feel The Love'\n",
"song_title = 'Feel The Love'\n",
"similar_songs = get_similar_songs(song_title, cosine_sim, df_small, top_n=5)\n",
"\n",
"print(f\"Songs similar to '{song_title}':\")\n",
"for song in similar_songs:\n",
" print(song)"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [
"BVUiyhYTHS1t",
"bUGKX140wf-S",
"12TKB2M7XyC6",
"bvKb5FHcXzcN",
"uZcr1Eke2T9W",
"Ituk9wA4Idib",
"gf13HrPPJeWT",
"QgbzJKk7Tsnr",
"57b31de5",
"c2fd66f5",
"5U56oSNsR-F2"
],
"machine_shape": "hm",
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 0
}