@victorkohler
Created August 13, 2017 17:29
import random
import pandas as pd
import numpy as np
import scipy.sparse as sparse
from scipy.sparse.linalg import spsolve
from sklearn.preprocessing import MinMaxScaler
#-------------------------
# LOAD AND PREP THE DATA
#-------------------------
# The TSV has no header row, so stop pandas from consuming the first record as one
raw_data = pd.read_table('data/usersha1-artmbid-artname-plays.tsv', header=None)
raw_data = raw_data.drop(raw_data.columns[1], axis=1)
raw_data.columns = ['user', 'artist', 'plays']
# Drop rows with missing values (.copy() avoids pandas' SettingWithCopyWarning
# when new columns are added below)
data = raw_data.dropna().copy()
# Convert user and artist names into numerical IDs
data['user_id'] = data['user'].astype("category").cat.codes
data['artist_id'] = data['artist'].astype("category").cat.codes
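A minimal sketch of what the `cat.codes` step does, on a made-up three-row frame (the user and artist values are hypothetical, not from the Last.fm file): each distinct value gets an integer code equal to its position in the sorted `cat.categories`, so the original label can be recovered from the code.

```python
import pandas as pd

# Hypothetical toy data standing in for the Last.fm frame
toy = pd.DataFrame({
    'user': ['u2', 'u1', 'u2'],
    'artist': ['the beatles', 'radiohead', 'radiohead'],
    'plays': [10, 3, 7],
})

user_cat = toy['user'].astype('category')
artist_cat = toy['artist'].astype('category')

# Each code is the value's index in the sorted category list
toy['user_id'] = user_cat.cat.codes
toy['artist_id'] = artist_cat.cat.codes

# Recover the original artist names from the integer codes
decoded_artists = artist_cat.cat.categories[toy['artist_id'].to_numpy()]

print(toy[['user_id', 'artist_id', 'plays']])
print(list(decoded_artists))
```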
# Create a lookup frame so we can get the artist names back in
# readable form later.
item_lookup = data[['artist_id', 'artist']].drop_duplicates()
item_lookup['artist_id'] = item_lookup.artist_id.astype(str)
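Because `artist_id` is cast to `str` in the lookup frame, any later lookup has to compare against string keys. A small sketch with a hypothetical two-artist frame; the helper `artist_name` is illustrative only and not part of the original script.

```python
import pandas as pd

# Hypothetical lookup frame shaped like item_lookup above
item_lookup = pd.DataFrame({'artist_id': [0, 1], 'artist': ['radiohead', 'the beatles']})
item_lookup['artist_id'] = item_lookup.artist_id.astype(str)

def artist_name(artist_id):
    """Hypothetical helper: map a numeric or string artist ID back to its name."""
    # artist_id was cast to str, so normalise the key before comparing
    match = item_lookup.loc[item_lookup.artist_id == str(artist_id), 'artist']
    return match.iloc[0]

print(artist_name(1))
```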
data = data.drop(['user', 'artist'], axis=1)
# Drop any rows that have 0 plays
data = data.loc[data.plays != 0]
# Create lists of all users, artists and plays
users = list(np.sort(data.user_id.unique()))
artists = list(np.sort(data.artist_id.unique()))
plays = list(data.plays)
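The category codes are assigned before the zero-play rows are dropped, so if some artist (or user) appears only in zero-play rows, the filter leaves a gap in the ID range. A sketch on hypothetical data: after filtering, the number of unique artist IDs is smaller than the largest ID plus one, so a matrix sized by the count of unique IDs would be too narrow for the remaining column indices.

```python
import pandas as pd

# Hypothetical frame: artist 'b' only ever has 0 plays
toy = pd.DataFrame({
    'user': ['x', 'x', 'y'],
    'artist': ['a', 'b', 'c'],
    'plays': [5, 0, 2],
})
toy['user_id'] = toy['user'].astype('category').cat.codes
toy['artist_id'] = toy['artist'].astype('category').cat.codes

# Filter after encoding, in the same order as the script
toy = toy.loc[toy.plays != 0]

n_unique_artists = toy.artist_id.nunique()       # 'a' and 'c' remain
n_columns_needed = int(toy.artist_id.max()) + 1  # 'c' still has code 2
print(n_unique_artists, n_columns_needed)
```

Sizing the matrix from the maximum ID plus one, or dropping zero-play rows before encoding, would avoid this mismatch.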
# Get the rows and columns for our new matrix
rows = data.user_id.astype(int)
cols = data.artist_id.astype(int)
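# Construct a sparse user x artist matrix containing the number of plays.
# Size it from the largest IDs rather than the number of unique IDs: the codes
# were assigned before zero-play rows were dropped, so the ID range may have gaps.
data_sparse = sparse.csr_matrix((plays, (rows, cols)), shape=(rows.max() + 1, cols.max() + 1))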
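A toy sketch of the `(data, (row, col))` constructor used above, with hypothetical triples. Each value lands at `[row[k], col[k]]`, and when the same coordinate appears more than once (possible here if two rows share a user and artist name, e.g. after the MusicBrainz ID column is dropped) SciPy sums the duplicates. Sizing the shape from the maximum IDs is an assumption of this sketch.

```python
import numpy as np
import scipy.sparse as sparse

# Hypothetical (user_id, artist_id, plays) triples; the pair (0, 1) appears twice
rows = np.array([0, 0, 1, 0])
cols = np.array([1, 2, 0, 1])
plays = np.array([4, 1, 7, 6])

# Size the matrix from the largest IDs so every index fits
shape = (rows.max() + 1, cols.max() + 1)
mat = sparse.csr_matrix((plays, (rows, cols)), shape=shape)

print(mat.toarray())
# Duplicate coordinates are summed: mat[0, 1] == 4 + 6
```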
@greeshmasmenon

By having np.sort(data.user_id.unique()) and np.sort(data.artist_id.unique()), how are you ensuring that plays is mapped to the right user and artist?
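A sketch suggesting why the sort does not affect the mapping, using a hypothetical toy frame: the sorted `users` and `artists` lists only feed `len()` for the matrix shape, while `csr_matrix` places each play count at the `(user_id, artist_id)` pair taken from that same row. The shape is only right here because the toy IDs run from 0 to 2 without gaps.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sparse

# Hypothetical frame after encoding, with IDs deliberately out of order
data = pd.DataFrame({
    'user_id': [2, 0, 1, 0],
    'artist_id': [1, 0, 2, 2],
    'plays': [9, 4, 3, 8],
})

# The sorted lists are only used for the shape...
users = list(np.sort(data.user_id.unique()))
artists = list(np.sort(data.artist_id.unique()))

# ...while each value's position comes from its own row's IDs
mat = sparse.csr_matrix(
    (data.plays, (data.user_id, data.artist_id)),
    shape=(len(users), len(artists)),
)

# Check every (user_id, artist_id, plays) row against its matrix cell
all_match = all(
    mat[u, a] == p
    for u, a, p in data[['user_id', 'artist_id', 'plays']].itertuples(index=False)
)
print(all_match)
```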
