@katipogluMustafa
Last active November 15, 2020 22:10
Loads Netflix Prize Dataset Movies
import numpy as np
import pandas as pd


class NetflixDataset(Dataset):  # Dataset is the base class from the author's own code (not shown here)
    @staticmethod
    def load_movies(movies_path, movies_col_names=('item_id', 'year', 'title')):
        # The movies file has no header row, so column names are supplied explicitly.
        movies = pd.read_csv(movies_path, encoding='ISO-8859-1', header=None, names=movies_col_names).set_index('item_id')
        # Replace missing or infinite years with 0 so the column can be cast to int.
        movies['year'] = movies['year'].replace([np.inf, -np.inf, np.nan], 0).astype(int)
        movies = movies.reindex(columns=['title', 'year'])
        # From the Netflix Prize dataset, I only use the first part, which contains 4499 unique movies.
        # That is why the remaining movies are truncated; if you load all the Netflix data, remove this line.
        movies = movies[:4499]  # Keep only the first 4499 movies of the dataset
        return movies
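
As a rough usage sketch, the same cleaning steps can be tried on a small in-memory sample instead of the real movies file. The sample rows, the `load_movies_sample` helper name, and the `keep_first` parameter are assumptions made for illustration only; they are not part of the gist.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample in the same id,year,title layout; the second row has no year.
SAMPLE_CSV = "1,2003,Dinosaur Planet\n2,,Untitled Example\n3,1997,Character\n"


def load_movies_sample(source, col_names=("item_id", "year", "title"), keep_first=None):
    """Apply the same cleaning steps as load_movies to a path or file-like source."""
    movies = pd.read_csv(
        source, encoding="ISO-8859-1", header=None, names=list(col_names)
    ).set_index("item_id")
    # Missing or infinite years become 0 so the column can be cast to int.
    movies["year"] = movies["year"].replace([np.inf, -np.inf, np.nan], 0).astype(int)
    movies = movies.reindex(columns=["title", "year"])
    # Optional truncation, mirroring the 4499-movie cut in load_movies.
    if keep_first is not None:
        movies = movies.iloc[:keep_first]
    return movies


# Encode the sample to bytes so the ISO-8859-1 decoding path is exercised.
movies = load_movies_sample(io.BytesIO(SAMPLE_CSV.encode("ISO-8859-1")), keep_first=2)
print(movies)
```

Making the truncation an optional argument is one way to avoid editing the function when the full dataset is loaded; passing `keep_first=None` keeps every row.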