import pandas as pd
import numpy as np
import re
import nltk
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
nltk.download('stopwords')
nltk.download('punkt')
from nltk import word_tokenize
from nltk.corpus import stopwords
show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmmaker Kirsten Johnson stages his death in inventive and comical ways to help them both face the inevitable."
s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thabang Molaba, Dillon Windvogel, Natasha Thahane, Arno Greeff, Xolile Tshabalala, Getmore Sithole, Cindy Mahlangu, Ryle De Morny, Greteli Fincham, Sello Maake Ka-Ncube, Odwa Gwanya, Mekaila Mathys, Sandi Schultz, Duane Williams, Shamilla Miller, Patrick Mofokeng",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town teen sets out to prove whether a private-school swimming star is her sister who was abducted at birth."
s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas |
def clean_desc(s):
    """Lowercase a description and replace every non-letter character with a space."""
    s = str(s)
    s = s.lower()
    s = re.sub(r'[^a-zA-Z]', ' ', s)
    return s

# the preprocessing steps run on a copy of the main data (netflix_data_copy)
netflix_data_copy['clean_desc'] = netflix_data_copy['description'].apply(clean_desc)
# tokenizing the words for lemmatization and removing stopwords
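The tokenizing and stopword-removal step named in the comment above is cut off in this snippet. A minimal sketch of what such a step might look like, assuming plain whitespace tokenization and a tiny hand-written stopword set. `SAMPLE_STOPWORDS` and `remove_stopwords` are hypothetical names, not part of the original code, which imports NLTK's `word_tokenize` and `stopwords` for this purpose:

```python
import re

# Hypothetical stand-in for an NLTK stopword list; a tiny set is used
# so the sketch runs without downloading any corpora.
SAMPLE_STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "his", "her", "is"}


def clean_desc(s):
    """Lowercase the text and replace every non-letter character with a space."""
    return re.sub(r"[^a-zA-Z]", " ", str(s).lower())


def remove_stopwords(text, stop_words=SAMPLE_STOPWORDS):
    """Split on whitespace (an assumed tokenizer) and drop stopwords."""
    tokens = text.split()
    return " ".join(tok for tok in tokens if tok not in stop_words)


sample = "A Cape Town teen sets out to prove her sister is alive!"
result = remove_stopwords(clean_desc(sample))
print(result)
```

With a real stopword list and tokenizer the surviving tokens would differ; the shape of the step (clean, tokenize, filter, rejoin) is what the sketch is meant to show.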
# create a TfidfVectorizer that ignores terms appearing in fewer than 2 documents (min_df=2)
# and terms appearing in more than 70% of the documents (max_df=0.7)
tfidf = TfidfVectorizer(min_df=2, max_df=0.7)
# fit the vectorizer on the cleaned text and transform it into a TF-IDF matrix
X = tfidf.fit_transform(netflix_data_copy['clean_desc'])
# build a dataframe suitable for calculating the cosine similarity and save it
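To make the effect of `min_df=2` and `max_df=0.7` concrete, here is a small standard-library sketch of the document-frequency filter, assuming an integer `min_df` is an absolute document count and a float `max_df` is a proportion of the corpus. `filter_vocabulary` and the toy documents are hypothetical; this only mimics the vocabulary pruning and computes no TF-IDF weights:

```python
from collections import Counter


def filter_vocabulary(docs, min_df=2, max_df=0.7):
    """Return the terms kept under a min_df (count) / max_df (proportion) rule."""
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for doc in docs for term in set(doc.split()))
    max_count = max_df * n_docs
    return sorted(t for t, c in doc_freq.items() if min_df <= c <= max_count)


# Toy corpus, invented for illustration only.
docs = [
    "teen drama mystery",
    "teen comedy drama",
    "crime drama thriller",
    "teen crime documentary",
]
kept = filter_vocabulary(docs)
print(kept)
```

In this toy corpus "teen" and "drama" occur in 3 of 4 documents (75%, above the 70% cap), the one-off terms fall below `min_df`, and only "crime" survives.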
def recommend_table(list_of_movie_enjoyed, tfidf_data, movie_count=20):
    """
    Recommend movies based on the titles a user enjoyed.
    :param list_of_movie_enjoyed: list of movies the user enjoyed
    :param tfidf_data: dataframe of TF-IDF features, indexed by movie
    :param movie_count: number of movies to suggest
    :return: dataframe containing the suggested movies
    """
    # select the TF-IDF rows of the enjoyed movies
    movie_enjoyed_df = tfidf_data.reindex(list_of_movie_enjoyed)
    # average those rows into a single user-profile vector
    user_prof = movie_enjoyed_df.mean()
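The rest of `recommend_table` is not shown here. As a rough illustration of the general technique (average the feature vectors of the enjoyed titles into a profile, then rank the other titles by cosine similarity to it), here is a standard-library sketch. `rank_by_profile`, the helper functions, and the toy feature vectors are all hypothetical, and excluding already-enjoyed titles from the results is an assumption, not something the original confirms:

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_profile(enjoyed, features, count=20):
    """Rank titles not in `enjoyed` by similarity to the mean profile."""
    profile = mean_vector([features[title] for title in enjoyed])
    candidates = [title for title in features if title not in enjoyed]
    ranked = sorted(candidates, key=lambda t: cosine(profile, features[t]), reverse=True)
    return ranked[:count]


# Toy feature vectors, invented for illustration only.
toy_features = {
    "Title A": [1.0, 0.0, 0.0],
    "Title B": [0.8, 0.2, 0.0],
    "Title C": [0.9, 0.1, 0.0],
    "Title D": [0.0, 0.0, 1.0],
}
top = rank_by_profile(["Title A", "Title B"], toy_features, count=2)
print(top)
```

Here the profile is the average of "Title A" and "Title B", so "Title C" (pointing in the same direction) ranks ahead of "Title D" (orthogonal to the profile).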
"""
main streamlit app
"""
import pickle
import pandas as pd
import streamlit as st
from streamlit import session_state as session
from src.recommend.recommend import recommend_table
@st.cache(persist=True, show_spinner=False, suppress_st_warning=True)
def load_data():
    """
    load and cache data
    :return: tfidf data
    """
    tfidf_data = pd.read_csv("data/tfidf_data.csv", index_col=0)
    return tfidf_data
dataframe = None
st.title("""
Netflix Recommendation System
This is a Content-Based Recommender System made on implicit ratings :smile:.
""")
st.text("")
st.text("")
st.text("")
<html>
<body>
<div>
<p>Hello World!</p>
<div>
<p>Choose me!</p>
</div>
</div>
<div>
<p>Why are you not looking at me?</p>