Skip to content

Instantly share code, notes, and snippets.

@amankharwal
Created December 26, 2020 13:06
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save amankharwal/ca8827e5861d3206738c1365eb0b603d to your computer and use it in GitHub Desktop.
Save amankharwal/ca8827e5861d3206738c1365eb0b603d to your computer and use it in GitHub Desktop.
df_nan = pd.DataFrame(pd.isnull(df.Rating))
df_nan = df_nan[df_nan['Rating'] == True]
df_nan = df_nan.reset_index()
movie_np = []
movie_id = 1
for i,j in zip(df_nan['index'][1:],df_nan['index'][:-1]):
# numpy approach
temp = np.full((1,i-j-1), movie_id)
movie_np = np.append(movie_np, temp)
movie_id += 1
# Account for last record and corresponding length
# numpy approach
last_record = np.full((1,len(df) - df_nan.iloc[-1, 0] - 1),movie_id)
movie_np = np.append(movie_np, last_record)
print('Movie numpy: {}'.format(movie_np))
print('Length: {}'.format(len(movie_np)))
# remove those Movie ID rows
df = df[pd.notnull(df['Rating'])]
df['Movie_Id'] = movie_np.astype(int)
df['Cust_Id'] = df['Cust_Id'].astype(int)
print('-Dataset examples-')
print(df.iloc[::5000000, :])
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment