@oscar-defelice
Created May 8, 2020 13:09
import random


def get_posneg(df, anchor):
    """
    Given a user id anchor, return every [positive, negative] movie pair
    available for that user, i.e. the maximum number of
    [anchor, positive, negative] triplets that can be built.

    Pairs are randomly shuffled to better feed the training network.

    Parameters
    ----------
    df : pandas.DataFrame
        Dataframe containing ratings, with user ids as rows and movie ids as columns.
    anchor : int
        User id serving as the anchor.

    Returns
    -------
    pn_list : list of [positive, negative] pairs
        positive : movie id rated at least 4.0 by the anchor
        negative : movie id rated at most 3.0 by the anchor
    """
    POS_THR = 4.0
    NEG_THR = 3.0

    ratings = df.loc[anchor]
    # Any NaN entries (e.g. unrated movies) fail both comparisons,
    # so they end up in neither set.
    pos_ids = ratings[ratings >= POS_THR].index.values
    neg_ids = ratings[ratings <= NEG_THR].index.values

    # Cartesian product of positives and negatives.
    pn_list = [[pos, neg] for pos in pos_ids for neg in neg_ids]
    random.shuffle(pn_list)
    return pn_list
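
The docstring speaks of [anchor, positive, negative] triplets, while the function itself returns only [positive, negative] pairs. One way a caller might attach the anchor is sketched below. The helper name `to_triplets` and the sample ids are hypothetical and not part of the original gist.

```python
def to_triplets(anchor, pn_list):
    """Prefix each [positive, negative] pair with the anchor user id."""
    return [[anchor, pos, neg] for pos, neg in pn_list]


# Hand-written pairs standing in for the output of get_posneg(df, 7).
example_pairs = [[10, 30], [20, 30]]
triplets = to_triplets(7, example_pairs)
print(triplets)  # [[7, 10, 30], [7, 20, 30]]
```

Keeping this step outside `get_posneg` leaves the original return value unchanged, so existing callers that expect pairs are unaffected.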