@BalazsHoranyi
Last active May 31, 2018 15:02
import dask.array as da
import numpy as np

print('users')
users = da.from_npy_stack('users', mmap_mode=None).compute().astype(np.int32)
print('items')
items = da.from_npy_stack('items', mmap_mode=None).compute().astype(np.int32)
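print('getting unique')
# unique_items is sorted; item_inverse maps each interaction to its index in unique_items.
unique_items, item_inverse, item_count = np.unique(items, return_counts=True, return_inverse=True)
print('creating mask')
# Keep only interactions whose item appears more than 50 times.
keep_item = item_count > 50
good_items = unique_items[keep_item]
# Look up each interaction's item via item_inverse (assumes items is 1-D).
mask = keep_item[item_inverse]
users = users[mask]
items = items[mask]
item_count = item_count[keep_item]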
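# Normalize users and items to contiguous ids starting at 0.
# np.unique returns sorted unique values, and the inverse gives each entry's
# index into them, so the new item ids line up with item_count.
unique_users, users = np.unique(users, return_inverse=True)
unique_kept_items, items = np.unique(items, return_inverse=True)
# Maps from original id to normalized id.
user_id_map_norm = {v: i for i, v in enumerate(unique_users)}
item_id_map_norm = {v: i for i, v in enumerate(unique_kept_items)}
users = users.astype(np.int32)
items = items.astype(np.int32)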
print(f'we now have {len(items)} interactions')