@AayushSameerShah
Created May 17, 2021 15:44
Use this when each row holds several overlapping categories (for example, a comma-separated Genre column) and you still need to group rows by each single category. The process below does just that and is simple.
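# Assumes `df` is a pandas DataFrame with a comma-separated 'Genre' column
# Create a unique set of genres, so it becomes clear which categories exist
genres = set()
for gen in df.Genre:
    for single_gen in map(str.strip, gen.split(",")):
        genres.add(single_gen)

# Create a dict to store the row ids that belong to each category
genre_ids = dict()
for gen in genres:
    genre_ids[gen] = []

# Then iterate over the 'mixed' category column and save each row id there
for idx, row in df.iterrows():
    # Split again so the match is exact; a plain substring test would
    # also file "Romantic Comedy" under "Comedy"
    row_genres = set(map(str.strip, row['Genre'].split(",")))
    for gen in row_genres:
        genre_ids[gen].append(idx)

# -- That's all! Now on top of that we can build more --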
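
As a rough illustration of what can be built on top of the mapping, here is a minimal sketch of the same idea packed into one helper. The function name `build_genre_ids` and the sample rows are hypothetical, not part of the original gist; the helper takes plain `(row_id, genre_string)` pairs so it can run without a DataFrame at hand.

```python
def build_genre_ids(pairs):
    """Map each individual genre to the list of row ids that carry it.

    `pairs` is any iterable of (row_id, genre_string) tuples, where
    genre_string holds comma-separated genres such as "Action, Comedy".
    """
    genre_ids = {}
    for row_id, genre_string in pairs:
        # Split and strip so that each genre is matched exactly
        for gen in map(str.strip, genre_string.split(",")):
            genre_ids.setdefault(gen, []).append(row_id)
    return genre_ids


# Hypothetical sample rows standing in for the (index, Genre) pairs of a DataFrame
sample_rows = [
    (0, "Action, Comedy"),
    (1, "Comedy"),
    (2, "Drama, Action"),
    (3, "Sci-Fi, Drama"),
]

genre_ids = build_genre_ids(sample_rows)

# Building on top of the mapping: how many rows fall into each genre
genre_counts = {gen: len(ids) for gen, ids in genre_ids.items()}
```

With a real DataFrame, passing `df.Genre.items()` as `pairs` should give the same kind of mapping, and `df.loc[genre_ids["Action"]]` would then select every row tagged with that genre.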
@AayushSameerShah (Author) commented:

There is one more version of the same approach (from the Pandas Notebook); check it out in this gist:
https://gist.github.com/AayushSameerShah/23b145fe9e3f56acc28fbdaf51186902
