Created May 17, 2021 15:44
This is the process for when each row holds several overlapping categories (for example, a comma-separated list of genres) and you still need to categorize the rows by a single category. The snippet below does just that, and it is simple.
# Assumes `df` is a pandas DataFrame with a comma-separated 'Genre' column
# Create a unique set of genres, so it becomes clear which categories exist
genres = set()
for gen in df.Genre:
    for single_gen in map(str.strip, gen.split(",")):
        genres.add(single_gen)

# Create a dict to store the row ids of each category
genre_ids = dict()
for gen in genres:
    genre_ids[gen] = []

# Then iterate over the 'mixed' category column and save each row id there
for idx, movie in df.iterrows():
    # Compare against exact genre names; a plain substring test would
    # also match e.g. "Music" inside "Musical"
    movie_genres = set(map(str.strip, movie['Genre'].split(",")))
    for gen in genre_ids.keys():
        if gen in movie_genres:
            genre_ids[gen].append(idx)

# -- That's all! Now on top of that we can build more --
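As a rough alternative, the same genre-to-row-ids mapping can probably be built with pandas string methods instead of nested loops. This is only a sketch under assumptions: the sample DataFrame and its `Title` column are made up for illustration, and only the `Genre` column name comes from the snippet above.

```python
import pandas as pd

# Hypothetical sample data; only the 'Genre' column name is from the gist
df = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Genre": ["Action, Drama", "Drama", "Comedy, Action"],
})

# One entry per (row id, genre) pair; explode() repeats the original index
exploded = df["Genre"].str.split(",").explode().str.strip()

# Collect the row ids that belong to each genre
genre_ids = {}
for idx, gen in exploded.items():
    genre_ids.setdefault(gen, []).append(idx)

# Example of building on top of it: select all rows tagged 'Action'
action_movies = df.loc[genre_ids["Action"]]
```

Because each genre string is split before matching, a genre name that happens to be a substring of another one is not counted by mistake.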
There is one more version of the same approach (from the Pandas Notebook); check it out in this gist:
https://gist.github.com/AayushSameerShah/23b145fe9e3f56acc28fbdaf51186902