@mjdietzx
Created January 28, 2018 23:21
code to help debug this AWS Rekognition issue: https://forums.aws.amazon.com/thread.jspa?threadID=271815&tstart=0
import collections
import json


def duplicates():
    #
    # organize raw Rekognition `boto3.client('rekognition').get_face_search()` response for debugging this issue
    #
    with open('duplicated_index_bug.json', 'r') as f:  # https://s3.us-east-2.amazonaws.com/brayniac-waya-ai/duplicated_index_bug.json
        # list of all `PersonMatch` objects returned by Rekognition
        person_match_objects = json.loads(f.read())

    # `indices` maps `Index` of `Person` tracked throughout video, to a set of their `PersonMatch` objects
    indices = collections.defaultdict(set)
    for person_match_object in person_match_objects:
        index = person_match_object['Person'].pop('Index')  # `Index` popped off b/c this is the only thing that differs
        if person_match_object['Person'].get('Face'):
            # serialize to a JSON string so the `PersonMatch` objects are hashable and can be stored in a set
            indices[index].add(json.dumps(person_match_object))

    # `merged` maps `Index` to the other indices it overlaps with
    merged = collections.defaultdict(set)
    #
    # just see how much overlap we actually have b/w indexes
    #
    for i, (k, v) in enumerate(indices.items()):  # `i`, `j` used to avoid iterating over same combo multiple times..
        for j, (_k, _v) in enumerate(indices.items()):
            # skip repeated pairs, and skip any `k` already recorded as a duplicate of an earlier index
            if i >= j or k in [idx for overlaps in merged.values() for idx in overlaps]:
                continue
            if len(v & _v):  # if there is any overlap at all b/w `Index` print this info and add to `merged`
                print(k, _k, len(v), len(_v), len(v & _v), len(v & _v) / len(v))
                merged[k].add(_k)
    print(merged)


if __name__ == '__main__':
    duplicates()
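
For context, here is a minimal sketch of how a list of `PersonMatch` objects like the one in `duplicated_index_bug.json` might be collected. It is an assumption about the workflow, not the code used to produce that file. The helper name `collect_person_matches` and its parameters are hypothetical; it only relies on the `Persons` and `NextToken` fields of the `get_face_search()` response, and the client is passed in so the function itself does not need to import `boto3`.

```python
def collect_person_matches(client, job_id, max_results=1000):
    """Return every `PersonMatch` of a finished face search job, following `NextToken` pages.

    `client` is assumed to behave like `boto3.client('rekognition')`;
    `job_id` is assumed to come from an earlier `start_face_search()` call.
    """
    persons = []
    kwargs = {'JobId': job_id, 'MaxResults': max_results, 'SortBy': 'INDEX'}
    while True:
        response = client.get_face_search(**kwargs)
        persons.extend(response.get('Persons', []))
        token = response.get('NextToken')
        if not token:  # no more pages
            return persons
        kwargs['NextToken'] = token
```

With a real client this would be called as `collect_person_matches(boto3.client('rekognition'), job_id)`, and the returned list could be written out with `json.dump` for offline debugging.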
mjdietzx commented Jan 28, 2018

Running this code (after downloading the JSON file it opens) outputs:

# index1, index2, index1_nb_frames, index2_nb_frames, duplicated frames, fraction of index1 frames which are duplicates
# `print(k, _k, len(v), len(_v), len(v & _v), len(v & _v) / len(v))`

21 22 1292 1216 1188 0.9195046439628483
21 23 1292 1204 1188 0.9195046439628483
21 240 1292 583 582 0.4504643962848297
28 29 509 423 422 0.8290766208251473
60 120 507 547 484 0.9546351084812623
140 142 416 547 415 0.9975961538461539
140 143 416 471 415 0.9975961538461539
140 144 416 487 415 0.9975961538461539
140 146 416 446 415 0.9975961538461539

# `merged`
defaultdict(<class 'set'>, {21: {240, 22, 23}, 28: {29}, 60: {120}, 140: {144, 146, 142, 143}})
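
The printed ratio is relative to the first index only. As a hedged sketch (not part of the original script), the same pairwise comparison can be written with `itertools.combinations`, reporting the shared fraction relative to both indices. The function name `pairwise_overlap` and the toy data are made up for illustration, and unlike the script it does not skip indices already recorded in `merged`.

```python
import itertools


def pairwise_overlap(indices):
    """Yield (a, b, shared, frac_of_a, frac_of_b) for every unordered pair of keys that share items."""
    for (a, set_a), (b, set_b) in itertools.combinations(indices.items(), 2):
        shared = len(set_a & set_b)
        if shared:  # both sets are non-empty whenever they share an item
            yield a, b, shared, shared / len(set_a), shared / len(set_b)


# toy stand-in for the `indices` mapping built from the JSON file
toy = {1: {'x', 'y', 'z', 'w'}, 2: {'x', 'y'}, 3: {'q'}}
for row in pairwise_overlap(toy):
    print(row)
```

In the toy data, index 2 is fully contained in index 1 while only half of index 1 is shared, which is the kind of asymmetry the one-sided ratio hides.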

And this describes each of those person `Index` values in the video we are analyzing with `boto3.client('rekognition').get_face_search()`:

# indexes and who they follow over the course of the video...

# 21 - aussie mac, and then frank villain (both mac and frank when in same scene)
# 22 - aussie mac, and then frank villain (both mac and frank when in same scene)
# 23 - aussie mac, and then frank villain (both mac and frank when in same scene)
# 240 - frank villain entire time (doesn't include mac at all)

# 28 - riggs dennis
# 29 - riggs dennis

# 60 - shortly all over the place w/ flashing scenes, charlie sunglasses, mac riggs
# 120 - shortly all over the place w/ flashing scenes, charlie sunglasses, mac riggs

# 140 - aussie dennis (a little long-distance aussie mac through window of car)
# 144 - aussie dennis (a little long-distance aussie mac through window of car)
# 146 - aussie dennis (a little long-distance aussie mac through window of car)
# 143 - aussie dennis (a little long-distance aussie mac through window of car)

This video shows the problem described here and tracks indices 21, 22, 23, and 240 throughout the original video analyzed by Rekognition. Interestingly, 240 tracks correctly (it follows only frank villain) and could possibly be used to work around the issue linked above.
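
The reported counts are consistent with that observation: 582 of the 583 records for index 240 also appear under index 21, while they make up less than half of index 21. A small sketch of that arithmetic follows, using only rows copied from the output above (the helper name `containment` is hypothetical):

```python
# (index1, index2, index1_nb_frames, index2_nb_frames, duplicated_frames), copied from the printed output
rows = [
    (21, 22, 1292, 1216, 1188),
    (21, 23, 1292, 1204, 1188),
    (21, 240, 1292, 583, 582),
]


def containment(rows):
    """Map each (index1, index2) pair to (fraction of index1 shared, fraction of index2 shared)."""
    return {(a, b): (dup / n_a, dup / n_b) for a, b, n_a, n_b, dup in rows}


for (a, b), (frac_a, frac_b) in containment(rows).items():
    print(a, b, round(frac_a, 3), round(frac_b, 3))
```

This only restates the numbers already printed. It does not by itself show which index Rekognition should have reported.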
