Gist for Move Mirror blog post: weighted distance
// poseVector1 and poseVector2 are 52-float vectors composed of:
// Values 0-33: x,y coordinates for 17 body parts in alphabetical order
// Values 34-50: confidence values for each of the 17 body parts in alphabetical order
// Value 51: the sum of all the confidence values
// Again, the lower the number, the closer the two poses
function weightedDistanceMatching(poseVector1, poseVector2) {
  let vector1PoseXY = poseVector1.slice(0, 34);
  let vector1Confidences = poseVector1.slice(34, 51);
  // Index directly: slice(51, 52) would return a one-element array, not a number
  let vector1ConfidenceSum = poseVector1[51];
  let vector2PoseXY = poseVector2.slice(0, 34);

  // First summation: normalize by the total confidence of pose 1
  let summation1 = 1 / vector1ConfidenceSum;

  // Second summation: confidence-weighted absolute coordinate differences
  let summation2 = 0;
  for (let i = 0; i < vector1PoseXY.length; i++) {
    // Each body part contributes two values (x, y) that share one confidence
    let tempConf = Math.floor(i / 2);
    let tempSum = vector1Confidences[tempConf] * Math.abs(vector1PoseXY[i] - vector2PoseXY[i]);
    summation2 = summation2 + tempSum;
  }

  return summation1 * summation2;
}
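The function above expects each pose already flattened into the 52-float layout described in its comments. As a rough sketch of how such a vector might be assembled, the helper below takes PoseNet-style keypoints (objects with `part`, `score`, and `position.x` / `position.y`). The name `buildPoseVector` is hypothetical and not part of the gist, and the sketch assumes the coordinates have already been scaled and normalized as the blog post describes.

```javascript
// Hypothetical helper, not from the original gist: flattens 17 keypoints
// into [34 x,y values, 17 confidences, 1 confidence sum] = 52 floats.
// Assumes coordinates are already normalized.
function buildPoseVector(keypoints) {
  if (keypoints.length !== 17) {
    throw new Error('Expected 17 keypoints, got ' + keypoints.length);
  }

  // Sort a copy by part name so the order is alphabetical and stable
  const sorted = [...keypoints].sort((a, b) =>
    a.part < b.part ? -1 : a.part > b.part ? 1 : 0
  );

  const xy = [];
  const confidences = [];
  for (const keypoint of sorted) {
    xy.push(keypoint.position.x, keypoint.position.y);
    confidences.push(keypoint.score);
  }

  // Value 51: the sum of all 17 confidence values
  const confidenceSum = confidences.reduce((sum, c) => sum + c, 0);

  return [...xy, ...confidences, confidenceSum];
}
```

Both vectors passed to `weightedDistanceMatching` would need to be built with the same part ordering for the per-index comparison to be meaningful.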
I am following along with https://medium.com/tensorflow/move-mirror-an-ai-experiment-with-pose-estimation-in-the-browser-using-tensorflow-js-2f7b769f9b23 . I've not done any computer vision projects since university, so I'm going to ask a bunch of questions and see where it gets me.
I noticed that you're not using the confidence scores from poseVector2:
a) Does it matter that the distance is not symmetrical (weightedDistanceMatching(p1, p2) != weightedDistanceMatching(p2, p1))?
b) Are you legitimately cheating by assuming that confidence == 1 here, because your corpus of images is mostly clean, or is there something else going on?
c) If I have a corpus of images where confidence << 1 for a lot of images, should I try to formulate a weighted distance function that incorporates the confidence of both poseVectors?
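To make (c) concrete, one possible shape for such a function is sketched below. It is only a guess at a reasonable formulation, not anything from the blog post: each body part is weighted by the product of the two poses' confidences, and the result is normalized by the sum of those products. The name `symmetricWeightedDistance` and the choice of the product as the weight are assumptions.

```javascript
// Sketch of a symmetric variant (an assumption, not the blog post's method).
// Uses the same 52-float layout: 0-33 x,y pairs, 34-50 confidences, 51 sum.
function symmetricWeightedDistance(poseVector1, poseVector2) {
  const xy1 = poseVector1.slice(0, 34);
  const xy2 = poseVector2.slice(0, 34);
  const conf1 = poseVector1.slice(34, 51);
  const conf2 = poseVector2.slice(34, 51);

  let weightSum = 0;
  let weightedDiffSum = 0;
  for (let part = 0; part < 17; part++) {
    // A part only counts when both poses are confident about it
    const weight = conf1[part] * conf2[part];
    const dx = Math.abs(xy1[2 * part] - xy2[2 * part]);
    const dy = Math.abs(xy1[2 * part + 1] - xy2[2 * part + 1]);
    weightSum += weight;
    weightedDiffSum += weight * (dx + dy);
  }

  // No part is trusted in both poses, so there is no usable comparison
  if (weightSum === 0) {
    return Infinity;
  }
  return weightedDiffSum / weightSum;
}
```

Because the weight and the absolute differences are unchanged when the arguments are swapped, this version returns the same value for (p1, p2) and (p2, p1). Whether it ranks matches better than the one-sided version is something that would have to be checked against real data.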
Also, do you have the source code for the server part so that I can see what it's doing in practice? It looks like the search part all happens on a server, so I can't just drop into a debugger and read it.