
Project Saving Snippet

Background

When we were in the final stages of the virtual piano project, one of the last steps was to map the user's finger points from the color frame to the depth frame. This would allow us to determine whether or not a key was actually being pressed (which is pretty important if you want a piano to play notes), rather than just hovered over. Let's say your hand looked like this, where the green circles denote the centres of your fingertips:

Skeletonized hand

To determine if a particular key was being pressed, we needed to calculate the difference between the current depth at that finger point and the original depth at that point when there was nothing in the frame, which we had stored in a 2D numpy array. If the difference was lower than a specified threshold, you're playing music.
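
As a rough sketch of that check (the names, indexing convention, and threshold value here are just placeholders, not the code we actually ran):

# Hypothetical threshold: how far the current depth can sit from the
# empty-frame depth before we stop treating the key as pressed
PRESS_THRESHOLD = 15

def isKeyPressed(baselineDepth, currentDepth, fingerPoint):
    # baselineDepth, currentDepth: 2D numpy arrays of depth values (rows x cols)
    # fingerPoint: (x, y) pixel location of a fingertip in the depth frame
    x, y = fingerPoint
    # A fingertip resting on a key sits close to the original surface depth,
    # so a small difference means the key is being pressed
    difference = abs(int(currentDepth[y, x]) - int(baselineDepth[y, x]))
    return difference < PRESS_THRESHOLD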

Problem

The Kinect v2 has a 1080p color camera (1920 x 1080) and a time-of-flight depth sensor that produces a stream of depth data (512 x 424). To translate a point between these two frames, you would take the pixel location of the point on one frame and scale it proportionally onto the other. This is what we were originally doing, but the translated points were off by a constant amount on the horizontal axis and we couldn't figure out why. We retraced our steps, thought about where we could be going wrong, and then took a hard second glance at the physical device:

Kinect

Normally, the method we were using would work if the two cameras were in the exact same position. That was not our case. As you can see in the picture, the depth sensor and the RGB camera are in physically different locations, meaning that the data they produce will be ever so slightly offset on the horizontal axis. If we lined up a penny so that it sat on the rightmost edge of the color frame, it would appear more centred (further to the left) in the depth frame. This was the concept we used to figure out how to properly map points between the two frames: we knew that all of the depth frame data would be contained within the boundaries of the color frame, we just needed to determine what those bounds were.
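
For reference, the proportional scaling we had started with looks roughly like the sketch below (a minimal illustration, not our original code); it is exactly that shared-position assumption that produced the constant horizontal error:

# Kinect v2 frame sizes
COLOR_WIDTH, COLOR_HEIGHT = 1920, 1080
DEPTH_WIDTH, DEPTH_HEIGHT = 512, 424

def naiveColorToDepth(colorX, colorY):
    # Scale a color-frame pixel straight onto the depth frame, implicitly
    # assuming both sensors see the scene from the same position
    depthX = colorX * DEPTH_WIDTH / float(COLOR_WIDTH)
    depthY = colorY * DEPTH_HEIGHT / float(COLOR_HEIGHT)
    return int(depthX), int(depthY)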

Solution

To provide some more context, we were cropping the color frames we retrieved from the Kinect to reduce our processing times. We had an initialization component that detected the largest 'blob' within the camera's field of vision, found the bounds of said blob, and used those bounds to slice the newly retrieved frames. So, a frame that originally looked like this:

Unbounded Frame

Would turn into this:

Bounded Frame
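
That initialization step isn't part of the snippet at the end of this gist, but a blob-based crop like the one described can be done with OpenCV along these lines (a rough sketch rather than the code we actually ran; the contour-based detection and the bounds format are assumptions here):

import cv2

def findLargestBlobBounds(colorFrame):
    # Threshold the frame (assuming a BGR color frame) and extract contours
    gray = cv2.cvtColor(colorFrame, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # [-2] keeps this working across the OpenCV 3/4 findContours return signatures
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]

    # Treat the largest contour as the 'blob' and take its bounding box
    largest = max(contours, key=cv2.contourArea)
    x, y, w, h = cv2.boundingRect(largest)
    return x, y, x + w, y + h

# Every newly retrieved color frame is then sliced to those bounds:
# nearX, nearY, farX, farY = findLargestBlobBounds(firstColorFrame)
# boundedFrame = colorFrame[nearY:farY, nearX:farX]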

To calculate the finger point's depth coordinates, we manually calibrated the depth frame's offset within the color frame. Using those values, along with the rest of the information available in the frame data, we came up with an equation to translate points between the bounded color frame and the depth frame.

Frame Diagram

The depth x-coordinate can be calculated as:

d_x = (f_x + d) * w_x / (c_x - x_m1 - x_m2)

where,

  • f_x is the x-coordinate of the finger point within the bounded color frame
  • d is the distance between the leftmost edge of the depth frame and the leftmost edge of the bounded color frame
  • w_x is the width of the depth frame
  • c_x is the width of the unbounded color frame
  • x_m1 is the distance between the leftmost edge of the depth frame and the leftmost edge of the unbounded color frame
  • x_m2 is the distance between the rightmost edge of the depth frame and the rightmost edge of the unbounded color frame

This equation accounts for the offset when scaling the finger point between frames, eliminating the horizontal offset that exists between the color camera and the depth sensor.
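
As a quick sanity check with the calibration values used in the code below: c_x = 1920, x_m1 = 278, and the depth frame's right edge sits at color pixel 1795, so x_m2 = 1920 - 1795 = 125. The denominator then works out to 1920 - 278 - 125 = 1517, meaning a shifted finger x-coordinate is scaled by 512 / 1517 (roughly a third) to land inside the 512-pixel-wide depth frame:

d_x = (f_x + d) * 512 / (1920 - 278 - 125) = (f_x + d) * 512 / 1517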

Code

def convertColorFingerPoint(self, fingerPoint, depthFrame):

    # Bounded color frame offsets within the unbounded color frame
    bNearXOffset, bNearYOffset, bFarXOffset, bFarYOffset = self.kinect.keyBounds

    # Shape of the depth frame (424 x 512)
    dY, dX = depthFrame.shape

    # Shape of the unbounded color frame (1080 x 1920)
    cY, cX, _ = self.kinect.originalColorFrame.shape

    # Manually calibrated x-coordinates of the depth frame's left (xM1) and
    # right (xM2) edges, measured within the unbounded color frame
    xM1 = 278
    xM2 = 1795

    # Scaling factor for the y-coordinate transformation
    yScalingFactor = float(dY) / cY

    if fingerPoint is None:
        return None

    fX = fingerPoint[0]
    fY = fingerPoint[1]

    # Shift the finger point from bounded color coordinates so it is measured
    # from the depth frame's left edge, then scale it onto the depth frame
    fX = fX + (bNearXOffset - xM1)
    depthPointX = (float(fX) * dX) / (cX - xM1 - (cX - xM2))
    depthPointY = (fY + bNearYOffset) * yScalingFactor

    return int(depthPointX), int(depthPointY)
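
A hypothetical call site, just to show how the pieces fit together (the surrounding names are illustrative; with the None guard above, the caller checks for a missing result):

# Inside the same class, with a fingertip detected in the bounded color frame
fingerPoint = (412, 237)  # example (x, y) pixel location
depthPoint = self.convertColorFingerPoint(fingerPoint, depthFrame)

if depthPoint is not None:
    depthX, depthY = depthPoint
    # Compare against the empty-frame baseline depth at that point
    difference = int(baselineDepthFrame[depthY, depthX]) - int(depthFrame[depthY, depthX])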