@TheJLifeX
Last active April 18, 2024 21:53
Simple Hand Gesture Recognition Code - Hand tracking - Mediapipe

The goal of this gist is to recognize the gestures ONE, TWO, THREE, FOUR, FIVE, SIX, YEAH, ROCK, SPIDERMAN and OK. We use the LANDMARKS output of the LandmarkLetterboxRemovalCalculator. This output is a landmark list that contains 21 landmarks. In the 02-landmarks.jpg picture below you can see the index of each landmark. Each landmark has x, y and z values, but the x and y values are sufficient for our goal. If you don't want to copy and paste the code from this gist, you can clone my forked version of mediapipe here: https://github.com/TheJLifeX/mediapipe. I have already committed all the code to that repository.

We have five finger states.

  1. thumbIsOpen
  2. firstFingerIsOpen
  3. secondFingerIsOpen
  4. thirdFingerIsOpen
  5. fourthFingerIsOpen

For example: the thumb is open if the x values of landmark 3 and landmark 4 are both less than the x value of landmark 2; otherwise it is closed.

PS: the thumb open/closed check works only for the right hand, because we cannot yet determine whether you are showing your left or right hand. For more info see this issue: Can palm_detection distinguish between right and left hand?
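
The thumb rule, together with the analogous y-based test that the calculator code below applies to the other four fingers, can be sketched in Python. This is a minimal sketch and not the calculator itself: landmarks are assumed to be a list of 21 (x, y) tuples in normalized image coordinates (origin at the top-left, so a smaller y is higher in the image), and the function name `finger_states` is hypothetical.

```python
def finger_states(landmarks):
    """Return (thumb, first, second, third, fourth) open/closed flags.

    `landmarks` is assumed to be a list of 21 (x, y) tuples indexed as in
    the landmark picture. The thumb test only holds for a right hand.
    """
    if len(landmarks) != 21:
        raise ValueError("expected 21 landmarks")

    # Thumb: open if landmarks 3 and 4 lie left of landmark 2 (smaller x).
    ref_x = landmarks[2][0]
    thumb = landmarks[3][0] < ref_x and landmarks[4][0] < ref_x

    # Other fingers: open if the two outer joints lie above the reference
    # joint (smaller y). The reference joints are 6, 10, 14 and 18.
    others = []
    for ref in (6, 10, 14, 18):
        ref_y = landmarks[ref][1]
        others.append(
            landmarks[ref + 1][1] < ref_y and landmarks[ref + 2][1] < ref_y
        )

    return (thumb, *others)
```

Calling `finger_states` on a landmark list returns a 5-tuple of booleans in the same order as the finger states listed above.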

Prerequisite: You know how to run the hand tracking example.

  1. Get Started with mediapipe
  2. Hand Tracking on Desktop

If you want to know how to recognize some simple hand movements like scrolling, zoom in/out and slide left/right (see comment below), you can read this gist: Simple Hand Mouvement Recognition Code.

#include <cmath>
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/landmark.pb.h"
#include "mediapipe/framework/formats/rect.pb.h"
namespace mediapipe
{
namespace
{
constexpr char normRectTag[] = "NORM_RECT";
constexpr char normalizedLandmarkListTag[] = "NORM_LANDMARKS";
} // namespace
// Graph config:
//
// node {
// calculator: "HandGestureRecognitionCalculator"
// input_stream: "NORM_LANDMARKS:scaled_landmarks"
// input_stream: "NORM_RECT:hand_rect_for_next_frame"
// }
class HandGestureRecognitionCalculator : public CalculatorBase
{
public:
static ::mediapipe::Status GetContract(CalculatorContract *cc);
::mediapipe::Status Open(CalculatorContext *cc) override;
::mediapipe::Status Process(CalculatorContext *cc) override;
private:
float get_Euclidean_DistanceAB(float a_x, float a_y, float b_x, float b_y)
{
float dist = std::pow(a_x - b_x, 2) + std::pow(a_y - b_y, 2);
return std::sqrt(dist);
}
bool isThumbNearFirstFinger(NormalizedLandmark point1, NormalizedLandmark point2)
{
float distance = this->get_Euclidean_DistanceAB(point1.x(), point1.y(), point2.x(), point2.y());
return distance < 0.1;
}
};
REGISTER_CALCULATOR(HandGestureRecognitionCalculator);
::mediapipe::Status HandGestureRecognitionCalculator::GetContract(
CalculatorContract *cc)
{
RET_CHECK(cc->Inputs().HasTag(normalizedLandmarkListTag));
cc->Inputs().Tag(normalizedLandmarkListTag).Set<mediapipe::NormalizedLandmarkList>();
RET_CHECK(cc->Inputs().HasTag(normRectTag));
cc->Inputs().Tag(normRectTag).Set<NormalizedRect>();
return ::mediapipe::OkStatus();
}
::mediapipe::Status HandGestureRecognitionCalculator::Open(
CalculatorContext *cc)
{
cc->SetOffset(TimestampDiff(0));
return ::mediapipe::OkStatus();
}
::mediapipe::Status HandGestureRecognitionCalculator::Process(
CalculatorContext *cc)
{
// hand closed (red) rectangle
const auto rect = &(cc->Inputs().Tag(normRectTag).Get<NormalizedRect>());
float width = rect->width();
float height = rect->height();
if (width < 0.01 || height < 0.01)
{
LOG(INFO) << "No Hand Detected";
return ::mediapipe::OkStatus();
}
const auto &landmarkList = cc->Inputs()
.Tag(normalizedLandmarkListTag)
.Get<mediapipe::NormalizedLandmarkList>();
RET_CHECK_GT(landmarkList.landmark_size(), 0) << "Input landmark vector is empty.";
// finger states
bool thumbIsOpen = false;
bool firstFingerIsOpen = false;
bool secondFingerIsOpen = false;
bool thirdFingerIsOpen = false;
bool fourthFingerIsOpen = false;
//
float pseudoFixKeyPoint = landmarkList.landmark(2).x();
if (landmarkList.landmark(3).x() < pseudoFixKeyPoint && landmarkList.landmark(4).x() < pseudoFixKeyPoint)
{
thumbIsOpen = true;
}
pseudoFixKeyPoint = landmarkList.landmark(6).y();
if (landmarkList.landmark(7).y() < pseudoFixKeyPoint && landmarkList.landmark(8).y() < pseudoFixKeyPoint)
{
firstFingerIsOpen = true;
}
pseudoFixKeyPoint = landmarkList.landmark(10).y();
if (landmarkList.landmark(11).y() < pseudoFixKeyPoint && landmarkList.landmark(12).y() < pseudoFixKeyPoint)
{
secondFingerIsOpen = true;
}
pseudoFixKeyPoint = landmarkList.landmark(14).y();
if (landmarkList.landmark(15).y() < pseudoFixKeyPoint && landmarkList.landmark(16).y() < pseudoFixKeyPoint)
{
thirdFingerIsOpen = true;
}
pseudoFixKeyPoint = landmarkList.landmark(18).y();
if (landmarkList.landmark(19).y() < pseudoFixKeyPoint && landmarkList.landmark(20).y() < pseudoFixKeyPoint)
{
fourthFingerIsOpen = true;
}
// Hand gesture recognition
if (thumbIsOpen && firstFingerIsOpen && secondFingerIsOpen && thirdFingerIsOpen && fourthFingerIsOpen)
{
LOG(INFO) << "FIVE!";
}
else if (!thumbIsOpen && firstFingerIsOpen && secondFingerIsOpen && thirdFingerIsOpen && fourthFingerIsOpen)
{
LOG(INFO) << "FOUR!";
}
else if (thumbIsOpen && firstFingerIsOpen && secondFingerIsOpen && !thirdFingerIsOpen && !fourthFingerIsOpen)
{
LOG(INFO) << "THREE!";
}
else if (thumbIsOpen && firstFingerIsOpen && !secondFingerIsOpen && !thirdFingerIsOpen && !fourthFingerIsOpen)
{
LOG(INFO) << "TWO!";
}
else if (!thumbIsOpen && firstFingerIsOpen && !secondFingerIsOpen && !thirdFingerIsOpen && !fourthFingerIsOpen)
{
LOG(INFO) << "ONE!";
}
else if (!thumbIsOpen && firstFingerIsOpen && secondFingerIsOpen && !thirdFingerIsOpen && !fourthFingerIsOpen)
{
LOG(INFO) << "YEAH!";
}
else if (!thumbIsOpen && firstFingerIsOpen && !secondFingerIsOpen && !thirdFingerIsOpen && fourthFingerIsOpen)
{
LOG(INFO) << "ROCK!";
}
else if (thumbIsOpen && firstFingerIsOpen && !secondFingerIsOpen && !thirdFingerIsOpen && fourthFingerIsOpen)
{
LOG(INFO) << "SPIDERMAN!";
}
else if (!thumbIsOpen && !firstFingerIsOpen && !secondFingerIsOpen && !thirdFingerIsOpen && !fourthFingerIsOpen)
{
LOG(INFO) << "FIST!";
}
else if (!firstFingerIsOpen && secondFingerIsOpen && thirdFingerIsOpen && fourthFingerIsOpen && this->isThumbNearFirstFinger(landmarkList.landmark(4), landmarkList.landmark(8)))
{
LOG(INFO) << "OK!";
}
else
{
LOG(INFO) << "Finger States: " << thumbIsOpen << firstFingerIsOpen << secondFingerIsOpen << thirdFingerIsOpen << fourthFingerIsOpen;
LOG(INFO) << "___";
}
return ::mediapipe::OkStatus();
}
} // namespace mediapipe
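
The if/else chain in Process can also be read as a lookup from the five finger states to a gesture name. The following Python sketch restates that mapping as a dictionary. The names `GESTURES` and `recognize` are hypothetical, the OK branch takes the thumb-to-first-finger distance as a precomputed argument, and SIX is left out because the calculator above does not define a branch for it.

```python
# Keys are (thumb, first, second, third, fourth) open flags, listed in the
# same order as the branches of the C++ calculator.
GESTURES = {
    (True, True, True, True, True): "FIVE",
    (False, True, True, True, True): "FOUR",
    (True, True, True, False, False): "THREE",
    (True, True, False, False, False): "TWO",
    (False, True, False, False, False): "ONE",
    (False, True, True, False, False): "YEAH",
    (False, True, False, False, True): "ROCK",
    (True, True, False, False, True): "SPIDERMAN",
    (False, False, False, False, False): "FIST",
}


def recognize(states, thumb_first_distance=1.0, ok_threshold=0.1):
    """Map five finger states to a gesture name, or None if unknown."""
    states = tuple(bool(s) for s in states)
    if states in GESTURES:
        return GESTURES[states]
    # OK: first finger closed, the other three open, thumb tip close to the
    # first fingertip. The thumb state itself is ignored, as in the C++ code.
    _, first, second, third, fourth = states
    if (not first and second and third and fourth
            and thumb_first_distance < ok_threshold):
        return "OK"
    return None
```

A table like this keeps each gesture definition on one line, which makes it easier to spot overlapping or missing combinations than a long chain of conditions.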

We have to add the HandGestureRecognitionCalculator node config to the hand_landmark_cpu.pbtxt or hand_landmark_gpu.pbtxt graph file.

  node {
      calculator: "HandGestureRecognitionCalculator"
      input_stream: "NORM_LANDMARKS:scaled_landmarks"
      input_stream: "NORM_RECT:hand_rect_for_next_frame"
    }

For example:

  1. in the hand_landmark_cpu.pbtxt see here: https://github.com/TheJLifeX/mediapipe/blob/master/mediapipe/graphs/hand_tracking/subgraphs/hand_landmark_cpu.pbtxt#L187-L191
  2. in the hand_landmark_gpu.pbtxt see here: https://github.com/TheJLifeX/mediapipe/blob/master/mediapipe/graphs/hand_tracking/subgraphs/hand_landmark_gpu.pbtxt#L182-L186

We have to create a bazel build config for our Calculator.

cc_library(
name = "hand-gesture-recognition-calculator",
srcs = ["hand-gesture-recognition-calculator.cc"],
visibility = ["//visibility:public"],
deps = [
"//mediapipe/framework:calculator_framework",
"//mediapipe/framework/formats:landmark_cc_proto",
"//mediapipe/framework/port:status",
"//mediapipe/framework/formats:rect_cc_proto",
"//mediapipe/framework/port:ret_check",
],
alwayslink = 1,
)

We have to add the path to the "hand-gesture-recognition-calculator" bazel build config to the deps of the hand_landmark_cpu or hand_landmark_gpu bazel build config.

For example: "//hand-gesture-recognition:hand-gesture-recognition-calculator"

  1. in the hand_landmark_cpu see here: https://github.com/TheJLifeX/mediapipe/blob/a069e5b6e1097f3f69c161a11f336e9e3b9751dd/mediapipe/graphs/hand_tracking/subgraphs/BUILD#L88
  2. in the hand_landmark_gpu see here: https://github.com/TheJLifeX/mediapipe/blob/a069e5b6e1097f3f69c161a11f336e9e3b9751dd/mediapipe/graphs/hand_tracking/subgraphs/BUILD#L192

You can now build the project and run it.

@FabricioGuimaraesOliveira

Hello. How can I integrate MediaPipe pose and multi-hand tracking in real time? Is it possible?

@johntranz

@TheJLifeX,
Can you instruct me to train the sign language recognition model ?
Or steps taken to let the model recognize a new gesture.
I would be very grateful for that

@tonywang531

Since mediapipe can now detect left or right hand, how to implement this feature? I got this code working in IOS, if anyone is interested I will upload it in the mediapipe repository.

@happyCodingSusan

@TheJLifeX,
Thank you for sharing your ideas and codes.
I found your simple gestures work well when I use my right hand but do not work when I use my left hand. With my left hand there are many mistakes. For example, when my left hand shows 5 fingers, your detection result is 4. Have you encountered this issue before? Any ideas why this happens?
Many thanks.

@tonywang531

tonywang531 commented Nov 4, 2020

@happyCodingSusan,
The detection algorithm in the example is very basic. It does not take left or right hand into account. To properly recognize both hands, the left or right hand information needs to be retrieved from the mediapipe landmark data. So the logic would be something like: if the detected hand is left, flip the positions of the fingers, and then the detection algorithm will work. I am still trying to work out how to access this information.
In the worst case, we could identify the left or right hand using landmark data alone. Take, for example, landmark positions 4 and 10: if 4 is to the left of 10, then it is like the palm in the example picture, and vice versa. By checking the positions of the 5 fingers we should be able to detect whether it is a left hand or a right hand. It's just that I prefer not to reinvent the wheel when Mediapipe can already detect left or right hand in the current version.
I also gave a bit of thought to the front versus the back of the hand. It seems impossible to tell the front of the hand from the back using landmark data alone. What I mean is that a left hand with the palm lines facing your face would be detected the same as a right hand facing your face, sort of like when you clap your hands. I don't think you can distinguish this using mediapipe. Therefore you would need to specify which side of the hand is shown in your application.
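
The flip described in this comment can be sketched as follows. It assumes the handedness label is already available as a plain string (how to obtain it from MediaPipe is not shown here), that landmarks are (x, y) tuples in normalized [0, 1] coordinates, and that mirroring x about the vertical centre line is enough to make a left hand look like a right hand to the thumb test. `mirror_if_left` and `thumb_is_open` are hypothetical names.

```python
def mirror_if_left(landmarks, handedness):
    """Mirror x coordinates of a left hand so right-hand rules apply.

    `handedness` is assumed to be the string "Left" or "Right".
    """
    if handedness == "Left":
        return [(1.0 - x, y) for x, y in landmarks]
    return list(landmarks)


def thumb_is_open(landmarks):
    """Right-hand thumb rule from the gist: joints 3 and 4 left of joint 2."""
    ref_x = landmarks[2][0]
    return landmarks[3][0] < ref_x and landmarks[4][0] < ref_x
```

Only the x-based thumb test depends on handedness; the y-based tests for the other four fingers are unaffected by the mirroring.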

@adarshx11

@TheJLifeX Help me,
Can you Do it with Python .py

@adarshx11

@TheJLifeX

Thank you very much. you save my time a lot.

I have tried the same and working fine but finger landmark detection is NOT good.

image

Here I am trying to detect whether the index finger is touching the thumb or not, but since the index finger landmark point is not always correct, my prediction goes wrong.

could you please advise in this part?
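
One possible mitigation, which is an assumption of this sketch and not part of the gist, is to compare the thumb-to-index distance against the size of the hand instead of a fixed 0.1, so the test does not change as the hand moves toward or away from the camera. It does not correct a wrongly placed landmark. The function names and the 0.3 ratio are hypothetical and would need tuning.

```python
import math


def distance(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def is_pinching(landmarks, ratio=0.3):
    """True if thumb tip (4) and index tip (8) are close relative to the
    palm length, measured from the wrist (0) to the middle-finger base (9)."""
    palm = distance(landmarks[0], landmarks[9])
    if palm == 0.0:
        return False  # degenerate input, no hand size to compare against
    return distance(landmarks[4], landmarks[8]) / palm < ratio
```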

Saddam share Code Please

@denzero13

Hi @TheJLifeX, I have a problem. I want to use the MediaPipe Hands library for Python. I display a list of positions, but I don't understand which position corresponds to the point I need, and some positions are repeated. I used your code from above, but it doesn't work, so I created a function that generates the required list. I output the frozenset mp_hands.HAND_CONNECTIONS and get the result below.
How do I decipher it correctly to understand which position belongs to which point?
Sorry for my bad English, I use Google Translate)))

frozenset({(<HandLandmark.THUMB_IP: 3>, <HandLandmark.THUMB_TIP: 4>), (<HandLandmark.WRIST: 0>, <HandLandmark.INDEX_FINGER_MCP: 5>), (<HandLandmark.PINKY_MCP: 17>, <HandLandmark.PINKY_PIP: 18>), (<HandLandmark.WRIST: 0>, <HandLandmark.PINKY_MCP: 17>), (<HandLandmark.RING_FINGER_MCP: 13>, <HandLandmark.RING_FINGER_PIP: 14>), (<HandLandmark.RING_FINGER_MCP: 13>, <HandLandmark.PINKY_MCP: 17>), (<HandLandmark.PINKY_PIP: 18>, <HandLandmark.PINKY_DIP: 19>), (<HandLandmark.INDEX_FINGER_MCP: 5>, <HandLandmark.INDEX_FINGER_PIP: 6>), (<HandLandmark.INDEX_FINGER_MCP: 5>, <HandLandmark.MIDDLE_FINGER_MCP: 9>), (<HandLandmark.RING_FINGER_PIP: 14>, <HandLandmark.RING_FINGER_DIP: 15>), (<HandLandmark.WRIST: 0>, <HandLandmark.THUMB_CMC: 1>), (<HandLandmark.MIDDLE_FINGER_MCP: 9>, <HandLandmark.MIDDLE_FINGER_PIP: 10>), (<HandLandmark.THUMB_CMC: 1>, <HandLandmark.THUMB_MCP: 2>), (<HandLandmark.MIDDLE_FINGER_PIP: 10>, <HandLandmark.MIDDLE_FINGER_DIP: 11>), (<HandLandmark.MIDDLE_FINGER_MCP: 9>, <HandLandmark.RING_FINGER_MCP: 13>), (<HandLandmark.PINKY_DIP: 19>, <HandLandmark.PINKY_TIP: 20>), (<HandLandmark.INDEX_FINGER_PIP: 6>, <HandLandmark.INDEX_FINGER_DIP: 7>), (<HandLandmark.RING_FINGER_DIP: 15>, <HandLandmark.RING_FINGER_TIP: 16>), (<HandLandmark.THUMB_MCP: 2>, <HandLandmark.THUMB_IP: 3>), (<HandLandmark.MIDDLE_FINGER_DIP: 11>, <HandLandmark.MIDDLE_FINGER_TIP: 12>), (<HandLandmark.INDEX_FINGER_DIP: 7>, <HandLandmark.INDEX_FINGER_TIP: 8>)})
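
The enum members in that output already carry the answer: each pair is a connection between two landmarks, and the number after the colon is the landmark's index in the landmark list. Landmarks repeat because each one takes part in more than one connection. Collected from the pairs printed above, the mapping can be written out as a plain table. This sketch does not import MediaPipe, and `LANDMARK_NAMES` and `describe` are hypothetical names.

```python
# Index -> name, read off the HAND_CONNECTIONS output above.
LANDMARK_NAMES = [
    "WRIST",
    "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_FINGER_MCP", "INDEX_FINGER_PIP",
    "INDEX_FINGER_DIP", "INDEX_FINGER_TIP",
    "MIDDLE_FINGER_MCP", "MIDDLE_FINGER_PIP",
    "MIDDLE_FINGER_DIP", "MIDDLE_FINGER_TIP",
    "RING_FINGER_MCP", "RING_FINGER_PIP",
    "RING_FINGER_DIP", "RING_FINGER_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
]


def describe(index):
    """Return the landmark name for a list index from 0 to 20."""
    if not 0 <= index < len(LANDMARK_NAMES):
        raise IndexError("landmark index must be between 0 and 20")
    return LANDMARK_NAMES[index]
```

With this table, the indices used in the gist read as follows: 2, 3 and 4 are the thumb MCP, IP and TIP joints, and 6, 7 and 8 are the index finger PIP, DIP and TIP joints.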

@Jaguaribe21

Jaguaribe21 commented Dec 8, 2020

Hi,

Has anyone managed to implement in mediapipe 0.8.0?

I tried to modify the files, but when I run, it displays this error:

AnnotationOverlayCalculator :: GetContract failed to validate:
For input streams ValidatePacketTypeSet failed:
"INPUT_FRAME" tag index 0 was not expected.
For output streams ValidatePacketTypeSet failed:
"OUTPUT_FRAME" tag index 0 was not expected.

Any tips or solutions?

Note: I am using Bazel 3.4.1, it runs normally, without the codes to be deployed.

Thanks.

@tonywang531

Sorry for the delay. I have uploaded the IOS version handtracking to this location:
https://github.com/tonywang531/Temporary-code/tree/master/IOS
Disclaimer: Please just use it as reference material and work from there. I no longer work on Mediapipe because I used the Apple Vision framework directly. The main reason is that Mediapipe does not support Swift, which is the main language used on Apple platforms. Objective-C is hard to program and does not work well with Xcode. By this point I have probably forgotten what I wrote before and am unable to answer questions regarding this code.

@czming

czming commented Feb 7, 2021

Hi, I managed to get the code to build with the new calculator and subgraph but seem to be getting this error when trying to run the hand_tracking_cpu code after inserting the subgraph, do you know of any possible solutions? Thanks!

~/mediapipe$ GLOG_logtostderr=1 bazel-bin/mediapipe/examples/desktop/hand_tracking/hand_tracking_cpu --calculator_graph_config_file=mediapipe/graphs/hand_tracking/hand_tracking_desktop_live.pbtxt
I20210207 20:12:09.581851 163311 demo_run_graph_main.cc:47] Get calculator graph config contents: # MediaPipe graph that performs hands tracking on desktop with TensorFlow

... truncated (same as values in mediapipe/graphs/hand_tracking/hand_tracking_desktop_live.pbtxt on the Mediapipe's github repo)

node {
calculator: "HandLandmarkSubgraph"
input_stream: "IMAGE:input_video"
input_stream: "NORM_RECT:hand_rect"
output_stream: "LANDMARKS:hand_landmarks"
output_stream: "NORM_RECT:hand_rect_for_next_frame"
output_stream: "PRESENCE:hand_presence"
output_stream: "RECOGNIZED_HAND_GESTURE:recognized_hand_gesture"
}
I20210207 20:12:09.582515 163311 demo_run_graph_main.cc:53] Initialize the calculator graph.
E20210207 20:12:09.584162 163311 demo_run_graph_main.cc:153] Failed to run the graph: ValidatedGraphConfig Initialization failed.
HandGestureRecognitionCalculator::GetContract failed to validate:
For output streams ValidatePacketTypeSet failed:
Tag "RECOGNIZED_HAND_GESTURE" index 0 was not expected.

@adarshx11

how to print Normalized Keypoint using Python ?
image

@Rudrrr

Rudrrr commented Feb 27, 2021

Can this project run on single raspberry pi ?

@atharvakale31

Hey, I created a desktop application using some of your logic do check out.

https://github.com/atharvakale31/Custom_Gesture_Control

UI

eg: Control power point slide

@LexQzim

LexQzim commented Mar 23, 2021

Some people are asking for a Python implementation. This is my Python interpretation of @TheJLifeX's tutorial. Maybe it helps someone:

Thank you @TheJLifeX for your nice simple tutorial.

        import math
        import mediapipe as mp
        import cv2
        
        class SimpleGestureDetector:
            # region: Member variables
            # mediaPipe configuration hands object
            __mpHands = mp.solutions.hands
            # mediaPipe detector object
            __mpHandDetector = None
        
            def __init__(self):
                self.__setDefaultHandConfiguration()
        
            def __setDefaultHandConfiguration(self):
                self.__mpHandDetector = self.__mpHands.Hands(
                    # default = 2
                    max_num_hands=2,
                    # Minimum confidence value ([0.0, 1.0]) from the hand detection model for the detection to be considered successful (default = 0.5)
                    min_detection_confidence=0.5,
                    # Minimum confidence value ([0.0, 1.0]) from the landmark-tracking model for the hand landmarks to be considered tracked successfully (default = 0.5)
                    min_tracking_confidence=0.5
                )
        
        
            def __getEuclideanDistance(self, posA, posB):
                return math.sqrt((posA.x - posB.x)**2 + (posA.y - posB.y)**2)
        
            def __isThumbNearIndexFinger(self, thumbPos, indexPos):
                return self.__getEuclideanDistance(thumbPos, indexPos) < 0.1
        
        
            def detectHands(self, capture):
                if self.__mpHandDetector is None:
                    return
        
                image = capture.color
                # Input image must contain three channel rgb data.
                image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                # lock image for hand detection
                image.flags.writeable = False
                # start handDetector on current image
                detectorResults = self.__mpHandDetector.process(image)
                # unlock image
                image.flags.writeable = True
        
                if detectorResults.multi_hand_landmarks:
                    for handLandmarks in detectorResults.multi_hand_landmarks:
                        self.simpleGesture(handLandmarks.landmark)
        
            def simpleGesture(self, handLandmarks):
        
                thumbIsOpen = False
                indexIsOpen = False
                middelIsOpen = False
                ringIsOpen = False
                pinkyIsOpen = False
        
                pseudoFixKeyPoint = handLandmarks[2].x
                if handLandmarks[3].x < pseudoFixKeyPoint and handLandmarks[4].x < pseudoFixKeyPoint:
                    thumbIsOpen = True
        
                pseudoFixKeyPoint = handLandmarks[6].y
                if handLandmarks[7].y < pseudoFixKeyPoint and handLandmarks[8].y < pseudoFixKeyPoint:
                    indexIsOpen = True
        
                pseudoFixKeyPoint = handLandmarks[10].y
                if handLandmarks[11].y < pseudoFixKeyPoint and handLandmarks[12].y < pseudoFixKeyPoint:
                    middelIsOpen = True
        
                pseudoFixKeyPoint = handLandmarks[14].y
                if handLandmarks[15].y < pseudoFixKeyPoint and handLandmarks[16].y < pseudoFixKeyPoint:
                    ringIsOpen = True
        
                pseudoFixKeyPoint = handLandmarks[18].y
                if handLandmarks[19].y < pseudoFixKeyPoint and handLandmarks[20].y < pseudoFixKeyPoint:
                    pinkyIsOpen = True
        
                if thumbIsOpen and indexIsOpen and middelIsOpen and ringIsOpen and pinkyIsOpen:
                    print("FIVE!")
        
                elif not thumbIsOpen and indexIsOpen and middelIsOpen and ringIsOpen and pinkyIsOpen:
                    print("FOUR!")
        
                elif not thumbIsOpen and indexIsOpen and middelIsOpen and ringIsOpen and not pinkyIsOpen:
                    print("THREE!")
        
                elif not thumbIsOpen and indexIsOpen and middelIsOpen and not ringIsOpen and not pinkyIsOpen:
                    print("TWO!")
        
                elif not thumbIsOpen and indexIsOpen and not middelIsOpen and not ringIsOpen and not pinkyIsOpen:
                    print("ONE!")
        
                elif not thumbIsOpen and indexIsOpen and not middelIsOpen and not ringIsOpen and pinkyIsOpen:
                    print("ROCK!")
        
                elif thumbIsOpen and indexIsOpen and not middelIsOpen and not ringIsOpen and pinkyIsOpen:
                    print("SPIDERMAN!")
        
                elif not thumbIsOpen and not indexIsOpen and not middelIsOpen and not ringIsOpen and not pinkyIsOpen:
                    print("FIST!")
        
                elif not indexIsOpen and middelIsOpen and ringIsOpen and pinkyIsOpen and self.__isThumbNearIndexFinger(handLandmarks[4], handLandmarks[8]):
                    print("OK!")
        
                print("FingerState: thumbIsOpen? " + str(thumbIsOpen) + " - indexIsOpen? " + str(indexIsOpen) + " - middelIsOpen? " +
                      str(middelIsOpen) + " - ringIsOpen? " + str(ringIsOpen) + " - pinkyIsOpen? " + str(pinkyIsOpen))

@adarshx11

image

I can predict sign gestures from A to Z, and numerals too, but only statically: I feed an image file containing the gesture to the model (KNN). I want to predict in real time, for example through a webcam. How can I do it? Can anyone help? @TheJLifeX

@Jagadishrathod

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Traceback (most recent call last):
File "project.py", line 80, in
recognizedHandGesture = recognizeHandGesture(getStructuredLandmarks(results.multi_hand_landmarks))
File "project.py", line 67, in getStructuredLandmarks
structuredlandmarks.append({ 'x': hand_landmarks[j], 'y': hand_landmarks[j] })
IndexError: list index out of range

@yanzhoupan

for landmarks in detectorResults.multi_hand_landmarks:
        self.simpleGesture(landmarks)

gives me a type error
TypeError: 'NormalizedLandmarkList' object does not support indexing

It should be:

for landmarks in detectorResults.multi_hand_landmarks:
        self.simpleGesture(landmarks.landmark)

@LexQzim

LexQzim commented Apr 7, 2021

for landmarks in detectorResults.multi_hand_landmarks:
        self.simpleGesture(landmarks)

gives me a type error
TypeError: 'NormalizedLandmarkList' object does not support indexing

It should be:

for landmarks in detectorResults.multi_hand_landmarks:
        self.simpleGesture(landmarks.landmark)

Sry, my bad.
Fixed my comment.
Thank you

@titoasty

Nice job!
I've been able to easily detect gestures thanks to you ;)

Has any of you been able to detect hand orientation???
I'm desperately trying to compute it...
Thanks for your help!
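
A rough in-plane orientation can be estimated from two landmarks. This is only a sketch under assumptions not made in the gist: it uses the direction from the wrist (landmark 0) to the base of the middle finger (landmark 9), treats landmarks as (x, y) tuples in image coordinates where y grows downward, and ignores rotation out of the image plane. `hand_angle_degrees` is a hypothetical name.

```python
import math


def hand_angle_degrees(landmarks):
    """Angle of the wrist -> middle-finger-base direction, in degrees.

    0 means the hand points right, 90 means it points up in the image.
    """
    wx, wy = landmarks[0]
    mx, my = landmarks[9]
    # Negate dy because image y grows downward.
    return math.degrees(math.atan2(-(my - wy), mx - wx))
```

Normalized coordinates are relative to image width and height, so for a non-square image they would first need to be scaled to pixels for the angle to be geometrically meaningful.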

@FarhanAhmadISM

@TheJLifeX the link you posted above for building this repo is not working. Could you please look into it and help me how to build and run this.

@TheJLifeX
Author

Hi @FarhanAhmadISM, the MediaPipe documentation website has changed (from https://mediapipe.readthedocs.io to https://google.github.io/mediapipe). Please visit the following links to get started with MediaPipe:

  1. Get Started with MediaPipe
  2. Hand Tracking on Desktop

@FarhanAhmadISM

@TheJLifeX I am facing some issues to run your Hand-gesture-recognition repositories. Can you please tell me the exact commands for running your repository.
Like for hello world we need to write
bazel run --define MEDIAPIPE_DISABLE_GPU=1
mediapipe/examples/desktop/hello_world:hello_world
So for your repo to run please tell me step by step. I shall be grateful to you

@RohitSingh1226

@TheJLifeX Will this run on windows
I am getting the following error

'build' options: --jobs 128 --define=absl=1 --cxxopt=-std=c++14 --copt=-Wno-sign-compare --copt=-Wno-unused-function --copt=-Wno-uninitialized --copt=-Wno-unused-result --copt=-Wno-comment --copt=-Wno-return-type --copt=-Wno-unused-local-typedefs --copt=-Wno-ignored-attributes --incompatible_disable_deprecated_attr_params=false --incompatible_depset_is_not_iterable=false --apple_platform_type=macos --apple_generate_dsym
ERROR: --incompatible_disable_deprecated_attr_params=false :: Unrecognized option: --incompatible_disable_deprecated_attr_params=false

@RohitSingh1226

Updated Error

@TheJLifeX Will this run on windows
I am getting the following error

'build' options: --jobs 128 --define=absl=1 --cxxopt=-std=c++14 --copt=-Wno-sign-compare --copt=-Wno-unused-function --copt=-Wno-uninitialized --copt=-Wno-unused-result --copt=-Wno-comment --copt=-Wno-return-type --copt=-Wno-unused-local-typedefs --copt=-Wno-ignored-attributes --incompatible_disable_deprecated_attr_params=false --incompatible_depset_is_not_iterable=false --apple_platform_type=macos --apple_generate_dsym
ERROR: --incompatible_disable_deprecated_attr_params=false :: Unrecognized option: --incompatible_disable_deprecated_attr_params=false

Updated error:
mediapipe-master/mediapipe/framework/deps/BUILD:193:1: C++ compilation of rule '//mediapipe/framework/deps:registration_token' failed (Exit 2)
cl : Command line error D8021 : invalid numeric argument '/Wno-sign-compare'

@pkoppise

pkoppise commented Nov 29, 2021

Hi @TheJLifeX

I have ported the above gist to the latest mediapipe hand tracking gpu as follows

ubuntu20@ubuntu20-OptiPlex-9020:~/mediapipe/hand-gesture-recognition$ ls
BUILD hand-gesture-recognition-calculator.cc


annotation_overlay_calculator.cc

....
....
constexpr char recognizedHandGestureTag[] = "RECOGNIZED_HAND_GESTURE";

absl::Status AnnotationOverlayCalculator::GetContract(CalculatorContract* cc) {
+RET_CHECK(cc->Inputs().HasTag(recognizedHandGestureTag)); 
+cc->Inputs().Tag(recognizedHandGestureTag).Set<std::string>();

 absl::Status AnnotationOverlayCalculator::Process(CalculatorContext* cc) {
....
.....
+const auto &recognizedHandGesture = cc->Inputs().Tag(recognizedHandGestureTag).Get<std::string>();
+renderer_->DrawText(recognizedHandGesture);

mediapipe/graphs/hand_tracking/hand_tracking_desktop_live_gpu.pbtxt

node {
calculator: "HandLandmarkTrackingGpu"
....
....
+output_stream: "RECOGNIZED_HAND_GESTURE:recognized_hand_gesture"  

node {
calculator: "HandRendererSubgraph" 
....
....
+input_stream: "RECOGNIZED_HAND_GESTURE:recognized_hand_gesture"

mediapipe/graphs/hand_tracking/subgraphs/hand_renderer_gpu.pbtxt

....
....
+input_stream: "RECOGNIZED_HAND_GESTURE:recognized_hand_gesture"

node {
calculator: "AnnotationOverlayCalculator"
....
....
+input_stream: "RECOGNIZED_HAND_GESTURE:recognized_hand_gesture"

mediapipe/modules/hand_landmark/hand_landmark_gpu.pbtxt
....
....
+node {
+calculator: "HandGestureRecognitionCalculator"
+input_stream: "NORM_LANDMARKS:scaled_landmarks"
+input_stream: "NORM_RECT:hand_rect_for_next_frame"
+}


mediapipe/modules/hand_landmark/hand_landmark_tracking_gpu.pbtxt
....
....
+output_stream: "RECOGNIZED_HAND_GESTURE:recognized_hand_gesture"

+node {
+calculator: "HandGestureRecognitionCalculator"
+output_stream: "RECOGNIZED_HAND_GESTURE:recognized_hand_gesture"
+}

mediapipe/modules/hand_landmark/BUILD

mediapipe_simple_subgraph(
name = "hand_landmark_tracking_gpu",
....
....
+"//hand-gesture-recognition:hand-gesture-recognition-calculator",

mediapipe/util/annotation_renderer.h

+void DrawText(std::string text);

mediapipe/util/annotation_renderer.cc

+void AnnotationRenderer::DrawText(std::string text)  
+{
+const int left = 275;
+const int top = 50;
+const cv::Point origin(left, top);
+const int font_size = 35;
+ const int thickness = 5;
+const cv::Scalar color = cv::Scalar(255.0, 0.0, 0.0);
+const cv::HersheyFonts font_face = cv::FONT_HERSHEY_PLAIN;
+const double font_scale = ComputeFontScale(font_face, font_size, thickness);
+cv::putText(mat_image_, text, origin, font_face, font_scale, color, thickness);
+}

got below error

E20211130 02:20:28.672209 113817 demo_run_graph_main_gpu.cc:197] Failed to run the graph: ValidatedGraphConfig Initialization failed.
HandGestureRecognitionCalculator: ; cc->Inputs().HasTag(normalizedLandmarkListTag)nd-gesture-recognition-calculator.cc:49)
HandGestureRecognitionCalculator: ; cc->Outputs().HasTag(recognizedHandGestureTag)nd-gesture-recognition-calculator.cc:55)

49---->RET_CHECK(cc->Inputs().HasTag(normalizedLandmarkListTag));
55--->RET_CHECK(cc->Outputs().HasTag(recognizedHandGestureTag));

Could you please provide any inputs on resolving the issue?

Thanks in advance

@pkoppise

I20211130 15:43:17.350132 159695 demo_run_graph_main_gpu.cc:58] Initialize the calculator graph.
E20211130 15:43:17.352228 159695 demo_run_graph_main_gpu.cc:197] Failed to run the graph: ; Input Stream "recognized_hand_gesture" for node with sorted index 50 does not have a corresponding output stream.

@KarinaKatke

I20211130 15:43:17.350132 159695 demo_run_graph_main_gpu.cc:58] Initialize the calculator graph. E20211130 15:43:17.352228 159695 demo_run_graph_main_gpu.cc:197] Failed to run the graph: ; Input Stream "recognized_hand_gesture" for node with sorted index 50 does not have a corresponding output stream.

I have the same Problem. Did you find a way to solve it?
