- ml5 HandPose
- Building an ml5 HandPose + Arduino app: HandWaver
- Building the web app side
- Building the Arduino side
- Final construction
In our previous lesson, we introduced combining Arduino with machine learning (ML) libraries like ml5.js, a web-based ML library built on Google TensorFlow. Specifically, we built a p5.js app that fed a real-time web cam stream into ml5’s PoseNet to identify and classify human body parts (keypoints) and sent identified keypoints to our Arduino to create new interactive experiences.
In this lesson, we will introduce a new ml5 model, called HandPose, which precisely tracks the hand in three dimensions via 21 keypoints (the palm base plus 20 finger joints), and use it to control a servo motor. This lesson should further advance your understanding of ml5, show how to modularize and build an ml5+Arduino app step-by-step, and hopefully also inspire you to think about how we can combine real-time ML with Arduino.
In March 2020, the Google TensorFlow.js team released two incredible packages for web-based face and hand tracking, entitled FaceMesh (now face-landmarks-detection) and HandPose, respectively. Soon thereafter, a user made a feature request to support these new packages in ml5. By November 2020, HandPose support had been implemented in ml5 by Bomani Oseni McClendon as part of the ml5.js Fellows Program.
In this lesson, we will focus on HandPose rather than FaceMesh (though both are available in ml5). You are welcome to use either the TensorFlow.js implementation, Google’s MediaPipe version, or ml5’s version. All three implementations use the same underlying pre-trained ML model. For this lesson, we will use ml5’s HandPose. Here are some example demos across the three implementations, which run in your web browser:
- Google MediaPipe’s Hand Tracking Demo
- Google MediaPipe’s Demo App: Hand Defrosting
- Google TensorFlow’s HandPose Demo
- ml5 HandPose Demo in p5.js web editor
In 2019, research scientists Margaret Mitchell, Timnit Gebru, and colleagues published a paper entitled Model Cards for Model Reporting, which called for ML-based APIs to provide transparent information about how the underlying ML model was trained and its expected usage contexts. The paper begins with important motivation that emphasizes how ML is beginning to permeate every aspect of life with serious ramifications:
Trained machine learning models are increasingly used to perform high-impact tasks in areas such as law enforcement, medicine, education, and employment. In order to clarify the intended use cases of machine learning models and minimize their usage in contexts for which they are not well suited, we recommend that released models be accompanied by documentation detailing their performance characteristics.
They then propose a framework called “model cards” to standardize how ML models are reported by companies:
In this paper, we propose a framework that we call model cards, to encourage such transparent model reporting. Model cards are short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic groups (e.g., race, geographic location, sex, Fitzpatrick skin type) and intersectional groups (e.g., age and race, or sex and Fitzpatrick skin type) that are relevant to the intended application domains. Model cards also disclose the context in which models are intended to be used, details of the performance evaluation procedures, and other relevant information.
This paper and the respective research scientists who authored it have made a significant impact on the ML community. As a testament, many of the Google ML APIs and models now provide “model cards”. Here’s the model card for HandPose (local copy)—notably, I could not find one for PoseNet.
We summarize a few important HandPose model notes below.
HandPose consists of two lightweight models, a palm detector and a hand landmark model, to detect and classify keypoints on the hand. The model inputs an image or video frame, resizes that input to 256x256 for recognition, and outputs:
- a palm bounding box,
- 21 3-dimensional hand landmarks (keypoints), and
- an overall confidence score for the hand detection
The 21 keypoints include four each for the thumb, index finger, middle finger, ring finger, and pinky, plus one more for the base of the palm.
Figure. The HandPose keypoints from the MediaPipe team.
The actual keypoint indices from the TensorFlow implementation, which ml5 uses:
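In code terms, the index-to-finger grouping looks roughly like this (the group names mirror the annotations object described below; treat this as a summary rather than the API itself):

```js
// Approximate landmark index groups for the 21 HandPose keypoints
const LANDMARK_GROUPS = {
  palmBase:     [0],
  thumb:        [1, 2, 3, 4],
  indexFinger:  [5, 6, 7, 8],
  middleFinger: [9, 10, 11, 12],
  ringFinger:   [13, 14, 15, 16],
  pinky:        [17, 18, 19, 20],
};
```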
According to the TensorFlow team, HandPose is well-suited for real-time inference across a variety of devices, achieving 40 FPS on a 2018 MacBook Pro, 35 FPS on an iPhone 11, and 6 FPS on a Pixel 3.
In terms of limitations and ethical considerations, the HandPose model card specifies that the HandPose models have been trained on a limited dataset and are not appropriate for counting the number of hands in a crowd, detecting hands with gloves or occlusions, or detecting hands that are far from the camera (greater than ~2 meters).
Moreover, the model card makes clear that the HandPose model is not intended for life-critical decisions and that performance will vary across skin tones, gender, age, and environmental conditions (e.g., low light).
Importantly, just as PoseNet, which we used in the previous lesson, detects body pose keypoints but does not attempt to recognize who is in an image, HandPose similarly performs detection but does not attempt recognition (that is, who owns the detected hand). In computer vision, there is an important difference between detection and recognition. All detections occur locally in the user’s web browser (and not in the cloud).
Just like with PoseNet, the TensorFlow and ml5 HandPose APIs use the same data structure. The model returns an array of objects describing each detected hand (always one in ml5’s case, currently). Each “hand” object includes four things:
- a handInViewConfidence score, which is the model’s confidence that the hand actually exists
- a boundingBox, which provides the topLeft and bottomRight x,y positions of the detected hand
- a landmarks array, which includes the 3D (x,y,z) coordinates of each hand landmark (keypoint)
- an annotations array, which provides the same 3D coordinates as landmarks but semantically grouped by part of the hand (thumb, indexFinger, middleFinger, ringFinger, pinky, and palmBase)
The array structure looks like this:
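The values below are illustrative (only the shape matters), but each prediction object looks roughly like this:

```js
[
  {
    handInViewConfidence: 0.98,        // confidence that a hand exists
    boundingBox: {
      topLeft:     [162.9, 47.8],      // [x, y]
      bottomRight: [548.7, 433.6]      // [x, y]
    },
    landmarks: [                       // 21 keypoints as [x, y, z]
      [472.5, 298.7, -0.001],          // index 0: palm base
      // ... 20 more ...
    ],
    annotations: {                     // same keypoints, grouped by part
      palmBase:     [ /* 1 keypoint  */ ],
      thumb:        [ /* 4 keypoints */ ],
      indexFinger:  [ /* 4 keypoints */ ],
      middleFinger: [ /* 4 keypoints */ ],
      ringFinger:   [ /* 4 keypoints */ ],
      pinky:        [ /* 4 keypoints */ ]
    }
  }
]
```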
To make this more clear, here’s a screenshot from Chrome’s dev tools showing the predictions array (which, again, will always be size 1 because ml5 is currently limited to detecting one simultaneous hand). In the screenshot, I’ve expanded the array to show the aforementioned high-level structure of the detected hand object.
Figure. This figure shows a screenshot of the HandPose predictions array and underlying objects as shown in Chrome’s dev tools. Right-click and select “Open Image in New Tab” to enlarge. The app running here is our HandPoseDemo. You can also explore the model interactively: run HandPoseDemo, open sketch.js in Sources, put a breakpoint on the onNewHandPosePrediction() function, and add the predictions array to the Watch list. Exploring data structures like this can help advance understanding and is a great strategy for web dev.
To demonstrate the ml5.js HandPose API and how to step through the data structure, we created a simple application called HandPoseDemo that renders:
- the boundingBox returned from the API along with a “tighter” version that we manually calculate based on keypoints
- the handInViewConfidence score, which we draw above the “tight” bounding box
- the 21 landmarks (keypoints) for the hand, including the palmBase, along with text labels
This data structure is similar to PoseNet’s but not identical: one key difference is that the individual keypoints do not include their own confidence scores. Here’s a quick video demo.
To help highlight the potential of real-time ML plus Arduino, we will build a simple “robotic” hand waver. We will use ml5’s HandPose API to sense the user’s hand, which will then control a servo motor embedded on a cardboard-crafted figure. See the sneak preview below.
- If you’re using VSCode, copy SerialTemplate and rename the folder to HandWaver.
- If you’re using the p5.js online editor, simply open Serial Template and rename your project to HandWaver.
The ml5 library generally aims to create consistency across their APIs. Thus, the ml5 HandPose API should feel familiar if you followed our previous PoseNet lesson. Similar to PoseNet, the ml5.handpose constructor takes in three optional arguments (indicated by the ? prefix in the ml5 documentation):
- video: An optional reference to the webcam video stream (e.g., a p5.js video element).
- options: An optional object of HandPose configuration properties. See below.
- callback: An optional reference to a callback function, which is called when the model is loaded.
The options are listed below (with defaults shown). You can and should play with these options based on the needs of your application.
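Per the ml5 documentation (at the time of writing), the defaults are roughly as follows; double-check the ml5 reference for your version:

```js
const options = {
  flipHorizontal: false,         // flip (mirror) the camera input horizontally
  maxContinuousChecks: Infinity, // frames to run the landmark model before re-running palm detection
  detectionConfidence: 0.8,      // minimum confidence to keep a hand detection
  scoreThreshold: 0.75,          // score threshold for non-maximum suppression
  iouThreshold: 0.3,             // intersection-over-union threshold for non-maximum suppression
};
```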
See also: the TensorFlow documentation here.
So, to initialize and create an ml5.handpose object, we write:
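For example, in setup() (the variable and callback names handPoseModel and onHandPoseModelReady are simply our own conventions):

```js
let video;
let handPoseModel;

function setup() {
  createCanvas(640, 480);

  // Grab the webcam stream and hide the raw HTML element (we draw frames ourselves)
  video = createCapture(VIDEO);
  video.hide();

  // Create the HandPose model with the webcam video and a model-loaded callback
  handPoseModel = ml5.handpose(video, onHandPoseModelReady);
}

function onHandPoseModelReady() {
  console.log("The HandPose model is ready!");
}
```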
Again, this should feel familiar! It’s quite similar to our PoseNet lesson thus far.
Also like PoseNet, we can subscribe to a “new pose event” via the on function by passing the predict event name:
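Here, onNewHandPosePrediction is simply the name of our callback function:

```js
// Call onNewHandPosePrediction every time HandPose makes a new prediction
handPoseModel.on("predict", onNewHandPosePrediction);
```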
So, our full initialization + subscription HandPose code is:
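Putting it all together, a minimal version looks something like the following (the globals curHandPose and isHandPoseModelInitialized are our own naming choices, which we reuse below):

```js
let video;
let handPoseModel;
let curHandPose = null;                  // most recent HandPose prediction (or null)
let isHandPoseModelInitialized = false;  // true once the model has loaded

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.hide();

  handPoseModel = ml5.handpose(video, onHandPoseModelReady);

  // Subscribe to the 'predict' event, which fires with every new prediction
  handPoseModel.on("predict", onNewHandPosePrediction);
}

function onHandPoseModelReady() {
  isHandPoseModelInitialized = true;
  console.log("The HandPose model is ready!");
}

function onNewHandPosePrediction(predictions) {
  if (predictions && predictions.length > 0) {
    curHandPose = predictions[0]; // ml5 currently detects at most one hand
  } else {
    curHandPose = null;
  }
}

function draw() {
  image(video, 0, 0, width, height); // draw the current webcam frame
}
```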
You can view, play with, and edit this code in the p5.js online editor. But there’s not much there yet!
Now, the fun part! Let’s add drawing code to render three things:
- the 21 HandPose keypoints as circles (in a new function called drawHand())
- a bounding box with the overall hand confidence score (in a function called drawBoundingBox())
- some convenience text to tell the user about model initialization (“Waiting for model to load…”)
First, let’s update the draw() function to show some convenience text when the model is still loading and call drawing functions for the hand keypoints and bounding box (if the hand was detected):
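Assuming the globals from the sketch above (curHandPose and isHandPoseModelInitialized), the updated draw() might look like:

```js
function draw() {
  image(video, 0, 0, width, height);

  if (!isHandPoseModelInitialized) {
    // Convenience text while the model loads
    background(100);
    fill(255);
    noStroke();
    textAlign(CENTER, CENTER);
    textSize(24);
    text("Waiting for HandPose model to load...", width / 2, height / 2);
    return;
  }

  if (curHandPose) {
    drawHand(curHandPose);
    drawBoundingBox(curHandPose);
  }
}
```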
It should look something like this:
Figure. Showing what “Waiting for HandPose model to load…” text looks like in the p5.js editor.
Now, let’s add the drawHand(handPose) function. We will iterate through all 21 landmarks (keypoints) and draw a green circle at each x,y position (stored at landmark indices 0 and 1, respectively).
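A minimal sketch of drawHand():

```js
function drawHand(handPose) {
  // Each landmark is an [x, y, z] array; indices 0 and 1 hold the x, y position
  noStroke();
  fill(0, 255, 0);
  for (const landmark of handPose.landmarks) {
    circle(landmark[0], landmark[1], 10);
  }
}
```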
Your hand should now have green circles drawn on the landmarks like this:
Figure. Drawing the keypoints on the hand. Screenshot from the p5.js editor.
Lastly, let’s add a drawBoundingBox(handPose) function that renders a rectangle for the HandPose boundingBox object along with its handInViewConfidence score:
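A minimal sketch of drawBoundingBox():

```js
function drawBoundingBox(handPose) {
  const bb = handPose.boundingBox;
  const x = bb.topLeft[0];
  const y = bb.topLeft[1];
  const boxWidth = bb.bottomRight[0] - x;
  const boxHeight = bb.bottomRight[1] - y;

  // Draw the bounding box
  noFill();
  stroke(255);
  rect(x, y, boxWidth, boxHeight);

  // Draw the overall hand confidence just above the box
  noStroke();
  fill(255);
  textSize(14);
  text("Confidence: " + handPose.handInViewConfidence.toFixed(2), x, y - 6);
}
```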
Here’s a screenshot with the keypoints, bounding box, and confidence:
Figure. Drawing the keypoints, the bounding box, and the hand confidence score. Screenshot from the p5.js editor.
You can view, edit, and play with this code in the p5.js online editor.
For the final step, we’ll add in code to transmit the palmBase normalized x position [0, 1] via web serial. To avoid saturating web serial with data, we will also limit our transmission rate to ~20Hz (one transmission every 50ms). Lastly, let’s also add in drawing code to show palmBase information to the screen (useful for debugging!).
First, add in a global variable:
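We actually use a small handful of globals here; our names are illustrative:

```js
// Rate-limit serial output to roughly 20 Hz (one transmission every 50 ms)
let timestampLastTransmit = 0;
const MIN_TIME_BETWEEN_TRANSMISSIONS_MS = 50;

// The most recent normalized palmBase x position in [0, 1]
let curNormalizedPalmX = 0;
```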
Then update the onNewHandPosePrediction function to calculate and transmit the normalized palmBase x position:
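A sketch of the updated callback. We assume the canvas width matches the video width and that our SerialTemplate exposes a serial object with isOpen() and writeLine() helpers, as in our previous web serial lessons:

```js
async function onNewHandPosePrediction(predictions) {
  if (predictions && predictions.length > 0) {
    curHandPose = predictions[0];

    // The palmBase annotation holds one [x, y, z] keypoint; normalize x to [0, 1]
    const palmBase = curHandPose.annotations.palmBase[0];
    curNormalizedPalmX = constrain(palmBase[0] / width, 0, 1);

    // Transmit over web serial, but no faster than ~20 Hz
    if (serial.isOpen() && millis() - timestampLastTransmit >= MIN_TIME_BETWEEN_TRANSMISSIONS_MS) {
      await serial.writeLine(curNormalizedPalmX.toFixed(4));
      timestampLastTransmit = millis();
    }
  } else {
    curHandPose = null;
  }
}
```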
Finally, update the draw() function to draw palmBase info to the screen:
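For example, appended to the end of draw():

```js
// Show palmBase info for debugging
if (curHandPose) {
  const palmBase = curHandPose.annotations.palmBase[0];
  noStroke();
  fill(255);
  textAlign(LEFT, BOTTOM);
  textSize(14);
  text("palmBase x: " + palmBase[0].toFixed(1) +
       "  normalized: " + curNormalizedPalmX.toFixed(3), 10, height - 10);
}
```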
And that’s it! Because our SerialTemplate already supports connecting to a serial device by clicking on the canvas (by default) and/or auto-connecting to previously approved web serial devices, we are all set. Feel free to add in your own connection code (e.g., a specific “Connect Button” for web serial). The full code is here.
Now on to the Arduino side!
We’re going to build up the Arduino side step-by-step. There are five main steps:
- Create an initial servo motor circuit and Arduino test program
- Create a simple p5.js + servo test app with web serial
- Create an interesting lo-fi form for our embedded servo motor
- Test the form and our servo motor circuit
- Create the end-to-end HandPose + Arduino system
As a quick introduction to servo motors, please read this Adafruit lesson by Simon Monk. Building on that lesson, we’ll create a basic circuit that allows a user to control the servo motor position with a potentiometer. More specifically, we’ll read in the potentiometer value on Pin A0 using analogRead(), convert it to an angle between 0 and 180, and then write out the angle to the servo motor.
Figure. Basic servo motor circuit with the servo pulse pin hooked to the Arduino’s Pin 9 and the potentiometer hooked to Pin A0. Diagram made in Fritzing and PowerPoint.
The code, in full, is:
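In outline, the sketch looks something like this (constant names here are illustrative; see the GitHub version for the exact code):

```cpp
#include <Servo.h>

const int SERVO_OUTPUT_PIN = 9;          // servo "pulse" pin (see circuit diagram above)
const int POTENTIOMETER_INPUT_PIN = A0;  // potentiometer wiper
const int MIN_SERVO_ANGLE = 0;
const int MAX_SERVO_ANGLE = 180;

Servo _servo;

void setup() {
  _servo.attach(SERVO_OUTPUT_PIN);
}

void loop() {
  // Read the potentiometer (0 - 1023) and convert it to a servo angle (0 - 180)
  int potVal = analogRead(POTENTIOMETER_INPUT_PIN);
  int servoAngle = map(potVal, 0, 1023, MIN_SERVO_ANGLE, MAX_SERVO_ANGLE);
  _servo.write(servoAngle);
  delay(15); // give the servo a moment to reach the new position
}
```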
Code. This code is in our GitHub as ServoPot.ino.
Here’s a video demonstration showing a slightly modified Arduino circuit and sketch (called ServoPotOLED.ino). The only difference is that the OLED version outputs the current servo angle on the OLED display.
Video. A demonstration of the servo circuit with potentiometer. The video is showing ServoPotOLED.ino, which is functionally equivalent to the code above (ServoPot.ino) but includes OLED support. Here, the OLED displays the current servo angle.
Let’s update our code to set the servo motor angle based on serial input rather than the potentiometer. We’re going to write slightly more flexible parsing code than usual. In this case, we’ll accept either line delimited strings of integer values ranging from 0 - 180, inclusive, or float values ranging from 0-1, inclusive. We’ll determine whether the serial transmitter sent an integer vs. a float by looking for a decimal point in the string.
The full code:
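A sketch of the parsing logic (again, names and the exact baud rate are illustrative; see ServoSerialIn.ino for the real code and match the baud rate to your web app):

```cpp
#include <Servo.h>

const int SERVO_OUTPUT_PIN = 9;
const int MIN_SERVO_ANGLE = 0;
const int MAX_SERVO_ANGLE = 180;

Servo _servo;

void setup() {
  Serial.begin(115200); // must match the web serial baud rate
  _servo.attach(SERVO_OUTPUT_PIN);
}

void loop() {
  if (Serial.available() > 0) {
    // Read one line of input (values are line delimited)
    String rcvdSerialData = Serial.readStringUntil('\n');
    rcvdSerialData.trim();

    int servoAngle;
    if (rcvdSerialData.indexOf('.') != -1) {
      // A decimal point means the sender transmitted a float in [0, 1]
      float normalizedVal = rcvdSerialData.toFloat();
      normalizedVal = constrain(normalizedVal, 0, 1);
      servoAngle = MIN_SERVO_ANGLE + (int)(normalizedVal * (MAX_SERVO_ANGLE - MIN_SERVO_ANGLE));
    } else {
      // Otherwise, treat the value as an integer angle in [0, 180]
      int rcvdAngle = rcvdSerialData.toInt();
      servoAngle = constrain(rcvdAngle, MIN_SERVO_ANGLE, MAX_SERVO_ANGLE);
    }

    _servo.write(servoAngle);
  }
}
```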
Code. The full code is here ServoSerialIn.ino.
Video. A demonstration of controlling the servo motor from serial input. This video is using a slightly modified sketch with OLED support called ServoSerialInOLED.ino but is functionally equivalent to ServoSerialIn.ino.
We also made a slightly more sophisticated version that lets the user choose whether to use the potentiometer or serial input to control the servo motor: ServoPotWithSerialIn.ino and ServoPotWithSerialInOLED.ino. You can toggle between potentiometer and serial input using the button.
Video. A demonstration of ServoPotWithSerialInOLED.ino. You can use the button to change between two input modes to control the servo motor: the potentiometer and serial input. In the video, note how we press the button to switch between potentiometer-based control and serial control. For the latter, we send new values using Serial Monitor. We also created a non-OLED version of the code called ServoPotWithSerialIn.ino.
To more easily test our Arduino sketch with p5.js, let’s build a simple web serial app to control the servo through the web browser. In this case, we’ll read the x position of the mouse, normalize it to [0, 1], and transmit it over serial. If this works, then the final step will be to integrate our HandWaver app, which should be straightforward.
Start by making a copy of SerialTemplate, if you’re using VSCode, or Serial Template, if you’re using p5.js. Rename your project to something like XMouseSerialOut (the name is up to you, of course).
Now, we need to implement three things:
- Sense and normalize the x mouse position. This is easy: we can always grab the current x mouse position using the global mouseX variable in p5.js, and the mouseMoved() function is called whenever the user’s mouse moves.
- Transmit the normalized x position over web serial.
- Draw x mouse information to the canvas. This is optional but useful.
The p5.js function mouseMoved() is called every time the mouse moves (as long as the mouse button is not pressed). Let’s put our mouse-related code there.
First, create two global variables for mouse tracking:
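Something like this (our names):

```js
let curNormalizedMouseX = 0;     // mouseX mapped to [0, 1]
let lastTransmittedValue = null; // last string written to serial (to avoid resending duplicates)
```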
Now, implement the mouseMoved() function:
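A sketch of mouseMoved(), again assuming the serial object with isOpen() and writeLine() from our SerialTemplate:

```js
async function mouseMoved() {
  // Normalize mouseX to [0, 1] using the canvas width
  curNormalizedMouseX = constrain(mouseX / width, 0, 1);

  // Transmit over web serial if connected and the value actually changed
  const outputValue = curNormalizedMouseX.toFixed(4);
  if (serial.isOpen() && outputValue !== lastTransmittedValue) {
    await serial.writeLine(outputValue);
    lastTransmittedValue = outputValue;
  }
}
```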
Finally, add in drawing code to display a gray line for the current x mouse position and large text for the normalized value:
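For example (using the curNormalizedMouseX global from above):

```js
function draw() {
  background(100);

  // Gray vertical line at the current mouse x position
  stroke(180);
  line(mouseX, 0, mouseX, height);

  // Large text showing the normalized value
  noStroke();
  fill(255);
  textAlign(CENTER, CENTER);
  textSize(40);
  text(curNormalizedMouseX.toFixed(4), width / 2, height / 2);
}
```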
Video. A demonstration of a small p5.js test app called XMouseSerialOut (code), which outputs a normalized mouse x position to serial. The code running on the Arduino is ServoPotWithSerialInOLED.ino, but many other programs in our GitHub repo would also work, such as ServoSerialIn.ino.
If the simple p5.js x-position web app works with your Arduino sketch, then the HandWaver app should too. So, return to your HandWaver code—here’s our version on the p5.js web editor and on GitHub (live page, code). On the Arduino, you can run any of the following previously described serial-based servo code or write your own:
- ServoSerialIn.ino or the OLED version called ServoSerialInOLED.ino, which take in either an integer value between 0 - 180 or a float value between 0 - 1 and set the servo position accordingly.
- ServoPotWithSerialIn.ino or the OLED version called ServoPotWithSerialInOLED.ino, which work similarly to the previous Arduino programs but allow the user to switch between potentiometer control and serial-based control for the servo using button input.
Now, another fun, creative part: we need to create an interesting form for the servo motor. Remember, the servo motor will move in response to your hand’s x position. So, you could:
- Create a lightsaber wielding Darth Vader
- Create a Statue of Liberty model moving her torch
- Create a cardboard-crafted LeBron James moving his arm to block Andre Iguodala in the 2016 NBA Finals (video). Now known simply as “The Block.”
- Create a cardboard-crafted Queen of England waving back at you
- … your ideas here! …
In this case, I worked with a kindergartner and preschooler to create a paper-crafted mountain scene and stick person we call “Henry, the Tape Man.”
Figure. Creating “Henry, the Tape Man” with construction paper, cardboard, glue, and lots of tape!
Then, we calculated an appropriate position to insert the servo motor for Henry’s arm and cut an inset and hole into the cardboard:
Figure. Inserting the servo motor into the cardboard backdrop.
We attached a temporary “arm” to test our construction with the potentiometer and HandWaver.
Video. Testing the servo motor embedded into the cardboard with the potentiometer—the Arduino is running ServoPotWithSerialInOLED.ino.
Now testing with HandWaver:
From these tests, we determined that a good range of motion for Henry’s arm is 40 - 85 degrees, so we updated our Arduino sketch:
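If your sketch maps incoming values into a [MIN_SERVO_ANGLE, MAX_SERVO_ANGLE] range, as in the serial sketches above, the update amounts to changing two constants:

```cpp
// Narrow the servo's range of motion to fit Henry's arm
const int MIN_SERVO_ANGLE = 40;
const int MAX_SERVO_ANGLE = 85;
```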
And here’s the final construction running the p5+ml5 app HandWaver—available in the p5.js web editor or on GitHub (live page, code). On the Arduino, we are running ServoPotWithSerialInOLED.ino but something as simple as ServoSerialIn.ino would work (if you don’t have an OLED or don’t need/want to switch between the potentiometer and serial input to control the servo).
- ml5 HandPose, ml5
- TensorFlow HandPose, Google TensorFlow
- Training a Hand Detector like the OpenPose one in TensorFlow, Marcelo Ortega on Medium
- On-Device, Real-Time Hand Tracking with MediaPipe, Valentin Bazarevsky and Fan Zhang, Google AI Blog
- Face and Hand Tracking in the Browser with MediaPipe and TensorFlow.js, Ann Yuan and Andrey Vakunov, TensorFlow Blog