New multi-user, multi-object dataset for joint 3D hand-object pose estimation


Pose estimation is an important step towards understanding people in images and videos, with numerous applications in action understanding, human-robot interaction, surveillance, motion capture, and more.

For hand-object pose estimation, however, state-of-the-art methods still fail because of large mutual occlusions between the hand and the object and a lack of datasets specific to 3D pose estimation for hand+object interaction. Even when synthetic images are used for training, annotated real-world images are still needed to validate the models.

Joint 3D Hand-Object Pose Estimation Dataset

Researchers recently proposed HO-3D, a large-scale dataset of diverse hand-object interactions with 3D annotations of hand and object pose. They also introduced a method for annotating the dataset efficiently and a pose prediction method trained on it.

Example of hand and object segmentation obtained with DeepLabV3. Input image (Left); Object mask (Center); Hand mask (Right). 

HO-3D's annotations come from a global optimization that exploits depth, color, and temporal constraints to annotate the sequences efficiently. The researchers used these annotations to train a new approach that predicts the 3D poses of both the hand and the object from a single color image. The HO-3D dataset is made of RGB-D sequences of 8 different people manipulating different objects, with manual annotations in side views for evaluating the 3D poses.
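To give a feel for what optimizing over a whole sequence with a temporal constraint means, here is a minimal toy sketch in Python. It is not the authors' objective: it assumes a single 1D position per frame, a single data term standing in for the depth and color constraints, and a hypothetical weight `lam` on a smoothness term between consecutive frames. The names `energy` and `optimize` are invented for this illustration.

```python
# Toy illustration only: NOT the objective used to annotate HO-3D.
# z[t] is a noisy per-frame observation of a 1D position (assumed stand-in
# for the depth/color evidence). We estimate all frames jointly by minimizing
#   sum_t (x[t] - z[t])^2  +  lam * sum_t (x[t+1] - x[t])^2

def energy(x, z, lam):
    """Data term plus temporal smoothness term for a whole sequence."""
    data = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    smooth = sum((x[t + 1] - x[t]) ** 2 for t in range(len(x) - 1))
    return data + lam * smooth

def optimize(z, lam=1.0, step=0.1, iters=500):
    """Plain gradient descent over every frame at once.

    The step size must stay below 2 / (2 + 8 * lam) for this to converge.
    """
    x = list(z)  # start from the raw observations
    n = len(x)
    for _ in range(iters):
        # Gradient of the data term.
        grad = [2.0 * (x[t] - z[t]) for t in range(n)]
        # Gradient of the smoothness term couples neighbouring frames.
        for t in range(n - 1):
            d = 2.0 * lam * (x[t + 1] - x[t])
            grad[t + 1] += d
            grad[t] -= d
        x = [x[t] - step * grad[t] for t in range(n)]
    return x
```

Calling `optimize([0.0, 1.0, 0.0, 1.0, 0.0])` returns a trajectory with less frame-to-frame jitter than the input, because each frame's estimate is pulled toward its neighbours as well as toward its own observation.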

Qualitative comparison between manual annotations and our annotations using the side view camera. Manual annotations are in grayscale, our automatic annotations are in color. Top: Hand comparison. Bottom: Object comparison.
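A comparison like this can also be made quantitative. The sketch below assumes the common mean per-joint Euclidean distance between two sets of 3D points; the article does not say which metric the researchers report, and the function name `mean_joint_error` is hypothetical.

```python
import math

def mean_joint_error(pred, ref):
    """Mean Euclidean distance between corresponding 3D points.

    pred, ref: equal-length sequences of (x, y, z) tuples, e.g. automatic
    and manual hand-joint annotations expressed in the same units.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("pred and ref must be non-empty and the same length")
    return sum(math.dist(p, r) for p, r in zip(pred, ref)) / len(pred)

# Two joints: the first matches exactly, the second is off by 3 units.
automatic = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
manual = [(0.0, 0.0, 0.0), (1.0, 0.0, 3.0)]
print(mean_joint_error(automatic, manual))  # 1.5
```

A lower value means the automatic annotations sit closer to the manual ones; the units are whatever the input coordinates use.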

Potential Uses and Effects

Because more high-quality data generally improves model accuracy, HO-3D is important for training robust models efficiently. The dataset should also encourage researchers to develop better annotation methods that can capture and annotate sequences with a single RGB-D camera. That would provide additional training data for hand+object pose estimation and, in turn, support more capable applications in computer vision and robotics.

Reference paper: HO-3D: A Multi-User, Multi-Object Dataset for Joint 3D Hand-Object Pose Estimation.


More by Christopher Dossman

Deep Learning Engineer, Teacher, and Entrepreneur.

Wevolver 2022