When it comes to hand-object pose estimation, however, state-of-the-art methods still fail due to large mutual occlusions and a lack of datasets specific to 3D pose estimation for hand+object interaction. Additionally, even when synthetic images are used for training, annotated real-world images are still needed for model validation.
Researchers recently proposed HO-3D, a large-scale dataset of diverse hand-object interactions with 3D annotations of both hand and object pose. They also introduced a method for efficiently annotating the sequences and a model, trained on the dataset, for predicting hand and object pose.
Example of hand and object segmentation obtained with DeepLabV3. Input image (left); object mask (center); hand mask (right).
The annotations in HO-3D were obtained with a global optimization that exploits depth, color, and temporal constraints to efficiently annotate the sequences. The researchers then used the annotated data to train a new approach that predicts the 3D poses of both the hand and the object from a single color image. The dataset consists of RGB-D sequences of 8 different people manipulating different objects, along with manual annotations in side views for evaluating the 3D poses.
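The key idea of such an annotation pipeline is that the pose is fit by minimizing a weighted sum of data terms (depth and color agreement) plus a temporal smoothness term linking consecutive frames. The sketch below is a toy version under heavy assumptions: the residuals, weights, and the identity "renderers" are all hypothetical placeholders for the real rendered-model-versus-observation comparisons.

```python
# Toy sketch of a multi-term fitting objective combining depth, color,
# and temporal constraints. All residuals and weights are hypothetical.
import numpy as np
from scipy.optimize import minimize

target_depth = np.array([0.42, 0.40, 0.43])  # toy observed depth features
target_color = np.array([0.20, 0.50, 0.70])  # toy observed color features
prev_pose = np.array([0.41, 0.45, 0.70])     # pose estimate from previous frame

def objective(pose, w_depth=1.0, w_color=0.5, w_temp=0.1):
    # The real pipeline compares a rendered hand/object model against the
    # observations; identity mappings keep this sketch self-contained.
    e_depth = np.sum((pose - target_depth) ** 2)   # depth agreement
    e_color = np.sum((pose - target_color) ** 2)   # color agreement
    e_temp = np.sum((pose - prev_pose) ** 2)       # temporal smoothness
    return w_depth * e_depth + w_color * e_color + w_temp * e_temp

# Warm-start from the previous frame, as a temporal tracker would
result = minimize(objective, x0=prev_pose)
print(result.success)  # True
```

The temporal term is what makes per-frame fitting stable across a sequence: it penalizes solutions that jump away from the previous frame's pose.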
Qualitative comparison between manual annotations and the automatic annotations using the side-view camera. Manual annotations are in grayscale; automatic annotations are in color. Top: hand comparison. Bottom: object comparison.
Since higher-quality training data generally yields more accurate models, HO-3D is important for enabling the efficient training of highly robust models. The dataset also encourages researchers to develop better annotation methods that can capture and easily annotate sequences with a single RGB-D camera. Such methods would provide additional training data for improved hand+object pose estimation and, in turn, inspire more capable applications in computer vision and robotics.
Thanks for reading. Please comment, share and remember to subscribe to our weekly newsletter for the most recent and interesting research papers. You can also follow me on Twitter, LinkedIn and join our Facebook Group.