Coordinated movement has always been one of the primary challenges in robotics. Locomotion and manipulation are the two main forms of movement. Locomotion describes how a robot transports itself from one place to another, whether by legged, wheeled, or tracked motion. Manipulation, on the other hand, describes how a robot interacts with the objects around it: grabbing an object, picking something up, throwing something, or even opening a door.
For years, researchers and engineers have studied ways to control locomotion and manipulation simultaneously. For legged robots in particular, attaching an arm enables mobile manipulation tasks that wheeled or tracked robots cannot perform. The conventional way of controlling such legged manipulators has been to control locomotion and manipulation separately, otherwise known as decoupling mobility and manipulation. While this method has its perks, it requires considerable engineering effort to maintain sufficient coordination between the legs and the arm.
Researchers describe modeling an arm-carrying legged robot as "a dynamic, high-degree-of-freedom, and non-smooth control problem." In practice, the complexity of such a model limits its operation to narrowly specified settings. Venturing into varied real-life scenarios makes the model increasingly complex and burdensome, and errors can arise, leading to undesirable and unnatural movements.
This has driven engineers to explore different methods that can bring synergy between the limbs in a way that resembles biologically coordinated movement. Think of the way your arms and legs move and how interconnected they are. For instance, it is difficult to move your left arm and your left leg in different motions while standing. The same applies to your right side. Or consider picking something up off the ground: your legs and arm work together to help you reach the object and grasp it. Biologists call this interlimb neural coupling; the analogous idea in robotics is called whole-body control. It allows for coordination between the limbs and extends the capabilities of the individual parts.
In a similar sense, researchers at Carnegie Mellon University have successfully developed a unified policy that governs an arm-carrying legged robot in a whole-body control manner using reinforcement learning (RL). This allowed them to train the robot to synergize its locomotion and manipulation and perform specific tasks with dynamic, agile behavior.
Earlier this year, Zipeng Fu, Xuxin Cheng, and Deepak Pathak from Carnegie Mellon University set out to test whether a learning-based, unified policy would outperform existing decoupling techniques in controlling a robot's movement and its ability to interact with its surroundings. As they explained, hierarchy-based decoupled and semi-coupled control methods have been ineffective due to a "lack of coordination between the arm and legs, error propagation across modules, and slow, non-smooth and unnatural motions."
The unified policy was trained with RL, a machine-learning approach that rewards desirable behaviors and penalizes undesirable ones. Through RL, the robot learns by trial and error to perceive its environment and act accordingly.
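To make the reward-and-penalty idea concrete, here is a toy per-step reward for a legged manipulator. The function names, terms, and weights are illustrative assumptions, not the paper's actual reward design: it simply rewards tracking a commanded base velocity (locomotion) and an end-effector target (manipulation), and lightly penalizes energy use.

```python
import numpy as np

def whole_body_reward(base_vel, cmd_vel, ee_pos, ee_target, joint_torques):
    """Toy per-step reward (illustrative only, not the paper's terms):
    reward tracking of a commanded base velocity and an end-effector
    target, minus a small energy penalty on joint torques."""
    # Locomotion term: 1.0 when the base velocity matches the command.
    r_loco = np.exp(-np.sum((np.asarray(base_vel) - np.asarray(cmd_vel)) ** 2))
    # Manipulation term: 1.0 when the gripper sits on its target.
    r_manip = np.exp(-np.sum((np.asarray(ee_pos) - np.asarray(ee_target)) ** 2))
    # Energy penalty discourages jerky, wasteful motions.
    r_energy = -0.001 * np.sum(np.abs(np.asarray(joint_torques)))
    return r_loco + r_manip + r_energy
```

An RL algorithm maximizing the sum of such rewards over time implicitly trades off the two tasks, which is what lets a single policy drive both the legs and the arm.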
The study comprised simulation experiments whose results were then transferred to real-world experiments. However, three distinct issues made this harder than the standard RL pipeline of training in simulation and deploying in the real world:
Multiple degrees of freedom (12 DoFs in the quadruped robot and 6 DoFs in the attached arm)
Conflicting objectives in mobility and weight balance
Dependency between manipulation and locomotion
This is why Fu et al. proposed adding regularized online adaptation to bridge the so-called Sim2Real gap, i.e., the gap between the skills acquired in simulation and those that transfer to the real-world system, also referred to as the realizability gap. Regularization is a machine-learning technique that constrains a model during training; here it keeps the online adaptation close to what was learned in simulation, minimizing the realizability gap. This adaptation eliminates the conventional two-phase teacher-student scheme used in previous Sim2Real transfer applications, in which a "teacher network" is trained by RL in simulation using full-state information and a "student network" then attempts to imitate it in the real world based on partial information from onboard observations.
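The core of the idea can be sketched as a single training objective. In this assumed form, an adaptation module estimates a latent vector `z_student` from onboard observation history, while a privileged encoder (with access to full simulation state) produces `z_teacher`; an L2 penalty pulls the estimate toward the privileged latent and is simply added to the RL loss, so both are trained in one phase rather than two. Names and the exact loss shape are assumptions for illustration.

```python
import numpy as np

def regularized_adaptation_loss(z_teacher, z_student, rl_loss, lam):
    """Sketch of regularized online adaptation (assumed form):
    add an L2 penalty between the privileged latent (z_teacher) and
    the onboard estimate (z_student) to the RL objective. The weight
    lam can be annealed over training."""
    reg = np.sum((np.asarray(z_student) - np.asarray(z_teacher)) ** 2)
    return rl_loss + lam * reg
```

Because the regularizer is part of the same objective the policy optimizes, there is no separate imitation stage: the adaptation module improves online while the policy is still being trained.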
The researchers also tackled the dependency issue by proposing the concept of advantage mixing. An advantage function describes how good or bad a particular action is in a given state. Knowing that locomotion tasks mostly rely on leg activity and manipulation tasks on arm activity, mixing the advantage functions of both locomotion and manipulation can encourage and accelerate the robot's learning of the unified policy.
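A minimal sketch of the mixing, under assumed notation: each joint's policy update uses the advantage of its "primary" task plus a beta-weighted share of the other task's advantage. Annealing `beta` from 0 toward 1 lets the arm first learn manipulation and the legs locomotion before the two objectives become fully coupled. The function and parameter names are illustrative, not taken from the paper's code.

```python
import numpy as np

def mixed_advantage(a_manip, a_loco, is_arm_joint, beta):
    """Sketch of advantage mixing (assumed form): arm joints weight
    the manipulation advantage first, leg joints the locomotion
    advantage, each adding beta times the other task's advantage."""
    is_arm_joint = np.asarray(is_arm_joint, dtype=bool)
    return np.where(is_arm_joint,
                    a_manip + beta * a_loco,   # arm joints
                    a_loco + beta * a_manip)   # leg joints
```

At `beta = 0` the two tasks train independently; at `beta = 1` every joint sees the full combined advantage, which is what pushes the legs and arm toward whole-body coordination.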
The researchers presented a simple design for the legged manipulator made of the following elements:
A 4-legged robot platform
A robot arm on top of the platform
An RGB camera positioned next to the gripper of the arm
An onboard battery to power the arm and the quadruped robot
With this design, the simulation experiments showed increased whole-body coordination: the legs helped the arm reach beyond its own workspace, bending down to reach lower objects (e.g., picking up a cup) and standing tall to reach higher ones (e.g., wiping a whiteboard). Similarly, the arm helped the legs maintain balance even under relatively large disturbances (e.g., running with an initial speed of 1 m/s). This demonstrates a solid unified policy with a high survival rate.
Advantage mixing also helped the policy focus on each task before combining them, a mechanism that accelerated training. Furthermore, the proposed regularized adaptation yielded improved performance and better estimation of the environment, with errors at the gripper dropping by 20% compared to alternatives such as the two-phase teacher-student scheme.
As for the real-world experiments, the researchers studied three activities: teleoperation using joysticks, closed-loop control through RGB vision-guided tracking, and open-loop response to human demonstrations.
For teleoperation, the researchers used two joysticks to command the end-effector (gripper) to reach particular points, even beyond the training workspace. They found that the leg and arm joints coordinated to help the end-effector reach those points.
During vision-guided tracking, the RGB camera was used alongside the joystick control to provide visual feedback to the robot and improve its picking tasks. It showed repeated success on both easy and hard tasks.
In the open-loop control experiments, the researchers tested the coupling of the robot’s agile locomotion with its dynamic arm movement. They commanded the end-effector to follow a predefined path while the robot was walking. The results showed dynamic coordination even while walking on an uneven grass area.
The success of the unified policy opens up a new scope for coordinated robotic movement. As the researchers note, these results are merely preliminary. Further exciting research is expected to stem from this work, such as enabling the robot to climb onto a table using its front legs to pick something up from the tabletop. Another suggestion was to mount a camera at the center of the torso to provide a different visual feedback stream, which would pave the way for vision-based policy learning.
1. Fu, Z., Cheng, X. and Pathak, D. (2022) "Deep Whole-Body Control: Learning a Unified Policy for Manipulation and Locomotion," CoRL 2022 Conference Paper [Preprint].