Learning Multi-Step Manipulation Tasks from
A Single Human Demonstration

Demo Videos

Same Object and Environment
Different Object and Environment


Learning from a single demonstration is promising given the difficulty of collecting sizable robot datasets. However, it remains challenging to develop a robot system that matches human capabilities and data efficiency in learning and generalization, particularly in complex, unstructured real-world scenarios. We propose a system that processes RGBD videos to translate human actions into robot primitives and identifies task-relevant key poses of objects using Grounded Segment Anything. We then address the challenges robots face in replicating human actions, accounting for human-robot differences in kinematics and collision geometry. To test the effectiveness of our system, we conducted experiments on manual dishwashing. With a single human demonstration recorded in a mockup kitchen, the system achieved a 50-100% success rate on each step and up to a 40% success rate on the whole task with different objects in a home kitchen.



Please cite this work if it helps your research:



Thanks to Prof. Jeff Ichnowski, Prof. Chris Atkeson, and Jianren Wang for advising on this work!


Send any comments or questions to Dingkun Guo: connect at dkguo dot com. See more on the contact page.