YouTube as Program: Learning to Manipulate via Physical Constraints Imitation

Demo Videos

Cutting Avocado
Pouring Liquid

Rolling Dough
(left: human failure, middle: robot failure, right: robot success)


Using YouTube videos for robot learning is promising because it scales. However, this strategy raises distinctive challenges: relevant information must be distilled from a wide variety of videos, and the tasks themselves are multifaceted. To address these challenges, we propose a scene graph representation that interprets video demonstrations as manipulations of object attributes and relationships, and we ground these physical constraints through simulation. We validate our approach on a range of tasks, including cutting fruit, pouring liquid, and rolling dough. Notably, it enables a robot to learn from a single YouTube video and successfully apply what it has learned in a substantially different environment.
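To make the scene graph idea concrete, here is a minimal Python sketch, not the authors' implementation. It assumes each video frame is summarized as a graph of per-object attributes plus (subject, relation, object) triples, and reads a demonstration step as the difference between two such graphs. The names `SceneGraph` and `diff_graphs`, and the object, attribute, and relation labels in the example, are hypothetical; grounding these changes as physical constraints in simulation is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, Tuple

# Hypothetical representation: a relation is a (subject, relation, object) triple.
Relation = Tuple[str, str, str]


@dataclass
class SceneGraph:
    # Object name -> {attribute name: value}
    attributes: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    # Set of directed relations between objects
    relations: FrozenSet[Relation] = frozenset()


def diff_graphs(before: SceneGraph, after: SceneGraph):
    """Return attribute changes and relations added/removed between two frames.

    Attribute changes map (object, attribute) -> (old value, new value).
    Only objects present in `after` are inspected for attribute changes.
    """
    attr_changes = {}
    for obj, attrs in after.attributes.items():
        old_attrs = before.attributes.get(obj, {})
        for key, value in attrs.items():
            if old_attrs.get(key) != value:
                attr_changes[(obj, key)] = (old_attrs.get(key), value)
    added = after.relations - before.relations
    removed = before.relations - after.relations
    return attr_changes, added, removed


# Illustrative frames from a hypothetical avocado-cutting clip.
before = SceneGraph(
    attributes={"avocado": {"state": "whole"}, "knife": {"held": True}},
    relations=frozenset({("knife", "above", "avocado")}),
)
after = SceneGraph(
    attributes={"avocado": {"state": "halved"}, "knife": {"held": True}},
    relations=frozenset({("knife", "in_contact", "avocado")}),
)

changes, added, removed = diff_graphs(before, after)
print(changes)  # {('avocado', 'state'): ('whole', 'halved')}
print(added)    # {('knife', 'in_contact', 'avocado')}
print(removed)  # {('knife', 'above', 'avocado')}
```

Representing a step as a graph difference keeps only what changed (here, the avocado's state and the knife's contact relation) and discards appearance details that vary from video to video, which is one plausible way to read "manipulations of object attributes and relationships."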