AI Teaches Robots Tasks from a Single How-To Video

April 22, 2025

Summary: Researchers have developed RHyME, an AI-powered system that enables robots to learn complex tasks by watching a single human demonstration video. Traditional robots struggle with unpredictable scenarios and require extensive training data, but RHyME allows robots to adapt by drawing on previously seen video data.

This approach bridges the gap between human and robotic motion, enabling more flexible and efficient learning through imitation. With only 30 minutes of robot data, RHyME-equipped robots achieved more than 50% higher task success than earlier methods, marking a major step toward smarter, more capable robotic assistants.

Key Facts:

Mismatch Solution: The system bridges differences between how humans and robots perform tasks.
Efficient Training: Requires just 30 minutes of robot data and boosts task success by more than 50%.

Source: Cornell University

Cornell University researchers have developed a new robotic framework powered by artificial intelligence – called RHyME (Retrieval for Hybrid Imitation under Mismatched Execution) – that allows robots to learn tasks by watching a single how-to video.

Robots can be picky learners. Historically, they have required precise, step-by-step instructions to complete even simple tasks, and they tend to call it quits when things go off-script, such as after dropping a device or losing a lock.

RHyME is the team’s answer – a scalable approach that makes robots less finicky and more adaptive. It enables a robotic system to draw on its own memory of previously seen videos and connect the dots when performing a task it has viewed only once.

RHyME could fast-track the development and deployment of robotic systems by significantly reducing the time, energy and money needed to train them, the researchers said.

“One of the annoying things about working with robots is collecting so much data on the robot doing different tasks,” said Kushal Kedia, a doctoral student in the field of computer science.

“That’s not how humans do tasks. We look at other people as inspiration.”

Kedia will present the paper, “One-Shot Imitation under Mismatched Execution,” in May at the Institute of Electrical and Electronics Engineers’ International Conference on Robotics and Automation, in Atlanta.

Home robot assistants are still a long way off because they lack the wits to navigate the physical world and its countless contingencies.

To get robots up to speed, researchers like Kedia are training them with what amounts to how-to videos – human demonstrations of various tasks in a lab setting.

The hope with this approach, a branch of machine learning called “imitation learning,” is that robots will learn a sequence of tasks faster and be able to adapt to real-world environments.

“Our work is like translating French to English – we’re translating any given task from human to robot,” said senior author Sanjiban Choudhury, assistant professor of computer science.

This translation task still faces a broader challenge, however: Humans move too fluidly for a robot to track and mimic, and training robots with video requires gobs of it.

Further, video demonstrations – of, say, picking up a napkin or stacking dinner plates – must be performed slowly and flawlessly, since any mismatch in actions between the video and the robot has historically spelled doom for robot learning, the researchers said.

“If a human moves in a way that’s any different from how a robot moves, the method immediately falls apart,” Choudhury said.

“Our thinking was, ‘Can we find a principled way to deal with this mismatch between how humans and robots do tasks?’”

For example, a RHyME-equipped robot shown a video of a human fetching a mug from the counter and placing it in a nearby sink will comb its bank of videos and draw inspiration from similar actions – like grasping a cup and lowering a utensil.
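
The lookup described above can be pictured as a nearest-neighbor search over stored clips. What follows is a minimal sketch, not the researchers' implementation: it assumes every video clip has already been reduced to an embedding vector, and the clip names, the vectors, and the function `retrieve_similar` are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def retrieve_similar(query, bank, k=2):
    """Return the names of the k clips in `bank` most similar to `query`."""
    ranked = sorted(
        bank,
        key=lambda name: cosine_similarity(query, bank[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical embeddings for clips already in the robot's memory.
video_bank = {
    "grasp_cup": [0.9, 0.1, 0.0],
    "lower_utensil": [0.7, 0.6, 0.1],
    "open_drawer": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the new "move mug to sink" demonstration.
mug_to_sink = [0.8, 0.4, 0.0]

matches = retrieve_similar(mug_to_sink, video_bank)
print(matches)
```

With these made-up vectors, the two manipulation clips rank ahead of the unrelated drawer clip, which mirrors the idea of drawing on similar past actions rather than requiring an exact match.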

RHyME paves the way for robots to learn multiple-step sequences while significantly lowering the amount of robot data needed for training, the researchers said.

RHyME requires just 30 minutes of robot data; in a lab setting, robots trained using the system achieved a more than 50% increase in task success compared to previous methods, the researchers said.

About this AI and robotics research news

Author: Becka Bowyer
Source: Cornell University
Contact: Becka Bowyer – Cornell University
Image: The image is credited to Neuroscience News

Original Research: Closed access.
“One-Shot Imitation under Mismatched Execution” by Sanjiban Choudhury et al. arXiv

Abstract

One-Shot Imitation under Mismatched Execution

Human demonstrations as prompts are a powerful way to program robots to do long-horizon manipulation tasks. However, translating these demonstrations into robot-executable actions presents significant challenges due to execution mismatches in movement styles and physical capabilities.

Existing methods for human-robot translation either depend on paired data, which is infeasible to scale, or rely heavily on frame-level visual similarities that often break down in practice.

To address these challenges, we propose RHyME, a novel framework that automatically pairs human and robot trajectories using sequence-level optimal transport cost functions.

Given long-horizon robot demonstrations, RHyME synthesizes semantically equivalent human videos by retrieving and composing short-horizon human clips. This approach facilitates effective policy training without the need for paired data.

RHyME successfully imitates a range of cross-embodiment demonstrators, both in simulation and with a real human hand, achieving over 50% increase in task success compared to previous methods.

We release our code and datasets at this https URL.
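
The abstract's idea of scoring a human clip against a robot trajectory with a sequence-level optimal transport cost can be illustrated with a small example. This is a minimal sketch under stated assumptions, not the paper's cost function: it assumes each frame is already an embedding vector, uses cosine distance between frames, puts uniform weight on every frame, and solves entropy-regularized optimal transport with Sinkhorn iterations. The function names, the `reg` and `n_iters` parameters, and the toy vectors are all hypothetical.

```python
import math

def cosine_distance(u, v):
    """1 minus cosine similarity; 0 for identical directions, 2 for opposite."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)

def sinkhorn_cost(seq_a, seq_b, reg=0.1, n_iters=200):
    """Approximate optimal transport cost between two sequences of frame embeddings."""
    n, m = len(seq_a), len(seq_b)
    cost = [[cosine_distance(a, b) for b in seq_b] for a in seq_a]
    # Gibbs kernel of the entropy-regularized problem.
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    # Uniform mass on every frame of each sequence (an assumption of this sketch).
    mass_a = [1.0 / n] * n
    mass_b = [1.0 / m] * m
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns toward the target marginals.
        u = [mass_a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [mass_b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan implied by the scalings, then its total cost.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j]
        for i in range(n)
        for j in range(m)
    )

# Hypothetical two-frame human clip.
human_clip = [[1.0, 0.0], [0.0, 1.0]]
# Same content at a slower pace (three frames) versus unrelated content.
robot_matching = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
robot_unrelated = [[0.0, -1.0], [-1.0, 0.0]]

matched = sinkhorn_cost(human_clip, robot_matching)
mismatched = sinkhorn_cost(human_clip, robot_unrelated)
print(matched < mismatched)
```

Because the cost is computed over whole sequences, the two inputs may differ in length and pacing; a lower value suggests they cover similar content, which is the kind of signal that could be used to pair human and robot trajectories without frame-by-frame alignment.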
