RoboCurate: Harnessing Diversity with Action-Verified Neural Trajectory for Robot Learning
February 21, 2026
Authors: Seungku Kim, Suhyeok Jang, Byungjun Yoon, Dongyoung Kim, John Won, Jinwoo Shin
cs.AI
Abstract
Synthetic data generated by video generative models has shown promise as a scalable pipeline for robot learning, but it often suffers from inconsistent action quality due to imperfectly generated videos. Recently, vision-language models (VLMs) have been leveraged to validate video quality, but they struggle to distinguish physically plausible videos and, in any case, cannot directly evaluate the generated actions themselves. To tackle this issue, we introduce RoboCurate, a novel synthetic robot data generation framework that evaluates and filters the quality of annotated actions by comparing them with simulation replay. Specifically, RoboCurate replays the predicted actions in a simulator and assesses action quality by measuring the consistency of motion between the simulator rollout and the generated video. In addition, we unlock observation diversity beyond the available dataset via image-to-image editing and apply action-preserving video-to-video transfer to further augment appearance diversity. We observe that data generated by RoboCurate yields substantial relative improvements in success rates compared to using real data only: +70.1% on GR-1 Tabletop (300 demos), +16.1% on DexMimicGen in the pre-training setup, and +179.9% in the challenging real-world ALLEX humanoid dexterous manipulation setting.
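The abstract does not specify how motion consistency is computed, but the replay-based filtering idea can be illustrated with a minimal sketch. Everything below is hypothetical: `motion_field` uses simple temporal frame differences as a motion proxy, `motion_consistency` scores the agreement between the generated video and its simulator replay via cosine similarity, and `curate` keeps episodes above an assumed threshold. The actual paper may use a different motion representation (e.g. optical flow or tracked keypoints) and threshold.

```python
import numpy as np

def motion_field(frames):
    """Crude motion proxy: temporal difference between consecutive frames.

    frames: array of shape (T, H, W) or (T, H, W, C).
    Returns an array of shape (T-1, ...) describing per-step motion.
    """
    return np.diff(frames, axis=0)

def motion_consistency(generated, replayed):
    """Mean cosine similarity between the motion of the generated video
    and the motion of the simulator replay, computed step by step.

    Returns a scalar in roughly [-1, 1]; 1.0 means the replayed actions
    reproduce the generated motion exactly.
    """
    g = motion_field(generated).reshape(len(generated) - 1, -1)
    r = motion_field(replayed).reshape(len(replayed) - 1, -1)
    num = (g * r).sum(axis=1)
    den = np.linalg.norm(g, axis=1) * np.linalg.norm(r, axis=1) + 1e-8
    return float(np.mean(num / den))

def curate(episodes, threshold=0.8):
    """Keep only episodes whose simulator replay is consistent with the
    generated video. Each episode is a dict with 'video' and 'replay'
    frame arrays of matching shape. The 0.8 threshold is an assumption.
    """
    return [ep for ep in episodes
            if motion_consistency(ep["video"], ep["replay"]) >= threshold]
```

Under this sketch, an episode whose annotated actions fail to reproduce the generated motion in the simulator (e.g. because the video is physically implausible) scores low and is filtered out, which is the curation step the abstract describes.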