

RoboCurate: Harnessing Diversity with Action-Verified Neural Trajectory for Robot Learning

February 21, 2026
作者: Seungku Kim, Suhyeok Jang, Byungjun Yoon, Dongyoung Kim, John Won, Jinwoo Shin
cs.AI

Abstract

Synthetic data generated by video generative models has shown promise for robot learning as a scalable pipeline, but it often suffers from inconsistent action quality due to imperfectly generated videos. Recently, vision-language models (VLMs) have been leveraged to validate video quality, but they have limitations in distinguishing physically accurate videos and, even then, cannot directly evaluate the generated actions themselves. To tackle this issue, we introduce RoboCurate, a novel synthetic robot data generation framework that evaluates and filters the quality of annotated actions by comparing them with simulation replay. Specifically, RoboCurate replays the predicted actions in a simulator and assesses action quality by measuring the consistency of motion between the simulator rollout and the generated video. In addition, we unlock observation diversity beyond the available dataset via image-to-image editing and apply action-preserving video-to-video transfer to further augment appearance. We observe that RoboCurate's generated data yields substantial relative improvements in success rates compared to using real data only, achieving +70.1% on GR-1 Tabletop (300 demos), +16.1% on DexMimicGen in the pre-training setup, and +179.9% in the challenging real-world ALLEX humanoid dexterous manipulation setting.
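The core curation step described above (replaying predicted actions in a simulator and keeping only trajectories whose motion tracks the generated video) can be sketched as a simple threshold filter. This is an illustrative sketch, not the paper's implementation: the function names (`motion_consistency`, `curate`), the use of per-frame end-effector positions as the motion representation, and the threshold value are all assumptions made for the example.

```python
import numpy as np

def motion_consistency(sim_traj, video_traj):
    """Mean per-frame Euclidean gap between motion from the simulator
    replay and motion estimated from the generated video.
    Lower values mean the replayed actions track the video more closely.
    Both inputs: (T, D) arrays of per-frame positions (assumed representation).
    """
    sim = np.asarray(sim_traj, dtype=float)
    vid = np.asarray(video_traj, dtype=float)
    if sim.shape != vid.shape:
        raise ValueError("trajectories must be aligned frame-for-frame")
    return float(np.linalg.norm(sim - vid, axis=-1).mean())

def curate(candidates, threshold=0.05):
    """Keep only candidates whose replayed actions stay within the
    consistency threshold of the generated video (threshold is illustrative)."""
    return [
        c for c in candidates
        if motion_consistency(c["sim_traj"], c["video_traj"]) <= threshold
    ]
```

In this sketch, a candidate whose simulator rollout diverges from the video motion (e.g. because the annotated actions are physically implausible) is discarded, while consistent candidates pass into the training set.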