

Learning to Grasp Anything by Playing with Random Toys

October 14, 2025
Authors: Dantong Niu, Yuvan Sharma, Baifeng Shi, Rachel Ding, Matteo Gioia, Haoru Xue, Henry Tsai, Konstantinos Kallidromitis, Anirudh Pai, Shankar Sastry, Trevor Darrell, Jitendra Malik, Roei Herzig
cs.AI

Abstract

Robotic manipulation policies often struggle to generalize to novel objects, limiting their real-world utility. In contrast, cognitive science suggests that children develop generalizable dexterous manipulation skills by mastering a small set of simple toys and then applying that knowledge to more complex items. Inspired by this, we study whether robots can achieve similar generalization. Our results indicate that robots can learn generalizable grasping using randomly assembled objects composed of just four shape primitives: spheres, cuboids, cylinders, and rings. We show that training on these "toys" enables robust generalization to real-world objects, yielding strong zero-shot performance. Crucially, we find that the key to this generalization is an object-centric visual representation induced by our proposed detection pooling mechanism. Evaluated both in simulation and on physical robots, our model achieves a 67% real-world grasping success rate on the YCB dataset, outperforming state-of-the-art approaches that rely on substantially more in-domain data. We further study how zero-shot generalization scales as we vary the number and diversity of training toys and the number of demonstrations per toy. We believe this work offers a promising path to scalable and generalizable learning in robotic manipulation. Demonstration videos, code, checkpoints, and our dataset are available on our project page: https://lego-grasp.github.io/.
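
The abstract credits the zero-shot generalization to an object-centric representation produced by a detection pooling mechanism; the actual design is in the paper and code release. As a rough illustration only, here is a minimal sketch of what such a step could look like, assuming a PyTorch feature backbone and torchvision's `roi_align`. The function name `detection_pooling`, the 1x1 average pooling, and the `spatial_scale` value are hypothetical choices for this sketch, not the authors' implementation.

```python
# Minimal sketch of a detection-pooling step: pool backbone features inside
# detected object boxes to get one object-centric token per object.
# All names and hyperparameters here are illustrative assumptions,
# not the paper's actual method.
import torch
from torchvision.ops import roi_align


def detection_pooling(feature_map, boxes, spatial_scale=1 / 16):
    """Pool backbone features inside each detected box into one object token.

    feature_map: (B, C, H, W) features from a vision backbone.
    boxes: list of B tensors, each (N_i, 4) holding (x1, y1, x2, y2) in
        input-image coordinates; spatial_scale maps them to feature space.
    Returns: list of B tensors, each (N_i, C) of object-centric features.
    """
    # Crop and resample each box region down to a single 1x1 cell,
    # i.e. an average-pooled feature vector per detected object.
    pooled = roi_align(feature_map, boxes, output_size=1,
                       spatial_scale=spatial_scale, aligned=True)  # (sum N_i, C, 1, 1)
    tokens = pooled.flatten(1)                                     # (sum N_i, C)
    # Re-group the flat token stack back into per-image lists.
    counts = [b.shape[0] for b in boxes]
    return list(tokens.split(counts, dim=0))


if __name__ == "__main__":
    feats = torch.randn(2, 256, 32, 32)   # e.g. stride-16 features of 512x512 images
    boxes = [torch.tensor([[32., 32., 160., 160.]]),       # 1 object in image 0
             torch.tensor([[0., 0., 96., 96.],
                           [200., 120., 400., 360.]])]     # 2 objects in image 1
    tokens = detection_pooling(feats, boxes)
    print([t.shape for t in tokens])  # [torch.Size([1, 256]), torch.Size([2, 256])]
```

A downstream policy could then attend over these per-object tokens instead of a dense feature grid, which is one plausible way an object-centric representation of this kind would be consumed.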