Revisiting Articulated Parts Perception in Robot Manipulation

June 6, 2026
Authors: Xiaoqian Wu, Yejie Guo, Xiaoyang Chen, Lixin Yang, Cewu Lu, Yong-Lu Li
cs.AI

Abstract

We are surrounded by objects with movable, articulated parts, such as boxes, handles, and doors. Accurate and generalizable perception of articulated parts is essential for improving robotic manipulation. Recent work on articulated parts perception has followed two main directions. One line of work uses pose-based representations, which require costly manual annotation. In parallel, affordance-based methods predict future object motion from point tracking without additional manual effort, but suffer from low-quality data. In this paper, we propose a new representation of articulated parts, Geometric Primary Structure (GPS), which abstracts the geometric structure of a part to balance scalability and quality. For efficient and scalable data collection, GPS is integrated with a portable Virtual Reality (VR) device, and annotating one object sequence takes only one minute. This direct human annotation provides higher quality than estimated affordance. With this efficient VR-GPS system, we collect 41K frames for 234 objects across six part classes, and train a generalizable GPS model that takes a single RGB-D object image as input. For object manipulation, we deploy a heuristic policy based on GPS predictions. Without any in-domain fine-tuning, our method achieves a 73% success rate across 270 initial states for 9 objects. Our code, data, and reusable tool are available at https://enlighten0707.github.io/gps.