UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
September 30, 2024
Authors: Qiaojun Yu, Siyuan Huang, Xibin Yuan, Zhengkai Jiang, Ce Hao, Xin Li, Haonan Chang, Junbo Wang, Liu Liu, Hongsheng Li, Peng Gao, Cewu Lu
cs.AI
Abstract
Previous studies on robotic manipulation are based on a limited understanding
of the underlying 3D motion constraints and affordances. To address these
challenges, we propose a comprehensive paradigm, termed UniAff, that integrates
3D object-centric manipulation and task understanding in a unified formulation.
Specifically, we constructed a dataset labeled with manipulation-related key
attributes, comprising 900 articulated objects from 19 categories and 600 tools
from 12 categories. Furthermore, we leverage MLLMs to infer object-centric
representations for manipulation tasks, including affordance recognition and
reasoning about 3D motion constraints. Comprehensive experiments in both
simulation and real-world settings indicate that UniAff significantly improves
the generalization of robotic manipulation for tools and articulated objects.
We hope that UniAff will serve as a general baseline for unified robotic
manipulation tasks in the future. Images, videos, dataset, and code are
published on the project website at https://sites.google.com/view/uni-aff/home
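To make the abstract's notion of an "object-centric representation" concrete, below is a minimal, hypothetical sketch of what such a representation might look like: an affordance (a functional point on the object) paired with a 3D motion constraint (a revolute or prismatic joint axis). The class and field names here are illustrative assumptions, not the paper's actual schema.

```python
# Hypothetical sketch of an object-centric manipulation representation in the
# spirit of UniAff. Field names and structure are assumptions for illustration;
# they are not taken from the paper's code or dataset format.
from dataclasses import dataclass
from typing import Literal

import numpy as np


@dataclass
class MotionConstraint:
    """A 3D motion constraint for an articulated part or a tool."""
    joint_type: Literal["revolute", "prismatic", "fixed"]
    axis: np.ndarray    # (3,) unit vector: axis of rotation or translation
    origin: np.ndarray  # (3,) point in space that the axis passes through


@dataclass
class ObjectCentricAffordance:
    """An affordance plus its motion constraint for one manipulable part."""
    category: str                 # e.g. "drawer" or "hammer"
    affordance_point: np.ndarray  # (3,) graspable or functional 3D point
    constraint: MotionConstraint


# Example: a drawer handle that slides along the x-axis.
drawer = ObjectCentricAffordance(
    category="drawer",
    affordance_point=np.array([0.10, 0.00, 0.45]),
    constraint=MotionConstraint(
        joint_type="prismatic",
        axis=np.array([1.0, 0.0, 0.0]),
        origin=np.array([0.00, 0.00, 0.45]),
    ),
)
```

Under this kind of schema, affordance recognition and motion-constraint reasoning (the two outputs the abstract attributes to the MLLM) would fill in the `affordance_point` and `constraint` fields, respectively, for a given object.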