UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
September 30, 2024
Authors: Qiaojun Yu, Siyuan Huang, Xibin Yuan, Zhengkai Jiang, Ce Hao, Xin Li, Haonan Chang, Junbo Wang, Liu Liu, Hongsheng Li, Peng Gao, Cewu Lu
cs.AI
Abstract
Previous studies on robotic manipulation are based on a limited understanding
of the underlying 3D motion constraints and affordances. To address these
challenges, we propose a comprehensive paradigm, termed UniAff, that integrates
3D object-centric manipulation and task understanding in a unified formulation.
Specifically, we constructed a dataset labeled with manipulation-related key
attributes, comprising 900 articulated objects from 19 categories and 600 tools
from 12 categories. Furthermore, we leverage MLLMs to infer object-centric
representations for manipulation tasks, including affordance recognition and
reasoning about 3D motion constraints. Comprehensive experiments in both
simulation and real-world settings indicate that UniAff significantly improves
the generalization of robotic manipulation for tools and articulated objects.
We hope that UniAff will serve as a general baseline for unified robotic
manipulation tasks in the future. Images, videos, dataset, and code are
published on the project website at https://sites.google.com/view/uni-aff/home
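To make the abstract's notion of an "object-centric representation" concrete, below is a minimal, hypothetical sketch of what such a representation might look like: an affordance (a functional point on the object) paired with a 3D motion constraint (a revolute or prismatic joint axis). The class and field names here are illustrative assumptions, not the paper's actual schema.

```python
# Hypothetical sketch of an object-centric manipulation representation in the
# spirit of UniAff. Field names and structure are assumptions for illustration;
# they are not taken from the paper's code or dataset format.
from dataclasses import dataclass
from typing import Literal

import numpy as np


@dataclass
class MotionConstraint:
    """A 3D motion constraint for an articulated part or a tool."""
    joint_type: Literal["revolute", "prismatic", "fixed"]
    axis: np.ndarray    # (3,) unit vector: axis of rotation or translation
    origin: np.ndarray  # (3,) point in space that the axis passes through


@dataclass
class ObjectCentricAffordance:
    """An affordance plus its motion constraint for one manipulable part."""
    category: str                 # e.g. "drawer" or "hammer"
    affordance_point: np.ndarray  # (3,) graspable or functional 3D point
    constraint: MotionConstraint


# Example: a drawer handle that slides along the x-axis.
drawer = ObjectCentricAffordance(
    category="drawer",
    affordance_point=np.array([0.10, 0.00, 0.45]),
    constraint=MotionConstraint(
        joint_type="prismatic",
        axis=np.array([1.0, 0.0, 0.0]),
        origin=np.array([0.00, 0.00, 0.45]),
    ),
)
```

Under this kind of schema, affordance recognition and motion-constraint reasoning (the two outputs the abstract attributes to the MLLM) would fill in the `affordance_point` and `constraint` fields, respectively, for a given object.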