InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning
May 23, 2025
Authors: Zifu Wan, Yaqi Xie, Ce Zhang, Zhiqiu Lin, Zihan Wang, Simon Stepputtis, Deva Ramanan, Katia Sycara
cs.AI
Abstract
Large multimodal foundation models, particularly in the domains of language
and vision, have significantly advanced various tasks, including robotics,
autonomous driving, information retrieval, and grounding. However, many of
these models perceive objects as indivisible, overlooking the components that
constitute them. Understanding these components and their associated
affordances provides valuable insights into an object's functionality, which is
fundamental for performing a wide range of tasks. In this work, we introduce a
novel real-world benchmark, InstructPart, comprising hand-labeled part
segmentation annotations and task-oriented instructions to evaluate the
performance of current models in understanding and executing part-level tasks
within everyday contexts. Through our experiments, we demonstrate that
task-oriented part segmentation remains a challenging problem, even for
state-of-the-art Vision-Language Models (VLMs). In addition to our benchmark,
we introduce a simple baseline that achieves a twofold performance improvement
through fine-tuning with our dataset. With our dataset and benchmark, we aim to
facilitate research on task-oriented part segmentation and enhance the
applicability of VLMs across various domains, including robotics, virtual
reality, information retrieval, and other related fields. Project website:
https://zifuwan.github.io/InstructPart/