InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning

Abstract

Large multimodal foundation models, particularly in language and vision, have significantly advanced tasks such as robotics, autonomous driving, information retrieval, and grounding. However, many of these models treat objects as indivisible and overlook their constituent parts. Understanding these parts and their associated affordances provides valuable insight into an object's functionality, which is fundamental to performing a wide range of tasks. In this work, we introduce InstructPart, a novel real-world benchmark comprising hand-labeled part segmentation annotations and task-oriented instructions, to evaluate how well current models understand and execute part-level tasks in everyday contexts. Our experiments show that task-oriented part segmentation remains challenging even for state-of-the-art Vision-Language Models (VLMs). In addition to the benchmark, we introduce a simple baseline that achieves a twofold performance improvement through fine-tuning on our dataset. With our dataset and benchmark, we aim to facilitate research on task-oriented part segmentation and to broaden the applicability of VLMs in robotics, virtual reality, information retrieval, and related fields. Project website: https://zifuwan.github.io/InstructPart/

