F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions

July 17, 2024
Authors: Jie Yang, Xuesong Niu, Nan Jiang, Ruimao Zhang, Siyuan Huang
cs.AI

Abstract

Existing 3D human-object interaction (HOI) datasets and models simply align global descriptions with the long HOI sequence, while lacking a detailed understanding of intermediate states and the transitions between states. In this paper, we argue that fine-grained semantic alignment, which utilizes state-level descriptions, offers a promising paradigm for learning semantically rich HOI representations. To achieve this, we introduce Semantic-HOI, a new dataset comprising over 20K paired HOI states with fine-grained descriptions for each HOI state and the body movements that happen between two consecutive states. Leveraging the proposed dataset, we design three state-level HOI tasks to accomplish fine-grained semantic alignment within the HOI sequence. Additionally, we propose a unified model called F-HOI, designed to leverage multimodal instructions and empower the Multimodal Large Language Model to efficiently handle diverse HOI tasks. F-HOI offers multiple advantages: (1) It employs a unified task formulation that supports the use of versatile multimodal inputs. (2) It maintains consistency in HOI across 2D, 3D, and linguistic spaces. (3) It utilizes fine-grained textual supervision for direct optimization, avoiding intricate modeling of HOI states. Extensive experiments reveal that F-HOI effectively aligns HOI states with fine-grained semantic descriptions, adeptly tackling understanding, reasoning, generation, and reconstruction tasks.
