Autonomous Character-Scene Interaction Synthesis from Text Instruction
October 4, 2024
Authors: Nan Jiang, Zimo He, Zi Wang, Hongjie Li, Yixin Chen, Siyuan Huang, Yixin Zhu
cs.AI
Abstract
Synthesizing human motions in 3D environments, particularly those with
complex activities such as locomotion, hand-reaching, and human-object
interaction, presents substantial demands for user-defined waypoints and stage
transitions. These requirements pose challenges for current models, leading to
a notable gap in automating the animation of characters from simple human
inputs. This paper addresses this challenge by introducing a comprehensive
framework for synthesizing multi-stage scene-aware interaction motions directly
from a single text instruction and goal location. Our approach employs an
auto-regressive diffusion model to synthesize the next motion segment, along
with an autonomous scheduler predicting the transition for each action stage.
To ensure that the synthesized motions are seamlessly integrated within the
environment, we propose a scene representation that considers the local
perception both at the start and the goal location. We further enhance the
coherence of the generated motion by integrating frame embeddings with language
input. Additionally, to support model training, we present a comprehensive
motion-captured dataset comprising 16 hours of motion sequences in 120 indoor
scenes covering 40 types of motions, each annotated with precise language
descriptions. Experimental results demonstrate the efficacy of our method in
generating high-quality, multi-stage motions closely aligned with environmental
and textual conditions.
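
The generation loop described in the abstract (synthesize the next motion segment auto-regressively, then let a scheduler decide when the current action stage ends) can be illustrated with a toy sketch. This is not the authors' implementation: `Stage`, `toy_segment_sampler`, `toy_scheduler`, and `rollout` are hypothetical names, the "pose" is reduced to a 2D root position, and the diffusion sampler and learned scheduler are replaced by simple hand-written stand-ins so that the control flow is runnable.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Pose = Tuple[float, float]  # toy 2D root position standing in for a full-body pose


@dataclass
class Stage:
    """One action stage (hypothetical structure, not from the paper)."""
    label: str  # e.g. "walk", "sit"
    goal: Pose  # goal location for this stage


def toy_segment_sampler(history: Sequence[Pose], stage: Stage,
                        seg_len: int = 4, step: float = 0.5) -> List[Pose]:
    """Stand-in for the diffusion sampler: walks straight toward the goal.

    A real model would denoise a motion segment conditioned on the previous
    frames, the text instruction, and the scene; none of that is modeled here.
    """
    x, y = history[-1]
    segment = []
    for _ in range(seg_len):
        dx, dy = stage.goal[0] - x, stage.goal[1] - y
        dist = (dx * dx + dy * dy) ** 0.5
        if dist > step:
            x, y = x + step * dx / dist, y + step * dy / dist
        else:
            x, y = stage.goal  # close enough: snap onto the goal
        segment.append((x, y))
    return segment


def toy_scheduler(history: Sequence[Pose], stage: Stage, tol: float = 1e-6) -> bool:
    """Stand-in for the learned scheduler: the stage ends once the goal is reached."""
    x, y = history[-1]
    return abs(x - stage.goal[0]) <= tol and abs(y - stage.goal[1]) <= tol


def rollout(start: Pose, stages: Sequence[Stage],
            max_segments: int = 100) -> Tuple[List[Pose], List[int]]:
    """Auto-regressive rollout: append segments until the scheduler ends each stage."""
    motion: List[Pose] = [start]
    boundaries: List[int] = []  # frame index at which each stage ended
    for stage in stages:
        for _ in range(max_segments):
            # Each new segment is conditioned on everything generated so far.
            motion.extend(toy_segment_sampler(motion, stage))
            if toy_scheduler(motion, stage):
                break
        boundaries.append(len(motion))
    return motion, boundaries


if __name__ == "__main__":
    plan = [Stage("walk", (3.0, 0.0)), Stage("sit", (3.0, 1.0))]
    motion, boundaries = rollout((0.0, 0.0), plan)
    print(len(motion), boundaries, motion[-1])
```

The point of the sketch is the separation of concerns: the sampler only ever produces the next short segment, while the scheduler alone decides when to move on to the next stage, so no user-defined waypoints or manual stage transitions are needed inside the loop.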