

Scaling Up Dynamic Human-Scene Interaction Modeling

March 13, 2024
Authors: Nan Jiang, Zhiyuan Zhang, Hongjie Li, Xiaoxuan Ma, Zan Wang, Yixin Chen, Tengyu Liu, Yixin Zhu, Siyuan Huang
cs.AI

Abstract

Confronting the challenges of data scarcity and advanced motion synthesis in human-scene interaction modeling, we introduce the TRUMANS dataset alongside a novel HSI motion synthesis method. TRUMANS stands as the most comprehensive motion-captured HSI dataset currently available, encompassing over 15 hours of human interactions across 100 indoor scenes. It intricately captures whole-body human motions and part-level object dynamics, focusing on the realism of contact. This dataset is further scaled up by transforming physical environments into exact virtual models and applying extensive augmentations to appearance and motion for both humans and objects while maintaining interaction fidelity. Utilizing TRUMANS, we devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length, taking into account both scene context and intended actions. In experiments, our approach shows remarkable zero-shot generalizability on a range of 3D scene datasets (e.g., PROX, Replica, ScanNet, ScanNet++), producing motions that closely mimic original motion-captured sequences, as confirmed by quantitative experiments and human studies.
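The abstract describes a diffusion-based autoregressive model that produces HSI sequences of arbitrary length by conditioning each generated segment on what came before. A rough sketch of that rollout pattern is below; the toy `denoise_segment` function, the overlap length, and all names are illustrative placeholders, not the authors' actual architecture or the TRUMANS model.

```python
import random

def denoise_segment(prev_tail, seg_len, dim, n_steps=8):
    """Toy stand-in for a diffusion denoiser: start from Gaussian noise
    and progressively blend toward the last conditioning frame.
    Illustrative only -- a real model would condition on scene context
    and action labels and run a learned reverse-diffusion process."""
    x = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(seg_len)]
    anchor = prev_tail[-1] if prev_tail else [0.0] * dim
    for step in range(n_steps):
        w = (step + 1) / (n_steps + 1)  # denoising progress in (0, 1)
        for t in range(seg_len):
            for d in range(dim):
                x[t][d] = (1.0 - w) * x[t][d] + w * anchor[d]
    return x

def generate_motion(total_frames, seg_len=16, dim=3, overlap=4):
    """Autoregressive rollout: each new segment is conditioned on the
    tail of the motion generated so far, so sequences of any length
    can be produced from a fixed-length segment model."""
    motion = []
    while len(motion) < total_frames:
        tail = motion[-overlap:]  # conditioning frames from the previous segment
        motion.extend(denoise_segment(tail, seg_len, dim))
    return motion[:total_frames]
```

The key design point the abstract hints at is that a fixed-horizon diffusion model becomes unbounded in sequence length once each segment is seeded from the previous one, at the cost of sequential (non-parallel) generation.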

