
Scaling Up Dynamic Human-Scene Interaction Modeling

March 13, 2024
Authors: Nan Jiang, Zhiyuan Zhang, Hongjie Li, Xiaoxuan Ma, Zan Wang, Yixin Chen, Tengyu Liu, Yixin Zhu, Siyuan Huang
cs.AI

Abstract

Confronting the challenges of data scarcity and advanced motion synthesis in human-scene interaction modeling, we introduce the TRUMANS dataset alongside a novel HSI motion synthesis method. TRUMANS stands as the most comprehensive motion-captured HSI dataset currently available, encompassing over 15 hours of human interactions across 100 indoor scenes. It intricately captures whole-body human motions and part-level object dynamics, focusing on the realism of contact. This dataset is further scaled up by transforming physical environments into exact virtual models and applying extensive augmentations to appearance and motion for both humans and objects while maintaining interaction fidelity. Utilizing TRUMANS, we devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length, taking into account both scene context and intended actions. In experiments, our approach shows remarkable zero-shot generalizability on a range of 3D scene datasets (e.g., PROX, Replica, ScanNet, ScanNet++), producing motions that closely mimic original motion-captured sequences, as confirmed by quantitative experiments and human studies.
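The abstract describes generating HSI sequences of arbitrary length by chaining a diffusion model autoregressively, conditioned on scene context and intended actions. The sketch below illustrates that generation pattern only; every name in it (`denoise_step`, `SEG_LEN`, `OVERLAP`, the feature shapes) is an illustrative assumption, not the authors' actual model or API, and the denoiser is a toy placeholder standing in for a learned network.

```python
import numpy as np

SEG_LEN = 16      # frames generated per autoregressive segment (assumed)
OVERLAP = 4       # trailing frames of the previous segment reused as context
POSE_DIM = 63     # e.g., a flattened whole-body pose vector (assumed)
STEPS = 50        # reverse-diffusion steps per segment

rng = np.random.default_rng(0)

def denoise_step(x, t, scene_feat, action_feat, prev_tail):
    """Placeholder for a learned denoiser conditioned on scene context,
    the intended action, and the tail of the previous segment."""
    cond = scene_feat.mean() + action_feat.mean()
    if prev_tail is not None:
        cond += prev_tail.mean()
    # Toy update: shrink the noise toward the conditioning signal.
    return x * (1 - 1.0 / STEPS) + cond / STEPS

def sample_segment(scene_feat, action_feat, prev_tail):
    """Run the reverse-diffusion chain to produce one motion segment."""
    x = rng.standard_normal((SEG_LEN, POSE_DIM))
    for t in reversed(range(STEPS)):
        x = denoise_step(x, t, scene_feat, action_feat, prev_tail)
    return x

def generate_motion(num_frames, scene_feat, action_feat):
    """Autoregressively chain diffusion-sampled segments, each seeded with
    the previous segment's tail, until the target length is reached."""
    frames, prev_tail = [], None
    while len(frames) < num_frames:
        seg = sample_segment(scene_feat, action_feat, prev_tail)
        frames.extend(seg)           # append the segment frame by frame
        prev_tail = seg[-OVERLAP:]   # context for the next segment
    return np.stack(frames[:num_frames])

motion = generate_motion(40, rng.standard_normal(128), rng.standard_normal(8))
print(motion.shape)  # (40, 63)
```

Because each segment conditions only on a short overlap window rather than the full history, the loop can run for any target length at constant per-segment cost, which is what makes arbitrary-length synthesis tractable.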
