

WHAC: World-grounded Humans and Cameras

March 19, 2024
Authors: Wanqi Yin, Zhongang Cai, Ruisi Wang, Fanzhou Wang, Chen Wei, Haiyi Mei, Weiye Xiao, Zhitao Yang, Qingping Sun, Atsushi Yamashita, Ziwei Liu, Lei Yang
cs.AI

Abstract

Estimating human and camera trajectories with accurate scale in the world coordinate system from a monocular video is a highly desirable yet challenging and ill-posed problem. In this study, we aim to recover expressive parametric human models (i.e., SMPL-X) and corresponding camera poses jointly, by leveraging the synergy between three critical players: the world, the human, and the camera. Our approach is founded on two key observations. Firstly, camera-frame SMPL-X estimation methods readily recover absolute human depth. Secondly, human motions inherently provide absolute spatial cues. By integrating these insights, we introduce a novel framework, referred to as WHAC, to facilitate world-grounded expressive human pose and shape estimation (EHPS) alongside camera pose estimation, without relying on traditional optimization techniques. Additionally, we present a new synthetic dataset, WHAC-A-Mole, which includes accurately annotated humans and cameras, and features diverse interactive human motions as well as realistic camera trajectories. Extensive experiments on both standard and newly established benchmarks highlight the superiority and efficacy of our framework. We will make the code and dataset publicly available.
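The abstract's second observation is that metric human cues can resolve the scale ambiguity of a monocular camera trajectory: camera-frame SMPL-X estimation yields absolute human depth, while visual odometry yields a trajectory that is only correct up to an unknown scale. The sketch below illustrates that idea in its simplest form, not the WHAC method itself: assuming a (near-)stationary human with known metric position in each camera frame, the scale that makes the human's world position constant can be solved in closed form by least squares. All function and variable names here are invented for illustration.

```python
import numpy as np

def recover_trajectory_scale(cam_pos_upscale, R_wc, human_cam):
    """Estimate the unknown metric scale s of an up-to-scale camera
    trajectory, assuming a stationary human whose position in each
    camera frame is known in metric units (e.g. from camera-frame
    SMPL-X-style estimation).

    cam_pos_upscale : (T, 3) camera positions, correct up to scale s
    R_wc            : (T, 3, 3) camera-to-world rotation per frame
    human_cam       : (T, 3) human root position in the camera frame (metric)

    World human position: p_t(s) = s * c_t + R_t @ h_t.
    For a stationary human p_t(s) is constant, so we pick the s that
    minimizes the variance of p_t(s) across frames.
    """
    a = np.asarray(cam_pos_upscale, dtype=float)
    b = np.einsum('tij,tj->ti', np.asarray(R_wc, dtype=float),
                  np.asarray(human_cam, dtype=float))
    a_c = a - a.mean(axis=0)  # center both terms: the mean drops out
    b_c = b - b.mean(axis=0)
    # d/ds sum_t ||s*a_c + b_c||^2 = 0  =>  s = -<a_c, b_c> / <a_c, a_c>
    return -np.sum(a_c * b_c) / np.sum(a_c * a_c)

# Synthetic check: build a metric camera path, divide out the true
# scale, and confirm the scale is recovered.
rng = np.random.default_rng(0)
T, s_true = 50, 2.5
C = np.cumsum(rng.normal(0.0, 0.1, (T, 3)), axis=0)   # metric camera path
P = np.array([1.0, 0.0, 3.0])                          # stationary human (world)
R = np.tile(np.eye(3), (T, 1, 1))                      # identity rotations for simplicity
h = np.einsum('tij,tj->ti', R.transpose(0, 2, 1), P - C)  # human in camera frame
s_est = recover_trajectory_scale(C / s_true, R, h)
print(round(s_est, 6))
```

In practice the human is rarely stationary, which is why the paper instead leans on learned human-motion priors as the absolute spatial cue; this closed-form variant only conveys why metric human depth constrains trajectory scale at all.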
