UniEgoMotion: A Unified Model for Egocentric Motion Reconstruction, Forecasting, and Generation
August 2, 2025
Authors: Chaitanya Patel, Hiroki Nakamura, Yuta Kyuragi, Kazuki Kozuka, Juan Carlos Niebles, Ehsan Adeli
cs.AI
Abstract
Egocentric human motion generation and forecasting with scene context are
crucial for enhancing AR/VR experiences, improving human-robot interaction,
advancing assistive technologies, and enabling adaptive healthcare solutions by
accurately predicting and simulating movement from a first-person perspective.
However, existing methods primarily focus on third-person motion synthesis with
structured 3D scene contexts, limiting their effectiveness in real-world
egocentric settings where a limited field of view, frequent occlusions, and
dynamic camera viewpoints hinder scene perception. To bridge this gap, we introduce
Egocentric Motion Generation and Egocentric Motion Forecasting, two novel tasks
that utilize first-person images for scene-aware motion synthesis without
relying on explicit 3D scenes. We propose UniEgoMotion, a unified conditional
motion diffusion model with a novel head-centric motion representation tailored
for egocentric devices. UniEgoMotion's simple yet effective design supports
egocentric motion reconstruction, forecasting, and generation from first-person
visual inputs in a unified framework. Unlike previous works that overlook scene
semantics, our model effectively extracts image-based scene context to infer
plausible 3D motion. To facilitate training, we introduce EE4D-Motion, a
large-scale dataset derived from EgoExo4D, augmented with pseudo-ground-truth
3D motion annotations. UniEgoMotion achieves state-of-the-art performance in
egocentric motion reconstruction and is the first to generate motion from a
single egocentric image. Extensive evaluations demonstrate the effectiveness of
our unified framework, setting a new benchmark for egocentric motion modeling
and unlocking new possibilities for egocentric applications.
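To make the "unified conditional motion diffusion" idea concrete, below is a minimal PyTorch sketch of how a single denoiser could serve reconstruction, forecasting, and generation by varying which frames receive first-person image conditioning. This is an assumption-based illustration, not the paper's architecture: the UnifiedMotionDenoiser class, the pose dimension (135), the image-feature size (768), and the frame-level masking scheme are hypothetical placeholders; only the high-level design (one network, task defined by the available egocentric images) follows the abstract.

```python
import torch
import torch.nn as nn


class UnifiedMotionDenoiser(nn.Module):
    """Toy transformer denoiser conditioned on egocentric image features."""

    def __init__(self, pose_dim=135, img_feat_dim=768, d_model=512,
                 n_layers=6, n_heads=8):
        super().__init__()
        self.pose_in = nn.Linear(pose_dim, d_model)     # noisy head-centric pose tokens
        self.img_in = nn.Linear(img_feat_dim, d_model)  # per-frame first-person image features
        self.t_embed = nn.Sequential(                   # diffusion timestep embedding
            nn.Linear(1, d_model), nn.SiLU(), nn.Linear(d_model, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.pose_out = nn.Linear(d_model, pose_dim)

    def forward(self, noisy_motion, t, img_feats, cond_mask):
        # noisy_motion: (B, T, pose_dim)     diffused head-centric motion
        # t:            (B,)                 diffusion timesteps
        # img_feats:    (B, T, img_feat_dim) image features, zeros where unavailable
        # cond_mask:    (B, T, 1)            1 where an egocentric image conditions that frame
        x = self.pose_in(noisy_motion) + cond_mask * self.img_in(img_feats)
        x = x + self.t_embed(t.float().view(-1, 1)).unsqueeze(1)  # broadcast over time
        return self.pose_out(self.backbone(x))  # predict clean motion (x0 parameterization)


# The three tasks differ only in which frames carry image conditioning.
B, T = 2, 60
model = UnifiedMotionDenoiser()
noisy = torch.randn(B, T, 135)
t = torch.randint(0, 1000, (B,))
feats = torch.randn(B, T, 768)

recon_mask = torch.ones(B, T, 1)                                        # reconstruction: images at every frame
forecast_mask = torch.zeros(B, T, 1); forecast_mask[:, :T // 2] = 1.0   # forecasting: past frames only
gen_mask = torch.zeros(B, T, 1); gen_mask[:, :1] = 1.0                  # generation: a single image

for mask in (recon_mask, forecast_mask, gen_mask):
    out = model(noisy, t, feats * mask, mask)
    print(out.shape)  # torch.Size([2, 60, 135])
```

The single forward pass above only checks shapes; in an actual diffusion setup this denoiser would be called repeatedly inside a standard DDPM/DDIM reverse-sampling loop, with the conditioning mask fixed per task.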