

EgoX: Egocentric Video Generation from a Single Exocentric Video

December 9, 2025
Authors: Taewoong Kang, Kinam Kim, Dohyeon Kim, Minho Park, Junha Hyung, Jaegul Choo
cs.AI

Abstract

Egocentric perception enables humans to experience and understand the world directly from their own point of view. Translating exocentric (third-person) videos into egocentric (first-person) videos opens up new possibilities for immersive understanding, but remains highly challenging due to extreme camera pose variations and minimal view overlap. This task requires faithfully preserving visible content while synthesizing unseen regions in a geometrically consistent manner. To achieve this, we present EgoX, a novel framework for generating egocentric videos from a single exocentric input. EgoX leverages the pretrained spatio-temporal knowledge of large-scale video diffusion models through lightweight LoRA adaptation and introduces a unified conditioning strategy that combines exocentric and egocentric priors via width- and channel-wise concatenation. Additionally, a geometry-guided self-attention mechanism selectively attends to spatially relevant regions, ensuring geometric coherence and high visual fidelity. Our approach achieves coherent and realistic egocentric video generation while demonstrating strong scalability and robustness across unseen and in-the-wild videos.
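To make the two mechanisms named in the abstract concrete, here is a minimal NumPy sketch of (a) width- and channel-wise concatenation of exocentric and egocentric conditioning signals and (b) self-attention restricted by a geometry mask. All function names, tensor shapes, and the masking scheme are illustrative assumptions for a single latent frame, not the authors' implementation.

```python
import numpy as np

def unified_conditioning(ego_latent, exo_latent, ego_prior):
    """Hypothetical conditioning: exo and ego latents are concatenated
    along the width axis, then a (broadcast) egocentric prior is stacked
    along the channel axis. Shapes: latents (C, H, W), prior (Cp, H, W)."""
    # width-wise concatenation: (C, H, W) + (C, H, W) -> (C, H, 2W)
    x = np.concatenate([exo_latent, ego_latent], axis=-1)
    # tile the prior to cover both halves, then channel-wise concatenation
    prior = np.concatenate([ego_prior, ego_prior], axis=-1)
    return np.concatenate([x, prior], axis=0)  # (C + Cp, H, 2W)

def geometry_guided_attention(q, k, v, geo_mask):
    """Self-attention over N tokens of dimension d, where geo_mask[i, j]
    is True iff token j is spatially relevant to token i. Irrelevant
    tokens are suppressed before the softmax."""
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (N, N)
    scores = np.where(geo_mask, scores, -1e9)        # mask out irrelevant pairs
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # row-wise softmax
    return w @ v                                     # (N, d)
```

With an identity geometry mask, each token attends only to itself, so the output reduces to `v`; a denser mask derived from cross-view geometry would let each egocentric token aggregate only the exocentric regions that plausibly project to it.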