OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation
March 31, 2026
Authors: Yuheng Liu, Xin Lin, Xinke Li, Baihan Yang, Chen Wang, Kalyan Sunkavalli, Yannick Hold-Geoffroy, Hao Tan, Kai Zhang, Xiaohui Xie, Zifan Shi, Yiwei Hu
cs.AI
Abstract
Modeling scenes using video generation models has garnered growing research interest in recent years. However, most existing approaches rely on perspective video models that synthesize only limited observations of a scene, leading to issues of completeness and global consistency. We propose OmniRoam, a controllable panoramic video generation framework that exploits the rich per-frame scene coverage and inherent long-term spatial and temporal consistency of panoramic representation, enabling long-horizon scene wandering. Our framework begins with a preview stage, where a trajectory-controlled video generation model creates a quick overview of the scene from a given input image or video. Then, in the refine stage, this video is temporally extended and spatially upsampled to produce long-range, high-resolution videos, thus enabling high-fidelity world wandering. To train our model, we introduce two panoramic video datasets that incorporate both synthetic and real-world captured videos. Experiments show that our framework consistently outperforms state-of-the-art methods in terms of visual quality, controllability, and long-term scene consistency, both qualitatively and quantitatively. We further showcase several extensions of this framework, including real-time video generation and 3D reconstruction. Code is available at https://github.com/yuhengliu02/OmniRoam.
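To make the "rich per-frame scene coverage" of a panoramic frame concrete, the sketch below maps pixels to viewing directions on the unit sphere and back. It assumes an equirectangular layout, which the abstract does not specify; the function names and the axis convention (+z forward, +y up, +x right) are illustrative choices, not taken from the OmniRoam code.

```python
import math


def equirect_pixel_to_direction(u, v, width, height):
    """Map pixel (u, v) of an equirectangular frame to a unit viewing direction.

    Assumed convention (hypothetical, not from the paper): the image centre
    looks along +z, +y is up, and longitude increases to the right.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi  # longitude in (-pi, pi)
    lat = (0.5 - (v + 0.5) / height) * math.pi       # latitude in (-pi/2, pi/2)
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def direction_to_equirect_pixel(direction, width, height):
    """Inverse mapping: project a 3D direction back to pixel coordinates."""
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)
    lat = math.asin(max(-1.0, min(1.0, y / norm)))
    u = (lon / (2.0 * math.pi) + 0.5) * width - 0.5
    v = (0.5 - lat / math.pi) * height - 0.5
    return (u, v)
```

Under this assumption, every direction around the camera lands on some pixel of a single frame, whereas a perspective frame covers only the directions inside its field of view.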