Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion

Abstract

Generating complete digital twins from video requires precise camera control, global scene coverage, and strict spatio-temporal consistency, all of which remain challenging for perspective video generators because of their limited field of view (FoV). A narrow FoV forces long or multi-view trajectories, which amplify cross-view inconsistency and temporal drift. We argue that 360° video generation offers a natural solution: panoramic coverage simplifies trajectory design and provides strong global context for maintaining coherence. We introduce Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion, a controllable 360° video generation framework that synthesizes high-fidelity videos from sparse 360° inputs. The key idea is an explicit 3D Cache, reconstructed from the input, that serves as a geometric scaffold for any user-defined camera path. The diffusion model can then focus on photorealistic texture refinement while the 3D Cache enforces global geometric consistency. Experiments show that Pantheon360 achieves superior visual quality and unmatched geometric coherence, enabling reliable and flexible 360° scene generation for downstream simulation and digital-twin applications.
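
To make the idea of a geometric scaffold concrete, the following is a minimal sketch of how a 3D cache could be rendered into a 360° (equirectangular) frame at a user-chosen camera pose. It is an illustration under stated assumptions, not the Pantheon360 implementation: the abstract does not specify the cache representation, so a colored point cloud is assumed here, and the function name `render_equirect`, its parameters, and the axis convention (x right, y up, z forward) are all hypothetical. Pixels that no point reaches keep infinite depth; in a pipeline of this kind, those are the regions a generative model would have to fill in.

```python
import numpy as np

def render_equirect(points, colors, cam_pos, cam_rot, height=256, width=512):
    """Splat a colored point cloud into an equirectangular panorama.

    Assumed (hypothetical) interface:
      points  : (N, 3) world-space positions
      colors  : (N, 3) RGB values in [0, 1]
      cam_pos : (3,) camera center in world space
      cam_rot : (3, 3) world-to-camera rotation matrix
    Returns (image, depth); pixels hit by no point keep depth = inf.
    """
    # Express points in the camera frame (x right, y up, z forward assumed).
    p = (np.asarray(points, dtype=float) - cam_pos) @ np.asarray(cam_rot).T
    r = np.linalg.norm(p, axis=1)
    keep = r > 1e-8                      # drop points at the camera center
    p, r, c = p[keep], r[keep], np.asarray(colors, dtype=float)[keep]

    # Spherical angles: longitude in [-pi, pi], latitude in [-pi/2, pi/2].
    lon = np.arctan2(p[:, 0], p[:, 2])
    lat = np.arcsin(np.clip(p[:, 1] / r, -1.0, 1.0))

    # Map angles to pixel coordinates; the image center looks along +z.
    u = ((lon / (2 * np.pi) + 0.5) * width).astype(int) % width
    v = np.clip(((0.5 - lat / np.pi) * height).astype(int), 0, height - 1)
    flat = v * width + u

    # Z-buffer: keep the nearest point that lands on each pixel.
    depth = np.full(height * width, np.inf)
    np.minimum.at(depth, flat, r)
    winner = r == depth[flat]            # exact depth ties are resolved arbitrarily

    image = np.zeros((height * width, 3))
    image[flat[winner]] = c[winner]
    return image.reshape(height, width, 3), depth.reshape(height, width)
```

Calling the function once per pose along a camera path yields one guidance panorama per frame. Because the full sphere is rendered, any pose produces a complete frame layout, which reflects the trajectory-design benefit of panoramic coverage argued above.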