Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion

Abstract

Generating complete digital twins from videos requires precise camera control, global scene coverage, and strict spatio-temporal consistency, requirements that remain challenging for perspective video generators because of their limited field of view (FoV). A narrow FoV forces long or multi-view camera trajectories, which amplifies cross-view inconsistency and temporal drift. We argue that 360° video generation offers a natural solution: panoramic coverage simplifies trajectory design and provides strong global context for maintaining coherence. We introduce Pantheon360, a controllable 360° video generation framework that synthesizes high-fidelity videos from sparse 360° inputs. The key idea is an explicit 3D Cache, reconstructed from the input, that serves as a geometric scaffold for any user-defined camera path. This lets the diffusion model focus on photorealistic texture refinement while the 3D Cache enforces global geometric consistency. Experiments show that Pantheon360 achieves superior visual quality and unmatched geometric coherence, enabling reliable and flexible 360° scene generation for downstream simulation and digital-twin applications.
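
To make the idea of a geometric scaffold concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that the cache is a colored point cloud lifted from one equirectangular RGB-D frame; the abstract does not specify the cache representation or how depth is obtained. The names `pixel_directions`, `build_cache`, and `render_cache` are hypothetical. The sketch lifts a panorama to 3D and re-renders it from a translated camera, returning a coverage mask of the pixels that the cache actually explains.

```python
import numpy as np

def pixel_directions(height, width):
    """Unit ray direction for every pixel center of an equirectangular image."""
    lon = (np.arange(width) + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(height) + 0.5) / height * np.pi
    lon, lat = np.meshgrid(lon, lat)
    return np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)

def build_cache(rgb, depth):
    """Lift a 360-degree RGB-D frame to a colored point cloud (assumed cache form)."""
    h, w = depth.shape
    points = pixel_directions(h, w) * depth[..., None]
    return points.reshape(-1, 3), rgb.reshape(-1, 3)

def render_cache(points, colors, cam_pos, height, width):
    """Project cached points into an equirectangular view centered at cam_pos."""
    rel = points - np.asarray(cam_pos, dtype=float)
    dist = np.linalg.norm(rel, axis=1)
    valid = dist > 1e-6                      # drop points at the camera center
    rel, dist, colors = rel[valid], dist[valid], colors[valid]

    lon = np.arctan2(rel[:, 0], rel[:, 2])
    lat = np.arcsin(np.clip(rel[:, 1] / dist, -1.0, 1.0))
    u = np.clip(((lon + np.pi) / (2.0 * np.pi) * width).astype(int), 0, width - 1)
    v = np.clip(((np.pi / 2.0 - lat) / np.pi * height).astype(int), 0, height - 1)
    flat = v * width + u

    # Z-buffer: keep, per pixel, the distance of the nearest projected point.
    zbuf = np.full(height * width, np.inf)
    np.minimum.at(zbuf, flat, dist)
    nearest = dist == zbuf[flat]

    image = np.zeros((height * width, 3), dtype=colors.dtype)
    image[flat[nearest]] = colors[nearest]
    mask = np.isfinite(zbuf)                 # True where the cache covers the pixel
    return image.reshape(height, width, 3), mask.reshape(height, width)
```

In a pipeline of this shape, the rendered image and mask for each pose on the user-defined path would serve as conditioning, leaving the generative model to refine texture and fill uncovered pixels. How Pantheon360 actually conditions its diffusion model is not described in the abstract, so that step is omitted here.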