CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video
March 4, 2026
Authors: Lingen Li, Guangzhi Wang, Xiaoyu Li, Zhaoyang Zhang, Qi Dou, Jinwei Gu, Tianfan Xue, Ying Shan
cs.AI
Abstract
Generating high-quality 360° panoramic videos from perspective input is one of the crucial applications of virtual reality (VR), where high resolution is especially important for an immersive experience. Existing methods are constrained by the computational limits of vanilla diffusion models: they support native generation only at ≤1K resolution and rely on suboptimal post-hoc super-resolution to reach higher resolutions. We introduce CubeComposer, a novel spatio-temporal autoregressive diffusion model that natively generates 4K-resolution 360° videos. By decomposing each video into a cubemap representation with six faces, CubeComposer synthesizes content autoregressively in a carefully planned spatio-temporal order, which reduces memory demands while enabling high-resolution output. Specifically, to address the challenges of multi-dimensional autoregression, we propose: (1) a spatio-temporal autoregressive strategy that orchestrates 360° video generation across cube faces and time windows for coherent synthesis; (2) a cube face context management mechanism, equipped with a sparse context attention design to improve efficiency; and (3) continuity-aware techniques, including cube-aware positional encoding, padding, and blending, to eliminate boundary seams. Extensive experiments on benchmark datasets demonstrate that CubeComposer outperforms state-of-the-art methods in both native resolution and visual quality, supporting practical VR application scenarios. Project page: https://lg-li.github.io/project/cubecomposer
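The cubemap decomposition can be illustrated with a minimal sketch that maps each pixel of an equirectangular (ERP) frame to one of six cube faces. It assumes a y-up coordinate frame and uses the OpenGL cube-map face-selection convention; CubeComposer's actual face layout, orientation, and face resolution are not specified in the abstract, and the helpers `erp_pixel_to_direction`, `direction_to_face`, and `face_size_for_erp` are hypothetical.

```python
import math

# Hypothetical face labels: sign and axis of the dominant direction component.
FACES = ("+x", "-x", "+y", "-y", "+z", "-z")


def erp_pixel_to_direction(px, py, width, height):
    """Map an ERP pixel center to a unit direction vector (y is up)."""
    lon = (px + 0.5) / width * 2.0 * math.pi - math.pi   # longitude in (-pi, pi)
    lat = math.pi / 2.0 - (py + 0.5) / height * math.pi  # latitude in (-pi/2, pi/2)
    return (math.cos(lat) * math.sin(lon), math.sin(lat), math.cos(lat) * math.cos(lon))


def direction_to_face(x, y, z):
    """Return (face, u, v) for a nonzero direction, with u and v in [0, 1].

    Follows the OpenGL cube-map selection table: the axis with the largest
    magnitude picks the face, and the other two components are projected onto it.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        face, ma = ("+x" if x > 0 else "-x"), ax
        sc, tc = (-z, -y) if x > 0 else (z, -y)
    elif ay >= az:
        face, ma = ("+y" if y > 0 else "-y"), ay
        sc, tc = (x, z) if y > 0 else (x, -z)
    else:
        face, ma = ("+z" if z > 0 else "-z"), az
        sc, tc = (x, -y) if z > 0 else (-x, -y)
    return face, 0.5 * (sc / ma + 1.0), 0.5 * (tc / ma + 1.0)


def face_size_for_erp(erp_width):
    """Assumed face edge length: a quarter of the ERP width (3840 -> 960)."""
    return erp_width // 4
```

Under the quarter-width assumption, a 3840×1920 ERP frame has 7,372,800 pixels while a single 960×960 face has 921,600, one eighth as many, which suggests why generating one face at a time lowers peak memory.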
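The spatio-temporal autoregressive order can be sketched as a schedule over (time window, face) pairs, where each step lists the previously generated faces it conditions on. The abstract does not state the actual order or which context faces the sparse context attention keeps, so the face order `FACE_ORDER` and the context rule inside the hypothetical `build_schedule` are assumptions for illustration only.

```python
# Hypothetical order within one time window: front face (assumed to hold the
# perspective input) first, then the side faces, then top and bottom.
FACE_ORDER = ("+z", "+x", "-z", "-x", "+y", "-y")
OPPOSITE = {"+x": "-x", "-x": "+x", "+y": "-y", "-y": "+y", "+z": "-z", "-z": "+z"}


def is_adjacent(a, b):
    """On a cube, a face shares an edge with every face except itself and its opposite."""
    return a != b and OPPOSITE[a] != b


def build_schedule(num_windows, order=FACE_ORDER):
    """Return a list of (target, context) steps, where target = (window, face).

    Hypothetical sparse context: faces already generated in the same window that
    share an edge with the target, plus the same face from the previous window.
    """
    steps = []
    for t in range(num_windows):
        generated = []
        for face in order:
            context = [(t, f) for f in generated if is_adjacent(f, face)]
            if t > 0:
                context.append((t - 1, face))  # temporal continuity
            steps.append(((t, face), context))
            generated.append(face)
    return steps


for target, context in build_schedule(2)[:7]:
    print(target, "<-", context)
```

Every context entry is generated before the step that uses it, so the schedule is causal. Under this rule each step attends to at most five context faces (four edge-sharing faces plus one from the previous window) instead of all earlier faces, which is one way a sparse context design can bound attention cost.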
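Boundary blending can be sketched in one dimension, across a seam, assuming two neighboring faces are generated with a small shared overlap (as padding would provide) and the overlapping pixels are cross-faded with linear weights. The paper's actual padding width and blending weights are not given in the abstract; `feather_weights` and `stitch_rows` are hypothetical helpers.

```python
def feather_weights(n):
    """Linear ramp across an n-pixel overlap, rising from near 0 to near 1."""
    return [(i + 0.5) / n for i in range(n)]


def stitch_rows(left_row, right_row, overlap):
    """Join two rows whose last/first `overlap` pixels cover the same content.

    Inside the overlap, the output fades from the left row's values to the
    right row's values, so neither side ends with a hard edge.
    """
    if overlap < 1 or overlap > min(len(left_row), len(right_row)):
        raise ValueError("overlap must be between 1 and the shorter row length")
    weights = feather_weights(overlap)
    left_part, right_part = left_row[-overlap:], right_row[:overlap]
    blended = [(1.0 - w) * a + w * b for w, a, b in zip(weights, left_part, right_part)]
    return left_row[:-overlap] + blended + right_row[overlap:]


print(stitch_rows([0.0] * 4, [1.0] * 4, 2))
```

The same per-pixel weights can be applied to every row along a shared face edge; when both faces agree in the overlap, the blend leaves the values unchanged.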