CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video
March 4, 2026
Authors: Lingen Li, Guangzhi Wang, Xiaoyu Li, Zhaoyang Zhang, Qi Dou, Jinwei Gu, Tianfan Xue, Ying Shan
cs.AI
Abstract
Generating high-quality 360° panoramic videos from perspective input is a crucial application for virtual reality (VR), where high resolution is especially important for an immersive experience. Existing methods are constrained by the computational limits of vanilla diffusion models: they natively generate video only at ≤1K resolution and rely on suboptimal post-hoc super-resolution to increase it. We introduce CubeComposer, a novel spatio-temporal autoregressive diffusion model that natively generates 4K-resolution 360° videos. By decomposing videos into cubemap representations with six faces, CubeComposer autoregressively synthesizes content in a well-planned spatio-temporal order, reducing memory demands while enabling high-resolution output. Specifically, to address the challenges of multi-dimensional autoregression, we propose: (1) a spatio-temporal autoregressive strategy that orchestrates 360° video generation across cube faces and time windows for coherent synthesis; (2) a cube face context management mechanism, equipped with a sparse context attention design to improve efficiency; and (3) continuity-aware techniques, including cube-aware positional encoding, padding, and blending, to eliminate boundary seams. Extensive experiments on benchmark datasets demonstrate that CubeComposer outperforms state-of-the-art methods in native resolution and visual quality, supporting practical VR application scenarios. Project page: https://lg-li.github.io/project/cubecomposer
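The cubemap decomposition mentioned in the abstract is a standard projection of each 360° frame onto six square faces. The sketch below is a minimal, illustrative version in pure Python, assuming an equirectangular (ERP) frame as the panoramic format, the OpenGL-style face convention with y pointing up, and nearest-neighbour sampling. The names `face_uv_to_direction`, `direction_to_face_uv`, `direction_to_erp_pixel` and `erp_to_cubemap` are hypothetical and do not come from CubeComposer, and the sketch does not model the autoregressive ordering, context management, or seam handling described in the abstract.

```python
import math

# Face labels follow the common OpenGL cubemap convention (y is "up").
# This is an illustrative assumption; the abstract does not specify
# CubeComposer's face ordering or orientation.
FACES = ("+x", "-x", "+y", "-y", "+z", "-z")


def face_uv_to_direction(face, u, v):
    """Map a point (u, v) in [0, 1]^2 on a cube face to an unnormalized 3D direction."""
    sc, tc = 2.0 * u - 1.0, 2.0 * v - 1.0
    return {
        "+x": (1.0, -tc, -sc),
        "-x": (-1.0, -tc, sc),
        "+y": (sc, 1.0, tc),
        "-y": (sc, -1.0, -tc),
        "+z": (sc, -tc, 1.0),
        "-z": (-sc, -tc, -1.0),
    }[face]


def direction_to_face_uv(x, y, z):
    """Inverse mapping: the dominant axis picks the face, the other two components give (u, v)."""
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        face, sc, tc, ma = ("+x" if x > 0 else "-x"), (-z if x > 0 else z), -y, ax
    elif ay >= az:
        face, sc, tc, ma = ("+y" if y > 0 else "-y"), x, (z if y > 0 else -z), ay
    else:
        face, sc, tc, ma = ("+z" if z > 0 else "-z"), (x if z > 0 else -x), -y, az
    # sc/ma and tc/ma lie in [-1, 1]; remap them to [0, 1].
    return face, 0.5 * (sc / ma + 1.0), 0.5 * (tc / ma + 1.0)


def direction_to_erp_pixel(x, y, z, width, height):
    """Project a direction to integer (col, row) in an equirectangular frame."""
    lon = math.atan2(x, z)  # in [-pi, pi]; lon = 0 points toward +z
    s = y / math.sqrt(x * x + y * y + z * z)
    lat = math.asin(max(-1.0, min(1.0, s)))  # in [-pi/2, pi/2]; clamp guards rounding
    col = min(int((lon + math.pi) / (2.0 * math.pi) * width), width - 1)
    row = min(int((0.5 - lat / math.pi) * height), height - 1)
    return col, row


def erp_to_cubemap(erp, face_size):
    """Resample an ERP frame (list of rows) into six face_size x face_size faces.

    Samples at face-pixel centres with nearest-neighbour lookup; a real
    pipeline would typically use bilinear or higher-order interpolation.
    """
    height, width = len(erp), len(erp[0])
    cube = {}
    for face in FACES:
        rows = []
        for j in range(face_size):
            v = (j + 0.5) / face_size
            row = []
            for i in range(face_size):
                u = (i + 0.5) / face_size
                col, r = direction_to_erp_pixel(*face_uv_to_direction(face, u, v), width, height)
                row.append(erp[r][col])
            rows.append(row)
        cube[face] = rows
    return cube
```

As a rough size comparison, assuming a 4K ERP frame of 3840×1920 pixels (7,372,800 pixels) and a face size of 960 (each face spans 90° of the 360° horizontal field, i.e. a quarter of the width), the six faces together hold 5,529,600 pixels and a single face holds 921,600, one eighth of the full frame. If a generation step covered only one face, this hints at why a face-by-face order can lower peak memory; the abstract does not state how many faces CubeComposer synthesizes per step.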