

LongVie 2: Multimodal Controllable Ultra-Long Video World Model

December 15, 2025
Authors: Jianxiong Gao, Zhaoxi Chen, Xian Liu, Junhao Zhuang, Chengming Xu, Jianfeng Feng, Yu Qiao, Yanwei Fu, Chenyang Si, Ziwei Liu
cs.AI

Abstract

Building video world models upon pretrained video generation systems represents an important yet challenging step toward general spatiotemporal intelligence. A world model should possess three essential properties: controllability, long-term visual quality, and temporal consistency. To this end, we take a progressive approach: first enhancing controllability, then extending toward long-term, high-quality generation. We present LongVie 2, an end-to-end autoregressive framework trained in three stages: (1) multi-modal guidance, which integrates dense and sparse control signals to provide implicit world-level supervision and improve controllability; (2) degradation-aware training on the input frame, which bridges the gap between training and long-term inference to maintain high visual quality; and (3) history-context guidance, which aligns contextual information across adjacent clips to ensure temporal consistency. We further introduce LongVGenBench, a comprehensive benchmark comprising 100 high-resolution one-minute videos covering diverse real-world and synthetic environments. Extensive experiments demonstrate that LongVie 2 achieves state-of-the-art performance in long-range controllability, temporal coherence, and visual fidelity, and supports continuous video generation lasting up to five minutes, marking a significant step toward unified video world modeling.
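
The abstract describes a clip-by-clip autoregressive pipeline but gives no implementation details. The sketch below is purely illustrative of that high-level design: every name (`generate_clip`, `degrade`, `extract_history_context`, the dense/sparse control inputs) is a hypothetical assumption for exposition and does not correspond to the authors' code or API.

```python
# Minimal, hypothetical sketch of clip-by-clip autoregressive long-video
# generation with multimodal control and history-context guidance.
# All names and signatures are illustrative assumptions, not LongVie 2's API.

from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Clip:
    frames: List[Any]  # placeholder for the decoded frames of one short clip


def degrade(frame: Any) -> Any:
    """Apply a training-style degradation (e.g., noise or blur) to the
    conditioning frame so inference matches degradation-aware training."""
    return frame  # placeholder: identity


def extract_history_context(clips: List[Clip]) -> Optional[Any]:
    """Summarize previously generated clips into a context signal used to
    keep adjacent clips temporally consistent (placeholder: last frame)."""
    return clips[-1].frames[-1] if clips else None


def generate_clip(model, cond_frame, dense_ctrl, sparse_ctrl, history_ctx) -> Clip:
    """One autoregressive step: condition on the (degraded) last frame,
    the dense and sparse control signals, and the history context."""
    return model(cond_frame, dense_ctrl, sparse_ctrl, history_ctx)


def generate_long_video(model, init_frame, dense_controls, sparse_controls, num_clips):
    clips: List[Clip] = []
    cond_frame = init_frame
    for i in range(num_clips):
        history_ctx = extract_history_context(clips)
        clip = generate_clip(
            model,
            degrade(cond_frame),   # degradation-aware conditioning
            dense_controls[i],     # e.g., per-frame dense maps such as depth
            sparse_controls[i],    # e.g., sparse keypoints or trajectories
            history_ctx,           # aligns context across adjacent clips
        )
        clips.append(clip)
        cond_frame = clip.frames[-1]  # last frame seeds the next clip
    return clips
```

The point of the sketch is the control flow: each clip is conditioned on a degraded version of the previous clip's final frame plus dense, sparse, and history-context signals, which is how the three training stages listed in the abstract would interact at inference time.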