DreamWorld: Unified World Modeling in Video Generation
February 28, 2026
Authors: Boming Tan, Xiangdong Zhang, Ning Liao, Yuqing Zhang, Shaofeng Zhang, Xue Yang, Qi Fan, Yanyong Zhang
cs.AI
Abstract
Despite impressive progress in video generation, existing models remain limited to surface-level plausibility and lack a coherent, unified understanding of the world. Prior approaches typically incorporate only a single form of world-related knowledge or rely on rigid alignment strategies to introduce additional knowledge. However, aligning a single form of world knowledge is insufficient for building a world model, which requires jointly modeling multiple heterogeneous dimensions (e.g., physical commonsense, 3D consistency, and temporal consistency). To address this limitation, we introduce DreamWorld, a unified framework that integrates complementary world knowledge into video generators via a Joint World Modeling Paradigm, which jointly predicts video pixels and features from foundation models to capture temporal dynamics, spatial geometry, and semantic consistency. However, naively optimizing these heterogeneous objectives can lead to visual instability and temporal flickering. To mitigate this issue, we propose Consistent Constraint Annealing (CCA), which progressively regulates world-level constraints during training, and Multi-Source Inner-Guidance, which enforces the learned world priors at inference. Extensive evaluations show that DreamWorld improves world consistency, outperforming Wan2.1 by 2.26 points on VBench. Code will be made publicly available at https://github.com/ABU121111/DreamWorld.
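To make the idea of progressively regulating auxiliary constraints concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the CCA schedule, so the cosine decay, the direction of the annealing, the weight range, and the names `annealed_weight` and `joint_loss` are all assumptions chosen for illustration. It combines a pixel-level loss with a weighted sum of feature-prediction losses, one per knowledge source.

```python
import math


def annealed_weight(step, total_steps, w_start=1.0, w_end=0.1):
    """Cosine-anneal a constraint weight from w_start to w_end.

    Hypothetical schedule: the abstract does not state the form or
    direction of the annealing used by CCA.
    """
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return w_end + 0.5 * (w_start - w_end) * (1.0 + math.cos(math.pi * progress))


def joint_loss(pixel_loss, feature_losses, step, total_steps):
    """Pixel loss plus an annealed sum of auxiliary feature losses.

    feature_losses maps an illustrative source name (e.g. "temporal",
    "geometry", "semantic") to its scalar loss value.
    """
    w = annealed_weight(step, total_steps)
    return pixel_loss + w * sum(feature_losses.values())


if __name__ == "__main__":
    losses = {"temporal": 0.5, "geometry": 0.5, "semantic": 1.0}
    for step in (0, 50, 100):
        print(step, joint_loss(1.0, losses, step, total_steps=100))
```

Under these assumptions the auxiliary terms dominate early in training and fade toward a small residual weight, which is one plausible way to keep heterogeneous objectives from destabilizing the pixel-level output late in training.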