OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning
March 25, 2026
Authors: Kaihang Pan, Qi Tian, Jianwei Zhang, Weijie Kong, Jiangfeng Xiong, Yanxin Long, Shixue Zhang, Haiyi Qiu, Tan Wang, Zheqi Lv, Yue Wu, Liefeng Bo, Siliang Tang, Zhao Zhong
cs.AI
Abstract
While proprietary systems such as Seedance-2.0 have achieved remarkable success in omni-capable video generation, open-source alternatives lag significantly behind. Most academic models remain heavily fragmented, and the few existing efforts toward unified video generation still struggle to seamlessly integrate diverse tasks within a single framework. To bridge this gap, we propose OmniWeaving, an omni-level video generation model featuring powerful multimodal composition and reasoning-informed capabilities. By leveraging a massive-scale pretraining dataset that encompasses diverse compositional and reasoning-augmented scenarios, OmniWeaving learns to temporally bind interleaved text, multi-image, and video inputs while acting as an intelligent agent that infers complex user intentions for sophisticated video creation. Furthermore, we introduce IntelligentVBench, the first comprehensive benchmark designed to rigorously assess next-level intelligent unified video generation. Extensive experiments demonstrate that OmniWeaving achieves state-of-the-art performance among open-source unified models. The code and models will be made publicly available soon. Project Page: https://omniweaving.github.io.