Seedance 1.0: Exploring the Boundaries of Video Generation Models

June 10, 2025
Authors: Yu Gao, Haoyuan Guo, Tuyen Hoang, Weilin Huang, Lu Jiang, Fangyuan Kong, Huixia Li, Jiashi Li, Liang Li, Xiaojie Li, Xunsong Li, Yifu Li, Shanchuan Lin, Zhijie Lin, Jiawei Liu, Shu Liu, Xiaonan Nie, Zhiwu Qing, Yuxi Ren, Li Sun, Zhi Tian, Rui Wang, Sen Wang, Guoqiang Wei, Guohong Wu, Jie Wu, Ruiqi Xia, Fei Xiao, Xuefeng Xiao, Jiangqiao Yan, Ceyuan Yang, Jianchao Yang, Runkai Yang, Tao Yang, Yihang Yang, Zilyu Ye, Xuejiao Zeng, Yan Zeng, Heng Zhang, Yang Zhao, Xiaozheng Zheng, Peihao Zhu, Jiaxin Zou, Feilong Zuo
cs.AI

Abstract

Notable breakthroughs in diffusion modeling have propelled rapid improvements in video generation, yet current foundation models still face critical challenges in simultaneously balancing prompt following, motion plausibility, and visual quality. In this report, we introduce Seedance 1.0, a high-performance and inference-efficient video foundation generation model that integrates several core technical improvements: (i) multi-source data curation augmented with precise and meaningful video captioning, enabling comprehensive learning across diverse scenarios; (ii) an efficient architecture design with a proposed training paradigm, which natively supports multi-shot generation and jointly learns both text-to-video and image-to-video tasks; (iii) carefully optimized post-training approaches that leverage fine-grained supervised fine-tuning and video-specific RLHF (reinforcement learning from human feedback) with multi-dimensional reward mechanisms for comprehensive performance improvements; (iv) model acceleration achieving an approximately 10x inference speedup through multi-stage distillation strategies and system-level optimizations. Seedance 1.0 can generate a 5-second video at 1080p resolution in only 41.4 seconds (NVIDIA-L20). Compared to state-of-the-art video generation models, Seedance 1.0 stands out with high-quality and fast video generation, offering superior spatiotemporal fluidity and structural stability, precise instruction adherence in complex multi-subject contexts, and native multi-shot narrative coherence with consistent subject representation.