Seedance 1.0: Exploring the Boundaries of Video Generation Models
June 10, 2025
Authors: Yu Gao, Haoyuan Guo, Tuyen Hoang, Weilin Huang, Lu Jiang, Fangyuan Kong, Huixia Li, Jiashi Li, Liang Li, Xiaojie Li, Xunsong Li, Yifu Li, Shanchuan Lin, Zhijie Lin, Jiawei Liu, Shu Liu, Xiaonan Nie, Zhiwu Qing, Yuxi Ren, Li Sun, Zhi Tian, Rui Wang, Sen Wang, Guoqiang Wei, Guohong Wu, Jie Wu, Ruiqi Xia, Fei Xiao, Xuefeng Xiao, Jiangqiao Yan, Ceyuan Yang, Jianchao Yang, Runkai Yang, Tao Yang, Yihang Yang, Zilyu Ye, Xuejiao Zeng, Yan Zeng, Heng Zhang, Yang Zhao, Xiaozheng Zheng, Peihao Zhu, Jiaxin Zou, Feilong Zuo
cs.AI
Abstract
Notable breakthroughs in diffusion modeling have propelled rapid improvements
in video generation, yet current foundation models still face critical
challenges in simultaneously balancing prompt following, motion plausibility,
and visual quality. In this report, we introduce Seedance 1.0, a
high-performance and inference-efficient video foundation generation model that
integrates several core technical improvements: (i) multi-source data curation
augmented with precise and meaningful video captioning, enabling
comprehensive learning across diverse scenarios; (ii) an efficient architecture
design with a proposed training paradigm that natively supports
multi-shot generation and joint learning of both text-to-video and
image-to-video tasks; (iii) carefully optimized post-training approaches
leveraging fine-grained supervised fine-tuning and video-specific RLHF with
multi-dimensional reward mechanisms for comprehensive performance improvements;
(iv) excellent model acceleration achieving ~10x inference speedup through
multi-stage distillation strategies and system-level optimizations. Seedance
1.0 can generate a 5-second video at 1080p resolution in only 41.4 seconds
(on an NVIDIA L20). Compared to state-of-the-art video generation models, Seedance
1.0 stands out with high-quality and fast video generation, offering superior
spatiotemporal fluidity with structural stability, precise instruction
adherence in complex multi-subject contexts, and native multi-shot narrative
coherence with consistent subject representation.