Seedance 1.0: Exploring the Boundaries of Video Generation Models

Abstract

Notable breakthroughs in diffusion modeling have driven rapid improvements in video generation, yet current foundation models still face critical challenges in simultaneously balancing prompt following, motion plausibility, and visual quality. In this report, we introduce Seedance 1.0, a high-performance and inference-efficient video foundation generation model that integrates several core technical improvements: (i) multi-source data curation augmented with precise and meaningful video captioning, enabling comprehensive learning across diverse scenarios; (ii) an efficient architecture design and training paradigm that natively supports multi-shot generation and jointly learns text-to-video and image-to-video tasks; (iii) carefully optimized post-training approaches that combine fine-grained supervised fine-tuning with video-specific RLHF (reinforcement learning from human feedback) using multi-dimensional reward mechanisms, for comprehensive performance improvements; (iv) model acceleration that achieves roughly 10x inference speedup through multi-stage distillation strategies and system-level optimizations. Seedance 1.0 can generate a 5-second video at 1080p resolution in only 41.4 seconds (on an NVIDIA L20). Compared to state-of-the-art video generation models, Seedance 1.0 stands out for high-quality and fast video generation, with superior spatiotemporal fluidity and structural stability, precise instruction adherence in complex multi-subject contexts, and native multi-shot narrative coherence with consistent subject representation.