Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Abstract

Recent strides in video generation have paved the way for unified audio-visual generation. In this work, we present Seedance 1.5 pro, a foundation model engineered specifically for native, joint audio-video generation. Leveraging a dual-branch Diffusion Transformer architecture, the model integrates a cross-modal joint module with a specialized multi-stage data pipeline, achieving exceptional audio-visual synchronization and superior generation quality. To ensure practical utility, we implement meticulous post-training optimizations, including Supervised Fine-Tuning (SFT) on high-quality datasets and Reinforcement Learning from Human Feedback (RLHF) with multi-dimensional reward models. Furthermore, we introduce an acceleration framework that boosts inference speed by more than 10×. Seedance 1.5 pro distinguishes itself through precise multilingual and dialect lip-syncing, dynamic cinematic camera control, and enhanced narrative coherence, positioning it as a robust engine for professional-grade content creation. Seedance 1.5 pro is now accessible on Volcano Engine at https://console.volcengine.com/ark/region:ark+cn-beijing/experience/vision?type=GenVideo.
