DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation
June 3, 2025
Authors: Zhengyao Lv, Chenyang Si, Tianlin Pan, Zhaoxi Chen, Kwan-Yee K. Wong, Yu Qiao, Ziwei Liu
cs.AI
Abstract
Diffusion Models have achieved remarkable results in video synthesis but
require iterative denoising steps, leading to substantial computational
overhead. Consistency Models have made significant progress in accelerating
diffusion models. However, directly applying them to video diffusion models
often results in severe degradation of temporal consistency and appearance
details. In this paper, by analyzing the training dynamics of Consistency
Models, we identify a key conflicting learning dynamics during the distillation
process: there is a significant discrepancy in the optimization gradients and
loss contributions across different timesteps. This discrepancy prevents the
distilled student model from achieving an optimal state, leading to compromised
temporal consistency and degraded appearance details. To address this issue, we
propose a parameter-efficient Dual-Expert Consistency Model (DCM),
where a semantic expert focuses on learning semantic layout and motion, while a
detail expert specializes in fine detail refinement. Furthermore, we introduce
Temporal Coherence Loss to improve motion consistency for the semantic expert
and apply GAN and Feature Matching Loss to enhance the synthesis quality of the
detail expert. Our approach achieves state-of-the-art visual quality with
significantly reduced sampling steps, demonstrating the effectiveness of expert
specialization in video diffusion model distillation. Our code and models are
available at https://github.com/Vchitect/DCM.
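
The abstract does not specify how the two experts split the denoising trajectory. A minimal sketch of one plausible design is shown below, assuming the experts are routed by a timestep threshold, with the semantic expert handling high-noise steps (layout and motion) and the detail expert handling low-noise steps (appearance refinement); the class name, `boundary_t`, and the routing rule are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class DualExpertConsistencyModel(nn.Module):
    """Illustrative dual-expert wrapper (sketch, not the paper's exact design).

    Each denoising call is routed to one of two consistency-distilled experts
    based on the (normalized) timestep: high-noise steps go to the semantic
    expert, low-noise steps to the detail expert.
    """

    def __init__(self, semantic_expert: nn.Module, detail_expert: nn.Module,
                 boundary_t: float = 0.5):
        super().__init__()
        self.semantic_expert = semantic_expert  # learns semantic layout and motion
        self.detail_expert = detail_expert      # refines fine appearance details
        self.boundary_t = boundary_t            # hypothetical switch point in [0, 1]

    def forward(self, x_t: torch.Tensor, t: torch.Tensor, cond: torch.Tensor):
        # Early (high-noise) timesteps: semantic expert.
        if float(t.max()) >= self.boundary_t:
            return self.semantic_expert(x_t, t, cond)
        # Late (low-noise) timesteps: detail expert.
        return self.detail_expert(x_t, t, cond)
```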
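The abstract names a Temporal Coherence Loss for the semantic expert and GAN plus Feature Matching losses for the detail expert, without giving formulas. The sketch below shows one common instantiation of such terms (frame-difference matching between student and teacher for temporal coherence, and L1 matching of discriminator features); the paper's actual definitions may differ.

```python
import torch
import torch.nn.functional as F


def temporal_coherence_loss(student_video: torch.Tensor,
                            teacher_video: torch.Tensor) -> torch.Tensor:
    """Assumed temporal coherence term: match frame-to-frame differences of the
    student prediction to those of the teacher. Videos are (B, T, C, H, W)."""
    student_motion = student_video[:, 1:] - student_video[:, :-1]
    teacher_motion = teacher_video[:, 1:] - teacher_video[:, :-1]
    return F.l1_loss(student_motion, teacher_motion)


def feature_matching_loss(fake_feats, real_feats) -> torch.Tensor:
    """Standard GAN feature-matching term: L1 distance between discriminator
    intermediate features of generated and real samples, averaged over layers."""
    losses = [F.l1_loss(f, r.detach()) for f, r in zip(fake_feats, real_feats)]
    return torch.stack(losses).mean()
```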