
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design

October 8, 2024
Authors: Jiachen Li, Qian Long, Jian Zheng, Xiaofeng Gao, Robinson Piramuthu, Wenhu Chen, William Yang Wang
cs.AI

Abstract

In this paper, we focus on enhancing a diffusion-based text-to-video (T2V) model during the post-training phase by distilling a highly capable consistency model from a pretrained T2V model. Our proposed method, T2V-Turbo-v2, integrates several supervision signals into the consistency distillation process: high-quality training data, reward model feedback, and conditional guidance. Through comprehensive ablation studies, we show that tailoring datasets to specific learning objectives is crucial, and that learning from diverse reward models improves both visual quality and text-video alignment. We also highlight the vast design space of conditional guidance strategies, which centers on designing an effective energy function to augment the teacher ODE solver. We demonstrate the potential of this approach by extracting motion guidance from the training datasets and incorporating it into the ODE solver, which improves the motion quality of the generated videos, as reflected in better motion-related metrics on VBench and T2V-CompBench. Empirically, T2V-Turbo-v2 establishes a new state-of-the-art result on VBench, with a Total score of 85.13, surpassing proprietary systems such as Gen-3 and Kling.
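The abstract does not spell out how an energy function augments the teacher ODE solver, so the following is only a generic sketch of the idea of energy-guided ODE integration, not the paper's formulation. It assumes a toy Euler solver, a hypothetical quadratic energy that pulls the state toward a target, and illustrative names (`guided_euler_step`, `quadratic_energy_grad`, `guidance_scale`) that do not come from the source.

```python
from typing import Callable, List

Vector = List[float]


def quadratic_energy_grad(x: Vector, target: Vector) -> Vector:
    """Gradient of the toy energy E(x) = 0.5 * ||x - target||^2 (an assumption)."""
    return [xi - ti for xi, ti in zip(x, target)]


def guided_euler_step(
    x: Vector,
    t: float,
    dt: float,
    drift: Callable[[Vector, float], Vector],
    energy_grad: Callable[[Vector], Vector],
    guidance_scale: float,
) -> Vector:
    """One Euler step of dx/dt = drift(x, t) - guidance_scale * grad E(x)."""
    f = drift(x, t)
    g = energy_grad(x)
    return [xi + dt * (fi - guidance_scale * gi) for xi, fi, gi in zip(x, f, g)]


def solve(
    x0: Vector,
    steps: int,
    dt: float,
    drift: Callable[[Vector, float], Vector],
    energy_grad: Callable[[Vector], Vector],
    guidance_scale: float,
) -> Vector:
    """Integrate the guided ODE for a fixed number of Euler steps."""
    x, t = list(x0), 0.0
    for _ in range(steps):
        x = guided_euler_step(x, t, dt, drift, energy_grad, guidance_scale)
        t += dt
    return x


# Toy usage: zero drift, so only the energy term moves the state.
target = [1.0, -1.0]
zero_drift = lambda x, t: [0.0 for _ in x]
grad = lambda x: quadratic_energy_grad(x, target)

one_step = solve([0.0, 0.0], 1, 0.1, zero_drift, grad, 1.0)
many_steps = solve([0.0, 0.0], 200, 0.1, zero_drift, grad, 1.0)
```

In this toy setting the guidance term alone drives the state toward the low-energy region. In a real T2V pipeline the drift would come from the teacher diffusion model and the energy would encode a condition such as motion, and both are outside what this sketch covers.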

November 16, 2024