T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design
October 8, 2024
Authors: Jiachen Li, Qian Long, Jian Zheng, Xiaofeng Gao, Robinson Piramuthu, Wenhu Chen, William Yang Wang
cs.AI
Abstract
In this paper, we focus on enhancing a diffusion-based text-to-video (T2V)
model during the post-training phase by distilling a highly capable consistency
model from a pretrained T2V model. Our proposed method, T2V-Turbo-v2,
introduces a significant advancement by integrating various supervision
signals, including high-quality training data, reward model feedback, and
conditional guidance, into the consistency distillation process. Through
comprehensive ablation studies, we highlight the crucial importance of
tailoring datasets to specific learning objectives and the effectiveness of
learning from diverse reward models for enhancing both the visual quality and
text-video alignment. Additionally, we highlight the vast design space of
conditional guidance strategies, which centers on designing an effective energy
function to augment the teacher ODE solver. We demonstrate the potential of
this approach by extracting motion guidance from the training datasets and
incorporating it into the ODE solver, showcasing its effectiveness in improving
the motion quality of the generated videos, as evidenced by improved
motion-related metrics on VBench and T2V-CompBench. Empirically, our T2V-Turbo-v2
establishes a new state-of-the-art result on VBench, with a Total score of
85.13, surpassing proprietary systems such as Gen-3 and Kling.
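The idea of augmenting a teacher ODE solver with an energy function can be illustrated with a minimal sketch. It assumes a deterministic DDIM step as the ODE solver and shifts the teacher's noise prediction by the scaled gradient of an energy E(x), in the style of classifier guidance. The function names, the scalar state standing in for a video latent, the toy quadratic energy, and the `scale` parameter are all hypothetical illustrations. This is not the paper's motion-guidance energy or its implementation.

```python
import math
from typing import Callable


def energy_guided_ddim_step(
    x_t: float,
    alpha_bar_t: float,
    alpha_bar_prev: float,
    eps_pred: float,
    energy_grad: Callable[[float], float],
    scale: float,
) -> float:
    """One deterministic DDIM step from noise level t to a less noisy level
    (alpha_bar_prev > alpha_bar_t), with the teacher's noise prediction
    shifted by the gradient of an energy function.

    Hypothetical sketch: a scalar stands in for a video latent.
    """
    sigma_t = math.sqrt(1.0 - alpha_bar_t)
    # Treat exp(-scale * E(x)) as an unnormalized likelihood of the condition,
    # so grad log p = -scale * grad E. Classifier-style guidance subtracts
    # sigma_t * grad log p from the noise prediction.
    eps_hat = eps_pred + sigma_t * scale * energy_grad(x_t)
    # Predict the clean sample, then re-noise it to the less noisy level.
    x0_pred = (x_t - sigma_t * eps_hat) / math.sqrt(alpha_bar_t)
    return math.sqrt(alpha_bar_prev) * x0_pred + math.sqrt(1.0 - alpha_bar_prev) * eps_hat


def quadratic_energy_grad(x: float, target: float = 1.0) -> float:
    """Gradient of the toy energy E(x) = 0.5 * (x - target) ** 2."""
    return x - target


if __name__ == "__main__":
    for scale in (0.0, 1.0):
        x_prev = energy_guided_ddim_step(0.0, 0.5, 0.9, 0.0, quadratic_energy_grad, scale)
        print(f"scale={scale}: x_prev={x_prev:.4f}")
```

With scale=0.0 the step reduces to plain DDIM and leaves the toy state at 0.0000; with scale=1.0 it moves toward the low-energy target, to about 0.4472. Because the energy enters only through its gradient, a different energy (for example, one built from motion features) changes what the solver steers toward without changing the step itself.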