Diversity-Rewarded CFG Distillation

October 8, 2024
Authors: Geoffrey Cideron, Andrea Agostinelli, Johan Ferret, Sertan Girgin, Romuald Elie, Olivier Bachem, Sarah Perrin, Alexandre Ramé
cs.AI

Abstract

Generative models are transforming creative domains such as music generation, with inference-time strategies like Classifier-Free Guidance (CFG) playing a crucial role. However, CFG doubles inference cost while limiting originality and diversity across generated contents. In this paper, we introduce diversity-rewarded CFG distillation, a novel finetuning procedure that distills the strengths of CFG while addressing its limitations. Our approach optimises two training objectives: (1) a distillation objective, encouraging the model alone (without CFG) to imitate the CFG-augmented predictions, and (2) an RL objective with a diversity reward, promoting the generation of diverse outputs for a given prompt. By finetuning, we learn model weights with the ability to generate high-quality and diverse outputs, without any inference overhead. This also unlocks the potential of weight-based model merging strategies: by interpolating between the weights of two models (the first focusing on quality, the second on diversity), we can control the quality-diversity trade-off at deployment time, and even further boost performance. We conduct extensive experiments on the MusicLM (Agostinelli et al., 2023) text-to-music generative model, where our approach surpasses CFG in terms of quality-diversity Pareto optimality. According to human evaluators, our finetuned-then-merged model generates samples with higher quality-diversity than the base model augmented with CFG. Explore our generations at https://google-research.github.io/seanet/musiclm/diverse_music/.
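
To make the two training objectives concrete, here is a minimal PyTorch-style sketch of how the CFG-distillation term and a pairwise diversity reward could be computed. The function names, the guidance scale `gamma`, and the choice of embedding space are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def cfg_distillation_loss(student_logits, teacher_cond_logits,
                          teacher_uncond_logits, gamma=3.0):
    """Objective (1): the student (prompted, no CFG at inference time)
    imitates the CFG-augmented teacher distribution. gamma is the guidance
    scale; the value 3.0 is illustrative, not taken from the paper."""
    cfg_logits = teacher_uncond_logits + gamma * (
        teacher_cond_logits - teacher_uncond_logits)
    teacher_probs = F.softmax(cfg_logits, dim=-1)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    # KL(teacher || student), averaged over the batch of token positions.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

def diversity_reward(embeddings):
    """Objective (2): reward a set of generations produced for the same
    prompt by their average pairwise dissimilarity in an embedding space
    (e.g. an audio-embedding model; hypothetical here)."""
    z = F.normalize(embeddings, dim=-1)            # [n_generations, dim]
    sim = z @ z.T                                  # pairwise cosine similarity
    n = z.shape[0]
    off_diag = sim[~torch.eye(n, dtype=torch.bool)]
    return (1.0 - off_diag).mean()                 # higher = more diverse

```

In the paper, the distillation term and an RL term driven by the diversity reward are optimised jointly during finetuning; the sketch above only shows how each term could be evaluated for one batch.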
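The deployment-time merging step described in the abstract is plain linear interpolation between two finetuned checkpoints. A hedged sketch, where the checkpoint names and the mixing coefficient `alpha` are hypothetical:

```python
def merge_checkpoints(quality_weights, diversity_weights, alpha=0.5):
    """Interpolate parameter-wise between a quality-focused and a
    diversity-focused model; alpha controls the quality-diversity
    trade-off chosen at deployment time (0 = all quality, 1 = all diversity)."""
    return {
        name: (1.0 - alpha) * quality_weights[name] + alpha * diversity_weights[name]
        for name in quality_weights
    }
```

Both checkpoints must share the same architecture and parameter names for this interpolation to be well defined.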
