Diversity-Rewarded CFG Distillation
October 8, 2024
Authors: Geoffrey Cideron, Andrea Agostinelli, Johan Ferret, Sertan Girgin, Romuald Elie, Olivier Bachem, Sarah Perrin, Alexandre Ramé
cs.AI
Abstract
Generative models are transforming creative domains such as music generation,
with inference-time strategies like Classifier-Free Guidance (CFG) playing a
crucial role. However, CFG doubles the inference cost while limiting the originality
and diversity of the generated content. In this paper, we introduce
diversity-rewarded CFG distillation, a novel finetuning procedure that distills
the strengths of CFG while addressing its limitations. Our approach optimises
two training objectives: (1) a distillation objective, encouraging the model
alone (without CFG) to imitate the CFG-augmented predictions, and (2) an RL
objective with a diversity reward, promoting the generation of diverse outputs
for a given prompt. By finetuning, we learn model weights with the ability to
generate high-quality and diverse outputs, without any inference overhead. This
also unlocks the potential of weight-based model merging strategies: by
interpolating between the weights of two models (the first focusing on quality,
the second on diversity), we can control the quality-diversity trade-off at
deployment time, and even further boost performance. We conduct extensive
experiments on the MusicLM (Agostinelli et al., 2023) text-to-music generative
model, where our approach surpasses CFG in terms of quality-diversity Pareto
optimality. According to human evaluators, our finetuned-then-merged model
generates samples with higher quality-diversity than the base model augmented
with CFG. Explore our generations at
https://google-research.github.io/seanet/musiclm/diverse_music/.
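To make the two training objectives concrete, below is a minimal sketch (not the authors' code) of how a CFG-distillation loss and a REINFORCE-style diversity-reward term could be combined. The function names (`cfg_logits`, `distillation_loss`, `diversity_reward`, `combined_loss`), the tensor shapes, and hyperparameters such as `guidance_scale` and `rl_weight` are illustrative assumptions; the paper's exact reward model and RL algorithm may differ.

```python
# Sketch of the two objectives described in the abstract, using PyTorch.
# Assumes logits of shape (batch, vocab) and one generation embedding per sample.

import torch
import torch.nn.functional as F


def cfg_logits(cond_logits, uncond_logits, guidance_scale=3.0):
    """CFG-augmented prediction: push the conditional logits away from the
    unconditional ones by the guidance scale."""
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)


def distillation_loss(student_logits, cond_logits, uncond_logits, guidance_scale=3.0):
    """(1) Distillation objective: KL divergence encouraging the student alone
    (no CFG at inference) to imitate the CFG-augmented teacher distribution."""
    teacher_probs = F.softmax(cfg_logits(cond_logits, uncond_logits, guidance_scale), dim=-1)
    student_logp = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(student_logp, teacher_probs, reduction="batchmean")


def diversity_reward(embeddings):
    """Toy diversity reward (illustrative): one minus the mean pairwise cosine
    similarity among embeddings of several generations for the same prompt."""
    e = F.normalize(embeddings, dim=-1)
    sim = e @ e.t()
    n = e.shape[0]
    off_diag_sum = sim.sum() - torch.trace(sim)
    return 1.0 - off_diag_sum / (n * (n - 1))


def combined_loss(student_logits, cond_logits, uncond_logits,
                  sampled_token_logprobs, embeddings,
                  guidance_scale=3.0, rl_weight=0.1):
    """(2) RL objective: a simplified REINFORCE-style term that scales the
    log-probability of sampled generations by the (detached) diversity reward,
    added to the distillation loss."""
    distill = distillation_loss(student_logits, cond_logits, uncond_logits, guidance_scale)
    reward = diversity_reward(embeddings).detach()
    rl = -(reward * sampled_token_logprobs).mean()
    return distill + rl_weight * rl
```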
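The weight-based merging described in the abstract amounts to linear interpolation between the parameters of a quality-focused model and a diversity-focused model. The sketch below, with the assumed helper `merge_weights` and trade-off coefficient `lam`, illustrates the idea; it is not the authors' implementation.

```python
# Sketch of deployment-time weight interpolation between two finetuned models.
# Assumes both state dicts share the same architecture and floating-point parameters.

import torch


@torch.no_grad()
def merge_weights(quality_state_dict, diversity_state_dict, lam=0.5):
    """Return lam * quality + (1 - lam) * diversity for every shared parameter.
    lam=1.0 recovers the quality-focused model, lam=0.0 the diversity-focused one."""
    merged = {}
    for name, q in quality_state_dict.items():
        d = diversity_state_dict[name]
        merged[name] = torch.lerp(d, q, lam)  # d + lam * (q - d)
    return merged


# Usage (illustrative): model.load_state_dict(merge_weights(q_sd, d_sd, lam=0.7))
```

Sweeping `lam` at deployment time traces out the quality-diversity trade-off without retraining, which is what allows the finetuned-then-merged model to be tuned per use case.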