Diversity-Rewarded CFG Distillation

October 8, 2024
Authors: Geoffrey Cideron, Andrea Agostinelli, Johan Ferret, Sertan Girgin, Romuald Elie, Olivier Bachem, Sarah Perrin, Alexandre Ramé
cs.AI

Abstract

Generative models are transforming creative domains such as music generation, with inference-time strategies like Classifier-Free Guidance (CFG) playing a crucial role. However, CFG doubles inference cost while limiting originality and diversity across generated contents. In this paper, we introduce diversity-rewarded CFG distillation, a novel finetuning procedure that distills the strengths of CFG while addressing its limitations. Our approach optimises two training objectives: (1) a distillation objective, encouraging the model alone (without CFG) to imitate the CFG-augmented predictions, and (2) an RL objective with a diversity reward, promoting the generation of diverse outputs for a given prompt. By finetuning, we learn model weights with the ability to generate high-quality and diverse outputs, without any inference overhead. This also unlocks the potential of weight-based model merging strategies: by interpolating between the weights of two models (the first focusing on quality, the second on diversity), we can control the quality-diversity trade-off at deployment time, and even further boost performance. We conduct extensive experiments on the MusicLM (Agostinelli et al., 2023) text-to-music generative model, where our approach surpasses CFG in terms of quality-diversity Pareto optimality. According to human evaluators, our finetuned-then-merged model generates samples with higher quality-diversity than the base model augmented with CFG. Explore our generations at https://google-research.github.io/seanet/musiclm/diverse_music/.
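The abstract describes three mechanisms: a distillation objective that teaches the model alone to match CFG-augmented predictions, an RL objective with a diversity reward, and linear interpolation between the weights of a quality-focused and a diversity-focused model. The sketch below is a minimal illustration of these ideas in PyTorch; the function and parameter names (`cfg_gamma`, `diversity_reward`, `merge_alpha`) and the exact loss formulations are assumptions made for illustration, not the paper's implementation or the MusicLM codebase.

```python
# Minimal sketch of the ideas in the abstract, written in PyTorch.
# All names (cfg_gamma, diversity_reward, merge_alpha, ...) are illustrative
# assumptions, not identifiers from the paper or the MusicLM codebase.

import torch
import torch.nn.functional as F


def cfg_teacher_logits(cond_logits, uncond_logits, cfg_gamma=3.0):
    """CFG-augmented teacher: push conditional logits away from unconditional ones."""
    return uncond_logits + cfg_gamma * (cond_logits - uncond_logits)


def distillation_loss(student_logits, cond_logits, uncond_logits, cfg_gamma=3.0):
    """KL(teacher || student): the student alone imitates the CFG-augmented prediction."""
    teacher_probs = F.softmax(cfg_teacher_logits(cond_logits, uncond_logits, cfg_gamma), dim=-1)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")


def diversity_rl_loss(sequence_log_probs, diversity_reward):
    """REINFORCE-style term: reward generations that differ from one another for the same prompt."""
    return -(diversity_reward.detach() * sequence_log_probs).mean()


def merge_weights(quality_state_dict, diversity_state_dict, merge_alpha=0.5):
    """Linear interpolation between two finetuned checkpoints (quality vs. diversity)."""
    return {
        name: (1.0 - merge_alpha) * quality_state_dict[name]
        + merge_alpha * diversity_state_dict[name]
        for name in quality_state_dict
    }


if __name__ == "__main__":
    # Toy shapes: a batch of 4 predictions over a vocabulary of 16 audio tokens.
    cond, uncond, student = (torch.randn(4, 16) for _ in range(3))
    loss = distillation_loss(student, cond, uncond) + 0.1 * diversity_rl_loss(
        torch.randn(4), torch.rand(4)
    )
    print(float(loss))
```

In practice the two terms would be combined into a single finetuning loss (e.g. a weighted sum, with the weight balancing quality against diversity), and `merge_alpha` would be chosen at deployment time to set the desired quality-diversity trade-off.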
