Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation

Abstract

Recent work demonstrates that reinforcement learning (RL) with quality rewards can improve the quality of images generated in text-to-image (T2I) generation. However, simply aggregating multiple rewards may over-optimize some metrics while degrading others, and manually finding the optimal weights is challenging. An effective strategy for jointly optimizing multiple rewards in RL for T2I generation is therefore highly desirable. This paper introduces Parrot, a novel multi-reward RL framework for T2I generation. Using batch-wise Pareto-optimal selection, Parrot automatically identifies the optimal trade-off among different rewards during RL optimization of T2I generation. Additionally, Parrot jointly optimizes the T2I model and the prompt expansion network, which facilitates the generation of quality-aware text prompts and further enhances the final image quality. To counteract potential catastrophic forgetting of the original user prompt caused by prompt expansion, we introduce original prompt-centered guidance at inference time, ensuring that the generated image remains faithful to the user input. Extensive experiments and a user study demonstrate that Parrot outperforms several baseline methods across various quality criteria, including aesthetics, human preference, image sentiment, and text-image alignment.
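To make the idea of batch-wise Pareto-optimal selection concrete, the following is a minimal sketch of picking the non-dominated samples from a batch of reward vectors. It is an illustration of the general technique, not the paper's implementation: the function names, the reward ordering, and the numeric values are all hypothetical, and how the selected samples would feed into the RL update is not shown.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if reward vector a Pareto-dominates b (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_indices(rewards: List[Sequence[float]]) -> List[int]:
    """Indices of samples in the batch that no other sample dominates."""
    return [
        i
        for i, r_i in enumerate(rewards)
        if not any(dominates(r_j, r_i) for j, r_j in enumerate(rewards) if j != i)
    ]


# Hypothetical batch: each row is (aesthetics, human preference, alignment).
batch_rewards = [
    (0.9, 0.2, 0.5),  # dominated by the last row
    (0.4, 0.8, 0.6),
    (0.3, 0.1, 0.4),  # dominated by both rows above
    (0.9, 0.2, 0.7),
]
selected = non_dominated_indices(batch_rewards)
print(selected)
```

Because selection relies only on the dominance relation between reward vectors, this sketch needs no hand-tuned weights to combine the rewards, which is the motivation the abstract gives for avoiding simple aggregation.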
