RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

March 25, 2024
Authors: Owen Oertell, Jonathan D. Chang, Yiyi Zhang, Kianté Brantley, Wen Sun
cs.AI

Abstract

Reinforcement learning (RL) has improved guided image generation with diffusion models by directly optimizing rewards that capture image quality, aesthetics, and instruction-following ability. However, the resulting generative policies inherit the same iterative sampling process as diffusion models, which makes generation slow. To overcome this limitation, consistency models were proposed as a new class of generative models that directly map noise to data, so a model can generate an image in as few as one sampling iteration. In this work, to optimize text-to-image generative models for task-specific rewards and enable fast training and inference, we propose a framework for fine-tuning consistency models via RL. Our framework, called Reinforcement Learning for Consistency Models (RLCM), frames the iterative inference process of a consistency model as an RL procedure. RLCM improves upon RL fine-tuned diffusion models in text-to-image generation and trades inference-time computation for sample quality. Experimentally, we show that RLCM can adapt text-to-image consistency models to objectives that are hard to express through prompting, such as image compressibility, and to objectives derived from human feedback, such as aesthetic quality. Compared to RL fine-tuned diffusion models, RLCM trains significantly faster, improves generation quality as measured by the reward objectives, and speeds up inference by producing high-quality images in as few as two inference steps. Our code is available at https://rlcm.owenoertell.com.
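
Since the abstract only sketches the framing, the snippet below is a minimal, hedged illustration of the core idea it describes: treating each step of a consistency model's multistep sampler as an action in a short-horizon MDP and reinforcing the Gaussian log-probability of each step against a terminal reward. The network, reward function, time schedule, and hyperparameters here are illustrative placeholders, not the authors' implementation or API.

```python
import torch
import torch.nn as nn

class TinyConsistencyNet(nn.Module):
    """Stand-in for a consistency model f(x, t) that predicts a clean sample.
    A real model would be a conditioned image network; this is a toy MLP."""
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x, t):
        t_feat = t.expand(x.shape[0], 1)          # broadcast time to the batch
        return self.net(torch.cat([x, t_feat], dim=-1))

def policy_gradient_step(model, opt, reward_fn, batch=8, dim=32, steps=2, sigma=0.5):
    """One RLCM-style update: roll out `steps` sampler iterations, score the
    final sample, and apply a REINFORCE-style loss to each step's log-prob."""
    x = torch.randn(batch, dim)                   # start from pure noise
    log_probs = []
    for i in range(steps):
        t = torch.tensor([[1.0 - i / steps]])     # toy decreasing time schedule
        mean = model(x, t)                        # deterministic denoising step
        dist = torch.distributions.Normal(mean, sigma)
        x = dist.sample()                         # stochastic action = next state
        log_probs.append(dist.log_prob(x).sum(dim=-1))
    reward = reward_fn(x)                         # terminal reward on the sample
    loss = -(reward.detach() * torch.stack(log_probs).sum(dim=0)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return reward.mean().item()

model = TinyConsistencyNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
toy_reward = lambda x: -x.pow(2).mean(dim=-1)     # placeholder for e.g. an aesthetic score
for _ in range(10):
    policy_gradient_step(model, opt, toy_reward)
```

In the actual method, the reward would be a task-specific signal such as image compressibility or a learned aesthetic score, and the update would follow the paper's policy-gradient procedure; the toy reward above only demonstrates the rollout-and-reinforce interface.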
