
RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

March 25, 2024
Authors: Owen Oertell, Jonathan D. Chang, Yiyi Zhang, Kianté Brantley, Wen Sun
cs.AI

Abstract

Reinforcement learning (RL) has improved guided image generation with diffusion models by directly optimizing rewards that capture image quality, aesthetics, and instruction-following capabilities. However, the resulting generative policies inherit the same iterative sampling process of diffusion models that causes slow generation. To overcome this limitation, consistency models proposed learning a new class of generative models that directly map noise to data, resulting in a model that can generate an image in as few as one sampling iteration. In this work, to optimize text-to-image generative models for task-specific rewards and enable fast training and inference, we propose a framework for fine-tuning consistency models via RL. Our framework, called Reinforcement Learning for Consistency Models (RLCM), frames the iterative inference process of a consistency model as an RL procedure. RLCM improves upon RL fine-tuned diffusion models on text-to-image generation capabilities and trades computation during inference time for sample quality. Experimentally, we show that RLCM can adapt text-to-image consistency models to objectives that are challenging to express with prompting, such as image compressibility, and to those derived from human feedback, such as aesthetic quality. Compared to RL fine-tuned diffusion models, RLCM trains significantly faster, improves the quality of the generation measured under the reward objectives, and speeds up the inference procedure by generating high-quality images in as few as two inference steps. Our code is available at https://rlcm.owenoertell.com.
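The central idea in the abstract, framing the few-step consistency sampler as a short-horizon MDP and optimizing a task reward with a policy gradient, can be illustrated with a small sketch. The `ConsistencyModel` wrapper and its `inference_schedule` and `policy_std` attributes, as well as `reward_fn`, are hypothetical placeholders rather than the released RLCM code; the sketch only shows the structure of treating each inference step as an action and propagating a terminal reward.

```python
# Minimal, hypothetical sketch: treat the few-step consistency sampler as a
# short-horizon MDP (each inference step is one action) and apply a
# REINFORCE-style policy-gradient update using a terminal reward.
# `model.inference_schedule`, `model.policy_std`, and `reward_fn` are
# placeholder names, not the authors' implementation.
import torch

def rlcm_policy_gradient_loss(model, reward_fn, prompts,
                              num_inference_steps=2, sigma_max=80.0,
                              image_shape=(3, 64, 64)):
    """Compute a policy-gradient loss for one batch of prompts."""
    device = next(model.parameters()).device
    batch = len(prompts)

    # Roll out the stochastic consistency sampler, recording log-probabilities.
    x = torch.randn(batch, *image_shape, device=device) * sigma_max  # initial noise
    step_log_probs = []
    for t in model.inference_schedule(num_inference_steps):          # e.g. 2 steps
        mean = model(x, t, prompts)                                   # predicted sample
        dist = torch.distributions.Normal(mean, model.policy_std)     # Gaussian policy
        x = dist.sample()                                             # next state (detached)
        step_log_probs.append(dist.log_prob(x).sum(dim=(1, 2, 3)))    # shape: (batch,)

    # Terminal reward only (e.g. aesthetic score or image compressibility).
    with torch.no_grad():
        rewards = reward_fn(x, prompts)                               # shape: (batch,)
        advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-6)

    # Every step in the short trajectory shares the terminal reward's advantage.
    loss = -(torch.stack(step_log_probs).sum(dim=0) * advantages).mean()
    return loss, rewards.mean()
```

A training loop would call this once per batch of prompts, then run `loss.backward()` and an optimizer step on the consistency model's parameters. In practice, RL fine-tuning methods in this family typically use a clipped importance-sampling objective rather than plain REINFORCE, but the MDP framing over a handful of inference steps is the same.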
