

AR-RAG: Autoregressive Retrieval Augmentation for Image Generation

June 8, 2025
Authors: Jingyuan Qi, Zhiyang Xu, Qifan Wang, Lifu Huang
cs.AI

Abstract

We introduce Autoregressive Retrieval Augmentation (AR-RAG), a novel paradigm that enhances image generation by autoregressively incorporating k-nearest neighbor retrievals at the patch level. Unlike prior methods that perform a single, static retrieval before generation and condition the entire generation on fixed reference images, AR-RAG performs context-aware retrievals at each generation step, using prior-generated patches as queries to retrieve and incorporate the most relevant patch-level visual references, enabling the model to respond to evolving generation needs while avoiding limitations (e.g., over-copying, stylistic bias) prevalent in existing methods. To realize AR-RAG, we propose two parallel frameworks: (1) Distribution-Augmentation in Decoding (DAiD), a training-free, plug-and-play decoding strategy that directly merges the distribution of model-predicted patches with the distribution of retrieved patches, and (2) Feature-Augmentation in Decoding (FAiD), a parameter-efficient fine-tuning method that progressively smooths the features of retrieved patches via multi-scale convolution operations and leverages them to augment the image generation process. We validate the effectiveness of AR-RAG on widely adopted benchmarks, including Midjourney-30K, GenEval, and DPG-Bench, demonstrating significant performance gains over state-of-the-art image generation models.
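The training-free DAiD strategy merges the model's predicted distribution over the patch vocabulary with an empirical distribution built from the retrieved patches. A minimal NumPy sketch of this kind of distribution fusion, assuming token-indexed patches and a hypothetical mixing weight `alpha` (the paper's exact fusion rule may differ):

```python
import numpy as np

def daid_merge(model_logits, retrieved_patch_ids, alpha=0.5):
    """Sketch of DAiD-style distribution augmentation for one decoding step.

    model_logits: (V,) logits over the patch vocabulary from the generator.
    retrieved_patch_ids: token ids of the k retrieved nearest-neighbor patches.
    alpha: hypothetical interpolation weight toward the retrieved distribution.
    """
    # softmax over the patch vocabulary (numerically stabilized)
    p_model = np.exp(model_logits - model_logits.max())
    p_model /= p_model.sum()

    # empirical distribution over the retrieved patch ids
    p_retr = np.zeros_like(p_model)
    for pid in retrieved_patch_ids:
        p_retr[pid] += 1.0
    p_retr /= max(p_retr.sum(), 1.0)

    # convex combination of the two distributions; still sums to 1
    return (1.0 - alpha) * p_model + alpha * p_retr
```

At each step, the next patch token would be sampled from the merged distribution, so retrieved references nudge decoding without any retraining.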
PDF · 262 · June 17, 2025