ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation
February 13, 2025
Authors: Rotem Shalev-Arkushin, Rinon Gal, Amit H. Bermano, Ohad Fried
cs.AI
Abstract
Diffusion models enable high-quality and diverse visual content synthesis.
However, they struggle to generate rare or unseen concepts. To address this
challenge, we explore the usage of Retrieval-Augmented Generation (RAG) with
image generation models. We propose ImageRAG, a method that dynamically
retrieves relevant images based on a given text prompt, and uses them as
context to guide the generation process. Prior approaches that used retrieved
images to improve generation trained their models specifically for retrieval-based
generation. In contrast, ImageRAG leverages the capabilities of existing
image-conditioning models and does not require RAG-specific training. Our approach
is highly adaptable and can be applied across different model types, showing
significant improvement in generating rare and fine-grained concepts using
different base models.
Our project page is available at: https://rotem-shalev.github.io/ImageRAG
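The abstract describes a retrieve-then-condition loop: embed the text prompt, retrieve the most relevant reference images from a database, and feed them as image context to an off-the-shelf image-conditioned generator. The sketch below illustrates that idea only; it is not the authors' implementation. It assumes CLIP (openai/clip-vit-base-patch32) for text-to-image retrieval, an IP-Adapter-equipped Stable Diffusion pipeline from diffusers as the image-conditioned generator, and placeholder image paths for the retrieval database.

```python
# Minimal sketch of retrieval-augmented image generation in the spirit of ImageRAG.
# Assumptions (not from the paper): CLIP embeddings for retrieval and an
# IP-Adapter-equipped Stable Diffusion pipeline as the image-conditioned generator.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Embed the text prompt and a small image database with CLIP.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompt = "a photo of a shoebill stork"               # rare concept the base model may miss
db_paths = ["img_0.jpg", "img_1.jpg", "img_2.jpg"]   # placeholder retrieval database
db_images = [Image.open(p).convert("RGB") for p in db_paths]

with torch.no_grad():
    txt = clip.get_text_features(
        **proc(text=[prompt], return_tensors="pt", padding=True).to(device))
    imgs = clip.get_image_features(
        **proc(images=db_images, return_tensors="pt").to(device))

# 2) Retrieve the image most similar to the prompt (cosine similarity).
txt = txt / txt.norm(dim=-1, keepdim=True)
imgs = imgs / imgs.norm(dim=-1, keepdim=True)
best = (imgs @ txt.T).squeeze(-1).argmax().item()
reference = db_images[best]

# 3) Use the retrieved image as context for an image-conditioned generator.
#    IP-Adapter is one off-the-shelf conditioning mechanism; the paper's exact
#    conditioning interface may differ per base model.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
out = pipe(prompt=prompt, ip_adapter_image=reference).images[0]
out.save("imagerag_sketch.png")
```

Because the retrieval and the conditioning are decoupled, any image-conditioning mechanism supported by the chosen base model could stand in for IP-Adapter here, which reflects the training-free, model-agnostic design the abstract emphasizes.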