

Unified Text-to-Image Generation and Retrieval

June 9, 2024
Authors: Leigang Qu, Haochuan Li, Tan Wang, Wenjie Wang, Yongqi Li, Liqiang Nie, Tat-Seng Chua
cs.AI

Abstract

How humans can acquire images efficiently and effectively is a long-standing question. A typical solution is text-to-image retrieval from an existing database given a text query; however, a limited database typically lacks creativity. By contrast, recent breakthroughs in text-to-image generation have made it possible to produce novel and diverse visual content, but generation still struggles to synthesize knowledge-intensive images. In this work, we rethink the relationship between text-to-image generation and retrieval and propose a unified framework in the context of Multimodal Large Language Models (MLLMs). Specifically, we first explore the intrinsic discriminative abilities of MLLMs and introduce a generative retrieval method that performs retrieval in a training-free manner. We then unify generation and retrieval within a single autoregressive generation process and propose an autonomous decision module that selects the better match between the generated and retrieved images as the response to the text query. Additionally, we construct a benchmark, TIGeR-Bench, covering creative and knowledge-intensive domains, to standardize the evaluation of unified text-to-image generation and retrieval. Extensive experiments on TIGeR-Bench and two retrieval benchmarks, Flickr30K and MS-COCO, demonstrate the effectiveness and superiority of the proposed method.
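The abstract does not specify how the autonomous decision module scores the two candidates. As a rough illustration only, the selection step can be sketched as picking whichever candidate best matches the query. This assumes the query and each image have already been embedded into a shared vector space by some text-image encoder, and it uses cosine similarity as the matching score. Both the embeddings and the cosine criterion are hypothetical stand-ins, not the authors' MLLM-based method, and the names `Candidate` and `choose_response` are illustrative.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """An image candidate; the embedding is a hypothetical precomputed vector."""
    source: str              # "generated" or "retrieved"
    embedding: List[float]   # assumed image embedding in a shared text-image space


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def choose_response(query_embedding: Sequence[float],
                    candidates: List[Candidate]) -> Candidate:
    """Return the candidate whose embedding is most similar to the query.

    The scoring function (cosine similarity) is an assumption made for this
    sketch; the paper's decision module is built on an MLLM instead.
    """
    if not candidates:
        raise ValueError("need at least one candidate image")
    return max(candidates, key=lambda c: cosine(query_embedding, c.embedding))


if __name__ == "__main__":
    generated = Candidate("generated", [0.9, 0.1])
    retrieved = Candidate("retrieved", [0.2, 0.8])
    best = choose_response([1.0, 0.0], [generated, retrieved])
    print(best.source)
```

Keeping the scorer as a separate function makes it easy to swap cosine similarity for any other relevance measure without changing the selection logic.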


December 8, 2024