

OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs

September 8, 2024
Authors: Jintian Zhang, Cheng Peng, Mengshu Sun, Xiang Chen, Lei Liang, Zhiqiang Zhang, Jun Zhou, Huajun Chen, Ningyu Zhang
cs.AI

Abstract

Despite the recent advancements in Large Language Models (LLMs), which have significantly enhanced the generative capabilities for various NLP tasks, LLMs still face limitations in directly handling retrieval tasks. However, many practical applications demand the seamless integration of both retrieval and generation. This paper introduces a novel and efficient One-pass Generation and retrieval framework (OneGen), designed to improve LLMs' performance on tasks that require both generation and retrieval. The proposed framework bridges the traditionally separate training approaches for generation and retrieval by incorporating retrieval tokens generated autoregressively. This enables a single LLM to handle both tasks simultaneously in a unified forward pass. We conduct experiments on two distinct types of composite tasks, RAG and Entity Linking, to validate the pluggability, effectiveness, and efficiency of OneGen in training and inference. Furthermore, our results show that integrating generation and retrieval within the same context preserves the generative capabilities of LLMs while improving retrieval performance. To the best of our knowledge, OneGen is the first framework that enables LLMs to conduct vector retrieval during generation.
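
The abstract describes the core mechanism only in prose: the model autoregressively emits special retrieval tokens, and the hidden state at such a token doubles as a dense query vector, so retrieval happens inside the same forward pass that drives generation. The sketch below illustrates that idea under stated assumptions; it is not the authors' released implementation. The model name, the `[RQ]` token, the randomly initialized document index, and greedy decoding are all hypothetical placeholders chosen for brevity.

```python
# Minimal sketch of OneGen-style one-pass generation + retrieval.
# Illustrative only: the base model, the [RQ] retrieval token, and the
# random document index are assumptions, not the paper's actual setup.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # hypothetical base LM
RQ_TOKEN = "[RQ]"                        # hypothetical retrieval token

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
tok.add_special_tokens({"additional_special_tokens": [RQ_TOKEN]})
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.resize_token_embeddings(len(tok))  # make room for [RQ]
rq_id = tok.convert_tokens_to_ids(RQ_TOKEN)

# Placeholder document index. In OneGen-style training, each row would be
# the LM's own hidden state for a retrieval token appended to a document.
doc_embs = F.normalize(torch.randn(1000, model.config.hidden_size), dim=-1)

@torch.no_grad()
def generate_with_retrieval(prompt: str, max_new_tokens: int = 128):
    ids = tok(prompt, return_tensors="pt").input_ids
    retrieved = []
    for _ in range(max_new_tokens):
        # One forward pass yields BOTH the next-token logits and, if the
        # last token is [RQ], the query embedding -- no separate encoder.
        out = model(ids, output_hidden_states=True)
        if ids[0, -1].item() == rq_id:
            query = F.normalize(out.hidden_states[-1][0, -1], dim=-1)
            retrieved.append(int((doc_embs @ query).argmax()))
        next_id = out.logits[0, -1].argmax()  # greedy decoding for brevity
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
        if next_id.item() == tok.eos_token_id:
            break
    return tok.decode(ids[0], skip_special_tokens=False), retrieved
```

The design point the sketch tries to capture is that the query embedding falls out of the decoding pass itself, which is what lets a single LLM replace the usual generator-plus-retriever pipeline. A practical version would reuse the KV cache instead of re-encoding the full sequence each step, and the index would be built from trained rather than random embeddings.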

