OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs
September 8, 2024
Authors: Jintian Zhang, Cheng Peng, Mengshu Sun, Xiang Chen, Lei Liang, Zhiqiang Zhang, Jun Zhou, Huajun Chen, Ningyu Zhang
cs.AI
Abstract
Despite the recent advancements in Large Language Models (LLMs), which have
significantly enhanced the generative capabilities for various NLP tasks, LLMs
still face limitations in directly handling retrieval tasks. However, many
practical applications demand the seamless integration of both retrieval and
generation. This paper introduces a novel and efficient One-pass Generation and
retrieval framework (OneGen), designed to improve LLMs' performance on tasks
that require both generation and retrieval. The proposed framework bridges the
traditionally separate training approaches for generation and retrieval by
incorporating retrieval tokens generated autoregressively. This enables a
single LLM to handle both tasks simultaneously in a unified forward pass. We
conduct experiments on two distinct types of composite tasks, RAG and Entity
Linking, to validate the pluggability, effectiveness, and efficiency of OneGen
in training and inference. Furthermore, our results show that integrating
generation and retrieval within the same context preserves the generative
capabilities of LLMs while improving retrieval performance. To the best of our
knowledge, OneGen is the first to enable LLMs to conduct vector retrieval
during generation.
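To make the mechanism concrete, here is a minimal sketch of the idea described in the abstract, assuming a HuggingFace-style causal LM: during decoding, whenever a dedicated retrieval token is the most recent token, its contextual hidden state is reused as a dense query over precomputed document embeddings, so retrieval happens inside the same forward pass as generation. The token id ([RQ]), function names, and cosine-similarity scoring are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only: a special retrieval token's hidden state doubles
# as the query embedding, so generation and retrieval share one pass.
import torch
import torch.nn.functional as F

RETRIEVAL_TOKEN_ID = 32000  # hypothetical id of an added special token, e.g. "[RQ]"

@torch.no_grad()
def generate_with_retrieval(model, input_ids, doc_embeddings, docs,
                            max_new_tokens=64):
    """Greedy decoding with in-pass retrieval (HuggingFace-style API assumed).

    doc_embeddings: (num_docs, hidden_dim) tensor of precomputed document vectors.
    """
    retrieved = []
    for _ in range(max_new_tokens):
        out = model(input_ids, output_hidden_states=True)
        # Hidden state at the last position belongs to the most recent token.
        last_hidden = out.hidden_states[-1][:, -1, :]  # (1, hidden_dim)
        if input_ids[0, -1].item() == RETRIEVAL_TOKEN_ID:
            # The retrieval token's contextual state serves as the dense query.
            scores = F.cosine_similarity(last_hidden, doc_embeddings)  # (num_docs,)
            retrieved.append(docs[scores.argmax().item()])
        next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)
    return input_ids, retrieved
```

The sketch covers only the inference-time flow (and omits KV caching for brevity); in the framework as described, the retrieval tokens are trained so that their hidden states serve as retrieval embeddings alongside the generation objective.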