OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs

Abstract

Recent advances in Large Language Models (LLMs) have significantly enhanced generative capabilities across a wide range of NLP tasks, yet LLMs still face limitations when handling retrieval tasks directly. Many practical applications, however, demand the seamless integration of both retrieval and generation. This paper introduces a novel and efficient One-pass Generation and retrieval framework (OneGen), designed to improve LLMs' performance on tasks that require both generation and retrieval. The proposed framework bridges the traditionally separate training approaches for generation and retrieval by incorporating retrieval tokens that are generated autoregressively. This enables a single LLM to handle both tasks simultaneously in a unified forward pass. We conduct experiments on two distinct types of composite tasks, Retrieval-Augmented Generation (RAG) and Entity Linking, to validate the pluggability, effectiveness, and efficiency of OneGen in training and inference. Furthermore, our results show that integrating generation and retrieval within the same context preserves the generative capabilities of LLMs while improving retrieval performance. To the best of our knowledge, OneGen is the first framework to enable LLMs to conduct vector retrieval during generation.
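To make the one-pass idea concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes that the model emits a special retrieval token during decoding and that the hidden state at that position can be reused directly as a query vector. The token name `[RQ]`, the helper functions, and the hand-written three-dimensional hidden states are all hypothetical stand-ins for a real LLM.

```python
import math

RETRIEVAL_TOKEN = "[RQ]"  # hypothetical marker name, not taken from the abstract


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, doc_vecs):
    """Return the id of the document whose vector is closest to query_vec."""
    return max(doc_vecs, key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]))


def one_pass_decode(steps, doc_vecs):
    """Walk through decoder steps, each a (token, hidden_state) pair.

    Ordinary tokens are appended to the output. When the retrieval token
    appears, its hidden state is reused as the query vector, so no separate
    encoder pass is needed for retrieval.
    """
    output, retrieved = [], []
    for token, hidden in steps:
        if token == RETRIEVAL_TOKEN:
            retrieved.append(retrieve(hidden, doc_vecs))
        else:
            output.append(token)
    return output, retrieved


# Toy stand-in for a model: hand-written tokens and hidden states (assumed data).
steps = [
    ("Who", [0.1, 0.0, 0.2]),
    ("wrote", [0.0, 0.3, 0.1]),
    ("Hamlet", [0.2, 0.1, 0.0]),
    (RETRIEVAL_TOKEN, [0.9, 0.1, 0.0]),
]
doc_vecs = {"doc_shakespeare": [1.0, 0.0, 0.0], "doc_physics": [0.0, 1.0, 0.0]}

tokens, hits = one_pass_decode(steps, doc_vecs)
```

In this sketch, generation and retrieval share the same loop over decoder steps, which is the property the abstract describes as a unified forward pass. A real system would take the hidden states from the LLM itself and search a precomputed vector index rather than a small dictionary.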
