ReZero: Enhancing LLM search ability by trying one-more-time

Abstract

Retrieval-Augmented Generation (RAG) improves Large Language Model (LLM) performance on knowledge-intensive tasks, but it depends heavily on the quality of the initial search query. Current methods, often based on Reinforcement Learning (RL), typically focus on query formulation or on reasoning over results, and do not explicitly encourage persistence after a failed search. We introduce ReZero (Retry-Zero), a novel RL framework that directly rewards the act of retrying a search query after an initial unsuccessful attempt. This incentivizes the LLM to explore alternative queries rather than halt prematurely. ReZero achieves 46.88% accuracy against a 25% baseline, a substantial improvement. By rewarding persistence, ReZero makes LLMs more robust in complex information-seeking scenarios where the initial query may prove insufficient.
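To make the idea of rewarding a retry concrete, here is a minimal sketch of what such a reward term might look like. The abstract does not specify the reward's form, so everything here is an assumption for illustration: the `Step` type, the `retry_reward` function, the per-retry `bonus`, the `max_bonus` cap, and the rule that the bonus is paid only when the trajectory ends with an answer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One step of a hypothetical agent trajectory."""
    kind: str  # "search" or "answer"
    text: str  # query text or final answer text


def retry_reward(steps: List[Step], bonus: float = 0.2, max_bonus: float = 0.6) -> float:
    """Illustrative reward for issuing further searches after the first one.

    Assumptions (not from the source): a fixed bonus per extra search,
    capped at max_bonus, and paid only if the trajectory ends with an
    answer, so that searching forever is not rewarded.
    """
    if not steps or steps[-1].kind != "answer":
        return 0.0
    n_searches = sum(1 for s in steps if s.kind == "search")
    retries = max(0, n_searches - 1)  # the first search is not a retry
    return min(max_bonus, bonus * retries)


if __name__ == "__main__":
    trajectory = [
        Step("search", "capital of the country that hosted Expo 2005"),
        Step("search", "Expo 2005 host country"),
        Step("answer", "Tokyo"),
    ]
    print(retry_reward(trajectory))
```

In an RL setup, a term like this would presumably be added to other reward components such as answer correctness. The cap and the answer requirement are design choices made here to keep the sketch from rewarding unbounded searching.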
