Self-Taught Agentic Long Context Understanding

Abstract

Answering complex, long-context questions remains a major challenge for large language models (LLMs), as it requires effective question clarification and context retrieval. We propose Agentic Long-Context Understanding (AgenticLU), a framework designed to enhance an LLM's understanding of such queries by integrating targeted self-clarification with contextual grounding within an agentic workflow. At the core of AgenticLU is Chain-of-Clarifications (CoC), where models refine their understanding through self-generated clarification questions and corresponding contextual groundings. By scaling inference as a tree search where each node represents a CoC step, we achieve 97.8% answer recall on NarrativeQA with a search depth of up to three and a branching factor of eight. To amortize the high cost of this search process into training, we leverage the preference pairs for each step obtained by the CoC workflow and perform two-stage model finetuning: (1) supervised finetuning to learn effective decomposition strategies, and (2) direct preference optimization to enhance reasoning quality. This enables AgenticLU models to generate clarifications and retrieve relevant context effectively and efficiently in a single inference pass. Extensive experiments across seven long-context tasks demonstrate that AgenticLU significantly outperforms state-of-the-art prompting methods and specialized long-context LLMs, achieving robust multi-hop reasoning while sustaining consistent performance as context length grows.
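To make the search described above more concrete, here is a minimal Python sketch of a Chain-of-Clarifications tree search and of how step-level preference pairs might be read off the resulting tree. It is an illustration under stated assumptions, not the authors' implementation: the callables `propose`, `ground`, `answer`, and `is_correct` are hypothetical stand-ins for LLM calls and answer checking, and the pairing rule (a sibling that leads to a correct answer is preferred over one that does not) is an assumption, since the abstract does not specify how the pairs are built. Without early stopping, a depth of three and a branching factor of eight expand up to 8 + 64 + 512 = 584 clarification nodes per question, which hints at why the search cost is worth amortizing into training.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    question: str                      # original question (root) or a clarification
    grounding: str = ""                # context the model cited for this step
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    solved: bool = False               # answer from this path matched the reference


def path(node: Optional[Node]) -> List[Node]:
    """Return the chain of steps from the root down to `node`."""
    steps = []
    while node is not None:
        steps.append(node)
        node = node.parent
    return steps[::-1]


def coc_tree_search(question, context, propose, ground, answer, is_correct,
                    depth=3, branching=8) -> Node:
    """Breadth-first expansion; each node is one clarification-plus-grounding step."""
    root = Node(question)
    frontier = [root]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for clar in propose(path(node), context, branching):
                child = Node(clar, ground(clar, context), parent=node)
                node.children.append(child)
                child.solved = is_correct(answer(path(child), context))
                if not child.solved:   # assumption: stop expanding solved paths
                    next_frontier.append(child)
        frontier = next_frontier
    return root


def leads_to_answer(node: Node) -> bool:
    return node.solved or any(leads_to_answer(c) for c in node.children)


def preference_pairs(root: Node):
    """Assumed rule: at each step, pair one successful sibling with one failed sibling."""
    pairs, stack = [], [root]
    while stack:
        node = stack.pop()
        good = [c for c in node.children if leads_to_answer(c)]
        bad = [c for c in node.children if not leads_to_answer(c)]
        if good and bad:
            prompt = tuple(n.question for n in path(node))
            pairs.append((prompt, good[0].question, bad[0].question))
        stack.extend(node.children)
    return pairs


def demo():
    """Toy two-hop example with hard-coded stubs instead of an LLM."""
    context = "Alice was born in Paris. Paris is the capital of France."
    facts = {
        "Where was Alice born?": "Alice was born in Paris.",
        "Which country is Paris the capital of?": "Paris is the capital of France.",
        "Who is Alice?": "",           # nothing in the context answers this
    }

    def propose(steps, context, k):
        asked = {s.question for s in steps}
        return [q for q in facts if q not in asked][:k]

    def ground(clar, context):
        return facts[clar]             # toy lookup; a real model would cite the context

    def answer(steps, context):
        seen = " ".join(s.grounding for s in steps)
        both = "born in Paris" in seen and "capital of France" in seen
        return "France" if both else "unknown"

    root = coc_tree_search("In which country was Alice born?", context,
                           propose, ground, answer, lambda a: a == "France",
                           depth=2, branching=3)
    return preference_pairs(root)
```

Calling `demo()` returns (prompt path, chosen clarification, rejected clarification) triples. In a pipeline like the one the abstract outlines, the chosen steps could serve as supervised finetuning targets and the triples as input to direct preference optimization, though the exact data format used by the authors is not given here.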

