Self-Taught Agentic Long Context Understanding
February 21, 2025
Authors: Yufan Zhuang, Xiaodong Yu, Jialian Wu, Ximeng Sun, Ze Wang, Jiang Liu, Yusheng Su, Jingbo Shang, Zicheng Liu, Emad Barsoum
cs.AI
Abstract
Answering complex, long-context questions remains a major challenge for large
language models (LLMs) as it requires effective question clarifications and
context retrieval. We propose Agentic Long-Context Understanding (AgenticLU), a
framework designed to enhance an LLM's understanding of such queries by
integrating targeted self-clarification with contextual grounding within an
agentic workflow. At the core of AgenticLU is Chain-of-Clarifications (CoC),
where models refine their understanding through self-generated clarification
questions and corresponding contextual groundings. By scaling inference as a
tree search where each node represents a CoC step, we achieve 97.8% answer
recall on NarrativeQA with a search depth of up to three and a branching factor
of eight. To amortize the high cost of this search process into training, we
leverage the preference pairs for each step obtained by the CoC workflow and
perform two-stage model finetuning: (1) supervised finetuning to learn
effective decomposition strategies, and (2) direct preference optimization to
enhance reasoning quality. This enables AgenticLU models to generate
clarifications and retrieve relevant context effectively and efficiently in a
single inference pass. Extensive experiments across seven long-context tasks
demonstrate that AgenticLU significantly outperforms state-of-the-art prompting
methods and specialized long-context LLMs, achieving robust multi-hop reasoning
while sustaining consistent performance as context length grows.
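The tree search described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `ask` and `answer_found` are hypothetical stand-ins for the LLM calls that, respectively, generate one self-clarification step (question plus contextual grounding) and check whether the current chain already grounds the answer.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class CoCNode:
    """One node in the Chain-of-Clarifications (CoC) tree: the sequence of
    clarification steps taken so far, and the node's depth."""
    path: List[str]
    depth: int

def coc_tree_search(
    ask: Callable[[List[str], int], str],
    answer_found: Callable[[List[str]], bool],
    depth: int = 3,      # search depth of up to three, as in the abstract
    branching: int = 8,  # branching factor of eight, as in the abstract
) -> List[CoCNode]:
    """Breadth-first expansion of the CoC tree.

    `ask(path, i)` is a hypothetical model call producing the i-th candidate
    clarification step given the steps so far; `answer_found(path)` is a
    hypothetical check that the chain already grounds the answer. Returns
    every node visited, from which per-step preference pairs could be drawn.
    """
    frontier = [CoCNode(path=[], depth=0)]
    explored: List[CoCNode] = []
    while frontier:
        node = frontier.pop(0)
        explored.append(node)
        # Stop expanding a branch once the answer is grounded
        # or the depth limit is reached.
        if answer_found(node.path) or node.depth >= depth:
            continue
        for i in range(branching):
            step = ask(node.path, i)
            frontier.append(CoCNode(path=node.path + [step],
                                    depth=node.depth + 1))
    return explored
```

In the paper's two-stage setup, chains from such a search that reach the answer versus those that do not would supply the per-step preference pairs used for supervised finetuning and direct preference optimization.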