Self-Taught Agentic Long Context Understanding
February 21, 2025
Authors: Yufan Zhuang, Xiaodong Yu, Jialian Wu, Ximeng Sun, Ze Wang, Jiang Liu, Yusheng Su, Jingbo Shang, Zicheng Liu, Emad Barsoum
cs.AI
Abstract
Answering complex, long-context questions remains a major challenge for large
language models (LLMs) as it requires effective question clarifications and
context retrieval. We propose Agentic Long-Context Understanding (AgenticLU), a
framework designed to enhance an LLM's understanding of such queries by
integrating targeted self-clarification with contextual grounding within an
agentic workflow. At the core of AgenticLU is Chain-of-Clarifications (CoC),
where models refine their understanding through self-generated clarification
questions and corresponding contextual groundings. By scaling inference as a
tree search where each node represents a CoC step, we achieve 97.8% answer
recall on NarrativeQA with a search depth of up to three and a branching factor
of eight. To amortize the high cost of this search process into training, we
leverage the preference pairs for each step obtained by the CoC workflow and
perform two-stage model finetuning: (1) supervised finetuning to learn
effective decomposition strategies, and (2) direct preference optimization to
enhance reasoning quality. This enables AgenticLU models to generate
clarifications and retrieve relevant context effectively and efficiently in a
single inference pass. Extensive experiments across seven long-context tasks
demonstrate that AgenticLU significantly outperforms state-of-the-art prompting
methods and specialized long-context LLMs, achieving robust multi-hop reasoning
while sustaining consistent performance as context length grows.
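The tree search described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `ask` and `answer_found` are hypothetical stand-ins for the LLM calls that, respectively, generate one self-clarification step (question plus contextual grounding) and check whether the current chain already grounds the answer.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class CoCNode:
    """One node in the Chain-of-Clarifications (CoC) tree: the sequence of
    clarification steps taken so far, and the node's depth."""
    path: List[str]
    depth: int

def coc_tree_search(
    ask: Callable[[List[str], int], str],
    answer_found: Callable[[List[str]], bool],
    depth: int = 3,      # search depth of up to three, as in the abstract
    branching: int = 8,  # branching factor of eight, as in the abstract
) -> List[CoCNode]:
    """Breadth-first expansion of the CoC tree.

    `ask(path, i)` is a hypothetical model call producing the i-th candidate
    clarification step given the steps so far; `answer_found(path)` is a
    hypothetical check that the chain already grounds the answer. Returns
    every node visited, from which per-step preference pairs could be drawn.
    """
    frontier = [CoCNode(path=[], depth=0)]
    explored: List[CoCNode] = []
    while frontier:
        node = frontier.pop(0)
        explored.append(node)
        # Stop expanding a branch once the answer is grounded
        # or the depth limit is reached.
        if answer_found(node.path) or node.depth >= depth:
            continue
        for i in range(branching):
            step = ask(node.path, i)
            frontier.append(CoCNode(path=node.path + [step],
                                    depth=node.depth + 1))
    return explored
```

In the paper's two-stage setup, chains from such a search that reach the answer versus those that do not would supply the per-step preference pairs used for supervised finetuning and direct preference optimization.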