Self-Taught Agentic Long Context Understanding

Abstract

Answering complex, long-context questions remains a major challenge for large language models (LLMs) because it requires effective question clarification and context retrieval. We propose Agentic Long-Context Understanding (AgenticLU), a framework that enhances an LLM's understanding of such queries by integrating targeted self-clarification with contextual grounding within an agentic workflow. At the core of AgenticLU is Chain-of-Clarifications (CoC), in which models refine their understanding through self-generated clarification questions and corresponding contextual groundings. By scaling inference as a tree search in which each node represents a CoC step, we achieve 97.8% answer recall on NarrativeQA with a search depth of up to three and a branching factor of eight. To amortize the high cost of this search into training, we use the per-step preference pairs obtained from the CoC workflow and perform two-stage model finetuning: (1) supervised finetuning to learn effective decomposition strategies, and (2) direct preference optimization to enhance reasoning quality. This enables AgenticLU models to generate clarifications and retrieve relevant context effectively and efficiently in a single inference pass. Extensive experiments across seven long-context tasks demonstrate that AgenticLU significantly outperforms state-of-the-art prompting methods and specialized long-context LLMs, achieving robust multi-hop reasoning while sustaining consistent performance as context length grows.
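As a rough illustration of the tree search described in the abstract, the following Python sketch expands Chain-of-Clarifications steps breadth-first with a depth limit of three and a branching factor of eight. It is not the authors' implementation: the names `CoCNode`, `coc_tree_search`, `propose`, `answer`, and `is_correct` are hypothetical, the toy callables stand in for LLM calls, and the choice to stop expanding a path once its answer is accepted is an assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class CoCNode:
    """One hypothetical CoC step: a clarifying question plus its grounding."""
    clarification: str
    grounding: str
    depth: int
    parent: Optional["CoCNode"] = None

    def path(self) -> List["CoCNode"]:
        """Return the chain of steps from the first clarification to this node."""
        node, steps = self, []
        while node is not None:
            steps.append(node)
            node = node.parent
        return steps[::-1]


def coc_tree_search(
    question: str,
    context: str,
    propose: Callable[[str, str, List[CoCNode], int], List[Tuple[str, str]]],
    answer: Callable[[str, List[CoCNode]], str],
    is_correct: Callable[[str], bool],
    max_depth: int = 3,
    branching: int = 8,
) -> Tuple[List[List[CoCNode]], int]:
    """Breadth-first expansion of CoC steps.

    Returns (successful paths, number of nodes expanded). A path whose
    answer is accepted is recorded and not expanded further (an assumption).
    """
    frontier: List[Optional[CoCNode]] = [None]  # None stands for the root
    successes: List[List[CoCNode]] = []
    expanded = 0
    for depth in range(1, max_depth + 1):
        next_frontier: List[Optional[CoCNode]] = []
        for parent in frontier:
            history = parent.path() if parent is not None else []
            for clar, ground in propose(question, context, history, branching):
                node = CoCNode(clar, ground, depth, parent)
                expanded += 1
                if is_correct(answer(question, node.path())):
                    successes.append(node.path())
                else:
                    next_frontier.append(node)
        frontier = next_frontier
    return successes, expanded


# Toy stand-ins for the LLM calls (hypothetical, for illustration only).
def toy_propose(question, context, history, k):
    step = len(history) + 1
    return [(f"clarify-{step}-{i}", f"passage-{i}") for i in range(k)]


def toy_answer(question, path):
    return path[-1].clarification


# If no answer is ever accepted, the full tree is expanded:
# 8 + 8**2 + 8**3 = 584 nodes.
_, worst_case = coc_tree_search("q", "ctx", toy_propose, toy_answer, lambda a: False)
print(worst_case)  # 584
```

The worst case of 584 expanded nodes per question gives a sense of why the abstract describes the search as costly and motivates amortizing it into training through supervised finetuning and direct preference optimization.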
