When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs

Abstract

Recent Long-Context Language Models (LCLMs) can process hundreds of thousands of tokens in a single prompt. This opens new opportunities for knowledge-intensive multi-hop reasoning, either by integrating large sets of retrieved documents or, in some cases, by directly including all the necessary information. However, simply feeding more documents into the context window fails to capture how evidence should be connected. We address this gap with thought templates, which recast reasoning as reusable thought caches derived from prior problem-solving traces. The templates structure how evidence is combined and guide multi-hop inference over factual documents. To keep these templates effective, we propose an update strategy that iteratively refines templates derived from training data through natural-language feedback. Across diverse benchmarks and LCLM families, our approach delivers consistent gains over strong baselines in both retrieval-based and retrieval-free settings. Furthermore, we show that optimized templates can be distilled into smaller open-source models, demonstrating the approach's broad applicability and transparent reasoning reuse. We refer to our framework as Thought Template Augmented LCLMs (ToTAL).
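The two ideas in the abstract, placing reusable thought templates alongside factual documents in one long prompt and refining those templates with natural-language feedback, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `ThoughtTemplate` class, the prompt layout, and the `get_feedback` and `revise` callables (stand-ins for model calls) are hypothetical and are not taken from the ToTAL implementation.

```python
from dataclasses import dataclass


@dataclass
class ThoughtTemplate:
    """Hypothetical container for one reusable reasoning pattern."""
    name: str
    steps: str  # natural-language description of how to combine evidence


def build_prompt(templates, documents, question):
    """Assemble a single long-context prompt (assumed layout):
    templates first, then documents, then the question."""
    template_block = "\n".join(
        f"[Template: {t.name}] {t.steps}" for t in templates
    )
    document_block = "\n".join(
        f"[Doc {i}] {d}" for i, d in enumerate(documents, start=1)
    )
    return (
        "Thought templates:\n" + template_block + "\n\n"
        "Documents:\n" + document_block + "\n\n"
        "Question: " + question
    )


def refine_templates(templates, get_feedback, revise, rounds=1):
    """Revise templates over several rounds using natural-language feedback.

    get_feedback(template) returns a feedback string, or None if no change
    is needed; revise(template, feedback) returns an updated template.
    Both are hypothetical callables, for example wrappers around an LLM.
    """
    current = list(templates)
    for _ in range(rounds):
        updated = []
        for template in current:
            feedback = get_feedback(template)
            # Keep the template unchanged when there is no feedback.
            updated.append(revise(template, feedback) if feedback else template)
        current = updated
    return current
```

In this sketch the templates are plain text, so a refined set can be inspected directly or reused with a different model. That is one way to read the abstract's point about transparent reasoning reuse.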
