When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs

Abstract

Recent Long-Context Language Models (LCLMs) can process hundreds of thousands of tokens in a single prompt. This opens new opportunities for knowledge-intensive multi-hop reasoning: a model can integrate large sets of retrieved documents or, in some cases, take in all the necessary information directly. However, simply feeding more documents into the context window does not capture how the evidence should be connected. We address this gap with thought templates, which recast reasoning as reusable thought caches. Derived from prior problem-solving traces, they structure how evidence is combined and guide multi-hop inference over factual documents. To keep these templates effective, we propose an update strategy that iteratively refines templates derived from training data through natural-language feedback. Across diverse benchmarks and LCLM families, our approach delivers consistent gains over strong baselines in both retrieval-based and retrieval-free settings. Furthermore, we show that optimized templates can be distilled into smaller open-source models, demonstrating the approach's broad applicability and transparent reasoning reuse. We refer to our framework as Thought Template Augmented LCLMs (ToTAL).
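The abstract does not specify how templates are stored, placed in context, or refined. The Python sketch below illustrates the general idea under explicit assumptions and is not the authors' implementation. A template is modeled as a named list of reasoning steps (`ThoughtTemplate`). Templates are placed in the prompt ahead of the factual documents and the question (`build_prompt`). Refinement (`refine_templates`) critiques and rewrites every template whenever a training answer is wrong, and it stops early once a full pass produces no errors. All names are hypothetical. `answer_fn`, `feedback_fn`, and `revise_fn` are deterministic stand-ins for the LCLM calls that would answer questions, write natural-language feedback, and revise templates in a real system.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical training-example format: (question, documents, gold answer).
Example = Tuple[str, List[str], str]


@dataclass
class ThoughtTemplate:
    """A reusable reasoning pattern: a named list of steps (hypothetical format)."""
    name: str
    steps: List[str] = field(default_factory=list)


def build_prompt(question: str, templates: List[ThoughtTemplate], documents: List[str]) -> str:
    """Place the templates, the factual documents, and the question in one long prompt."""
    parts = ["# Thought templates"]
    for t in templates:
        parts.append(f"## {t.name}")
        parts.extend(f"{i}. {step}" for i, step in enumerate(t.steps, start=1))
    parts.append("# Documents")
    parts.extend(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    parts.append("# Question")
    parts.append(question)
    return "\n".join(parts)


def refine_templates(
    templates: List[ThoughtTemplate],
    train_set: List[Example],
    answer_fn: Callable[[str], str],
    feedback_fn: Callable[[ThoughtTemplate, str, str, str], str],
    revise_fn: Callable[[ThoughtTemplate, str], ThoughtTemplate],
    rounds: int = 3,
) -> List[ThoughtTemplate]:
    """Iteratively rewrite templates using natural-language feedback on training errors."""
    for _ in range(rounds):
        had_error = False
        for question, docs, gold in train_set:
            prediction = answer_fn(build_prompt(question, templates, docs))
            if prediction == gold:
                continue
            had_error = True
            # Critique each template in words, then revise it using that critique.
            templates = [
                revise_fn(t, feedback_fn(t, question, prediction, gold)) for t in templates
            ]
        if not had_error:  # every training example solved in this pass: stop early
            break
    return templates


# Usage with deterministic stubs standing in for model calls (all hypothetical).
def answer_fn(prompt: str) -> str:
    return "B" if "Check the second hop" in prompt else "A"


def feedback_fn(t: ThoughtTemplate, question: str, prediction: str, gold: str) -> str:
    return "Add a step: Check the second hop"


def revise_fn(t: ThoughtTemplate, feedback: str) -> ThoughtTemplate:
    # Append the step named in the feedback, returning a new template object.
    return ThoughtTemplate(t.name, t.steps + [feedback.split(": ", 1)[1]])


seed = [ThoughtTemplate("bridge", ["Find the bridge entity"])]
train = [("Where was the author of Book X born?", ["doc one", "doc two"], "B")]
refined = refine_templates(seed, train, answer_fn, feedback_fn, revise_fn)
print(refined[0].steps)  # ['Find the bridge entity', 'Check the second hop']
```

Passing the model calls in as callables keeps the loop testable offline. A real system would replace the stubs with prompts to an LCLM and could use a softer correctness check than exact string match.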
