Learning Evidence Highlighting for Frozen LLMs

Abstract

Large Language Models (LLMs) can reason well, yet often miss decisive evidence when it is buried in long, noisy contexts. We introduce HiLight, an Evidence Emphasis framework that decouples evidence selection from reasoning for frozen LLM solvers. HiLight avoids compressing or rewriting the input, which can discard or distort evidence, by training a lightweight Emphasis Actor to insert minimal highlight tags around pivotal spans in the unaltered context. A frozen Solver then performs downstream reasoning on the emphasized input. We cast highlighting as a weakly supervised decision-making problem and optimize the Actor with reinforcement learning using only the Solver's task reward, requiring no evidence labels and no access to or modification of the Solver. Across sequential recommendation and long-context question answering, HiLight consistently improves performance over strong prompt-based and automated prompt-optimization baselines. The learned emphasis policy transfers zero-shot to both smaller and larger unseen Solver families, including an API-based Solver, suggesting that the Actor captures genuine, reusable evidence structure rather than overfitting to a single backbone.
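
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The tag strings, the per-token Bernoulli policy, the toy reward function, and all hyperparameters are assumptions made for illustration. The abstract states only that the Actor inserts minimal highlight tags into the unaltered context and is optimized with reinforcement learning from the Solver's task reward. A plain REINFORCE update with a moving-average baseline stands in here for whichever RL algorithm the paper uses.

```python
import math
import random

# Hypothetical tag strings; the abstract does not specify the tag format.
OPEN_TAG, CLOSE_TAG = "<hl>", "</hl>"


def apply_highlights(tokens, mask):
    """Wrap each maximal run of selected tokens in tags; tokens stay unchanged."""
    out, inside = [], False
    for tok, keep in zip(tokens, mask):
        if keep and not inside:
            out.append(OPEN_TAG)
            inside = True
        elif not keep and inside:
            out.append(CLOSE_TAG)
            inside = False
        out.append(tok)
    if inside:
        out.append(CLOSE_TAG)
    return " ".join(out)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def toy_solver_reward(prompt):
    """Stand-in for a frozen black-box Solver (assumption): reward 1.0 only
    when the token KEY is highlighted on its own, otherwise 0.0."""
    return 1.0 if f"{OPEN_TAG} KEY {CLOSE_TAG}" in prompt else 0.0


def train_actor(tokens, steps=2000, lr=0.5, seed=0):
    """REINFORCE on one independent Bernoulli logit per token (toy Actor).

    Only the scalar reward is used: no evidence labels, no Solver gradients.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(tokens)
    baseline = 0.0  # moving-average reward baseline to reduce variance
    for _ in range(steps):
        probs = [sigmoid(z) for z in logits]
        mask = [rng.random() < p for p in probs]
        reward = toy_solver_reward(apply_highlights(tokens, mask))
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for i, (m, p) in enumerate(zip(mask, probs)):
            # d/d(logit) of log Bernoulli(m; p) is (m - p)
            logits[i] += lr * advantage * ((1.0 if m else 0.0) - p)
    return [sigmoid(z) for z in logits]


if __name__ == "__main__":
    context = ["noise", "KEY", "noise", "more"]
    print(apply_highlights(context, [False, True, False, False]))
    print(train_actor(context))
```

In this toy setting the reward depends only on which spans are tagged, so the Solver is treated as a black box, mirroring the abstract's claim that no access to or modification of the Solver is needed. A real Actor would condition on the context rather than learn one free logit per position.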
