Learning Evidence Highlighting for Frozen LLMs

Abstract

Large Language Models (LLMs) can reason well, yet often miss decisive evidence when it is buried in long, noisy contexts. We introduce HiLight, an Evidence Emphasis framework that decouples evidence selection from reasoning for frozen LLM solvers. HiLight avoids compressing or rewriting the input, which can discard or distort evidence, by training a lightweight Emphasis Actor to insert minimal highlight tags around pivotal spans in the unaltered context. A frozen Solver then performs downstream reasoning on the emphasized input. We cast highlighting as a weakly supervised decision-making problem and optimize the Actor with reinforcement learning using only the Solver's task reward, requiring no evidence labels and no access to or modification of the Solver. Across sequential recommendation and long-context question answering, HiLight consistently improves performance over strong prompt-based and automated prompt-optimization baselines. The learned emphasis policy transfers zero-shot to both smaller and larger unseen Solver families, including an API-based Solver, suggesting that the Actor captures genuine, reusable evidence structure rather than overfitting to a single backbone.
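
To make the idea of "inserting minimal highlight tags around pivotal spans in the unaltered context" concrete, the following is a minimal sketch of the tagging step only. It assumes the spans are given as half-open character offsets and that the tags look like `<hl>` and `</hl>`; the tag format, the span representation, and the function names `insert_highlights` and `strip_highlights` are illustrative assumptions, not details taken from the abstract. The Emphasis Actor that chooses the spans and its reinforcement-learning training are not shown.

```python
from typing import List, Tuple


def insert_highlights(
    context: str,
    spans: List[Tuple[int, int]],
    open_tag: str = "<hl>",    # assumed tag format, not specified in the abstract
    close_tag: str = "</hl>",  # assumed tag format, not specified in the abstract
) -> str:
    """Wrap each half-open [start, end) character span of `context` in tags.

    No original character is changed or dropped; only tags are added.
    Spans must be non-empty, in range, and non-overlapping.
    """
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor or end > len(context) or start >= end:
            raise ValueError(f"invalid or overlapping span: {(start, end)}")
        pieces.append(context[cursor:start])  # untouched text before the span
        pieces.append(open_tag + context[start:end] + close_tag)
        cursor = end
    pieces.append(context[cursor:])  # untouched text after the last span
    return "".join(pieces)


def strip_highlights(text: str, open_tag: str = "<hl>", close_tag: str = "</hl>") -> str:
    """Remove the tags again, recovering the original context."""
    return text.replace(open_tag, "").replace(close_tag, "")
```

Because tags are only added, stripping them recovers the original context exactly (provided the context does not already contain the tag strings). That is the property that separates emphasis from compression or rewriting: the Solver still sees every token of the input.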
