
Lost in the Noise: How Reasoning Models Fail with Contextual Distractors

January 12, 2026
Authors: Seongyun Lee, Yongrae Jo, Minju Seo, Moontae Lee, Minjoon Seo
cs.AI

Abstract

Recent advances in reasoning models and agentic AI systems have led to an increased reliance on diverse external information. However, this shift introduces input contexts that are inherently noisy, a reality that current sanitized benchmarks fail to capture. We introduce NoisyBench, a comprehensive benchmark that systematically evaluates model robustness across 11 datasets spanning RAG, reasoning, alignment, and tool-use tasks against diverse noise types, including random documents, irrelevant chat histories, and hard negative distractors. Our evaluation reveals a catastrophic performance drop of up to 80% in state-of-the-art models when faced with contextual distractors. Crucially, we find that agentic workflows often amplify these errors by over-trusting noisy tool outputs, and that distractors can trigger emergent misalignment even without adversarial intent. We find that prompting, context engineering, SFT, and outcome-reward-only RL fail to ensure robustness; in contrast, our proposed Rationale-Aware Reward (RARE) significantly strengthens resilience by incentivizing the identification of helpful information within noise. Finally, we uncover an inverse-scaling trend in which increased test-time computation leads to worse performance in noisy settings, and demonstrate via attention visualization that models disproportionately focus on distractor tokens, providing vital insights for building the next generation of robust, reasoning-capable agents.
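
The abstract names the three noise types but not the injection procedure. The following is a minimal sketch, assuming a pool of typed distractor passages, of how such contextual noise could be mixed into a RAG-style prompt; every function name, field name, and ratio here is an illustrative assumption, not the benchmark's implementation.

```python
import random

# Noise types named in the abstract; the labels below are our own shorthand.
NOISE_TYPES = ("random_document", "irrelevant_chat_history", "hard_negative")

def inject_distractors(gold_docs, noise_pool, noise_type, k=4, seed=0):
    """Mix k distractor passages of the given noise type into the gold context.

    gold_docs:  list[str] of passages that actually support the answer.
    noise_pool: list[dict] with keys "type" (one of NOISE_TYPES) and "text".
    """
    rng = random.Random(seed)
    candidates = [d["text"] for d in noise_pool if d["type"] == noise_type]
    sampled = rng.sample(candidates, k=min(k, len(candidates)))
    context = gold_docs + sampled
    rng.shuffle(context)  # interleave so position does not reveal the gold docs
    return context

def build_prompt(question, context):
    """Render a numbered-passage prompt from the (noisy) context."""
    passages = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(context))
    return f"Answer using the passages below.\n\n{passages}\n\nQuestion: {question}"
```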
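RARE itself is characterized in the abstract only by its incentive, identifying helpful information within noise, so the sketch below shows one plausible decomposition under that reading: a binary outcome reward plus a bonus for citing the genuinely helpful passages. The signature, the precision-based bonus, and the weighting are assumptions, not the paper's formulation.

```python
def rationale_aware_reward(answer, gold_answer, cited_ids, gold_evidence_ids,
                           bonus_weight=0.5):
    """Outcome reward plus a bonus for grounding the rationale in gold evidence.

    cited_ids:         passage indices the model's rationale points to.
    gold_evidence_ids: indices of the passages that truly support the answer.
    """
    outcome = 1.0 if answer.strip() == gold_answer.strip() else 0.0
    if cited_ids:
        # Fraction of cited passages that are genuinely helpful (citation precision).
        precision = len(set(cited_ids) & set(gold_evidence_ids)) / len(set(cited_ids))
    else:
        precision = 0.0
    return outcome + bonus_weight * precision
```

Under this shaping, a policy that answers correctly while citing only distractors earns less than one that also isolates the gold evidence, which is consistent with the abstract's contrast between RARE and outcome-reward-only RL.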