Lost in the Noise: How Reasoning Models Fail with Contextual Distractors
January 12, 2026
Authors: Seongyun Lee, Yongrae Jo, Minju Seo, Moontae Lee, Minjoon Seo
cs.AI
Abstract
Recent advances in reasoning models and agentic AI systems have led to an increased reliance on diverse external information. However, this shift introduces input contexts that are inherently noisy, a reality that current sanitized benchmarks fail to capture. We introduce NoisyBench, a comprehensive benchmark that systematically evaluates model robustness across 11 datasets spanning RAG, reasoning, alignment, and tool-use tasks against diverse noise types, including random documents, irrelevant chat histories, and hard negative distractors. Our evaluation reveals a catastrophic performance drop of up to 80% in state-of-the-art models when they face contextual distractors. Crucially, we find that agentic workflows often amplify these errors by over-trusting noisy tool outputs, and that distractors can trigger emergent misalignment even without adversarial intent. We show that prompting, context engineering, SFT, and outcome-reward-only RL all fail to ensure robustness; in contrast, our proposed Rationale-Aware Reward (RARE) significantly strengthens resilience by incentivizing models to identify helpful information within noise. Finally, we uncover an inverse scaling trend in which increased test-time computation worsens performance in noisy settings, and we demonstrate via attention visualization that models disproportionately attend to distractor tokens. These findings provide vital insights for building the next generation of robust, reasoning-capable agents.
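The abstract does not spell out how the Rationale-Aware Reward is computed. The following is a minimal sketch of the idea in Python, assuming a verifiable-answer setting where the trainer knows which context snippets are helpful and which are distractors; every name here (gold_evidence, distractors, alpha) and the weighted combination are hypothetical illustrations, not the paper's actual formulation.

```python
# Minimal sketch of a rationale-aware reward in the spirit of RARE.
# The helper names (gold_answer, gold_evidence, distractors) and the
# alpha-weighted scoring are illustrative assumptions, not the paper's API.

def rationale_aware_reward(
    answer: str,
    rationale: str,
    gold_answer: str,
    gold_evidence: list[str],
    distractors: list[str],
    alpha: float = 0.5,
) -> float:
    """Outcome reward plus a bonus for grounding the rationale in
    helpful evidence and a penalty for citing distractor content."""
    # Outcome term: 1.0 if the final answer matches the reference exactly.
    outcome = float(answer.strip().lower() == gold_answer.strip().lower())

    # Grounding term: fraction of gold evidence snippets the rationale cites.
    cited_gold = sum(s.lower() in rationale.lower() for s in gold_evidence)
    grounding = cited_gold / max(len(gold_evidence), 1)

    # Noise term: fraction of distractor snippets the rationale leans on.
    cited_noise = sum(s.lower() in rationale.lower() for s in distractors)
    noise_penalty = cited_noise / max(len(distractors), 1)

    return outcome + alpha * (grounding - noise_penalty)


# Example: a correct answer grounded in the gold passage scores higher
# than the same answer justified only by a distractor.
if __name__ == "__main__":
    print(rationale_aware_reward(
        answer="Paris",
        rationale="The atlas excerpt states the capital of France is Paris.",
        gold_answer="Paris",
        gold_evidence=["the capital of France is Paris"],
        distractors=["Lyon hosted the 1996 G7 summit"],
    ))  # 1.5
```

The point of the shaping term, as the abstract describes it, is that a model is rewarded not only for the final answer (the outcome-reward-only RL baseline that fails under noise) but also for explicitly identifying the helpful information amid distractors in its rationale.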