
Does More Inference-Time Compute Really Help Robustness?

July 21, 2025
Authors: Tong Wu, Chong Xiang, Jiachen T. Wang, Weichen Yu, Chawin Sitawarin, Vikash Sehwag, Prateek Mittal
cs.AI

Abstract

Recently, Zaremba et al. demonstrated that increasing inference-time computation improves robustness in large proprietary reasoning LLMs. In this paper, we first show that smaller-scale, open-source models (e.g., DeepSeek R1, Qwen3, Phi-reasoning) can also benefit from inference-time scaling using a simple budget forcing strategy. More importantly, we reveal and critically examine an implicit assumption in prior work: intermediate reasoning steps are hidden from adversaries. By relaxing this assumption, we identify an important security risk, intuitively motivated and empirically verified as an inverse scaling law: if intermediate reasoning steps become explicitly accessible, increased inference-time computation consistently reduces model robustness. Finally, we discuss practical scenarios where models with hidden reasoning chains are still vulnerable to attacks, such as models with tool-integrated reasoning and advanced reasoning extraction attacks. Our findings collectively demonstrate that the robustness benefits of inference-time scaling depend heavily on the adversarial setting and deployment context. We urge practitioners to carefully weigh these subtle trade-offs before applying inference-time scaling in security-sensitive, real-world applications.
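The "simple budget forcing strategy" mentioned above can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: the `generate_step` interface, the `</think>` stop token, and the `"Wait"` continuation cue are hypothetical stand-ins for the idea of capping a model's reasoning trace at a fixed token budget and injecting a cue whenever the model tries to stop early.

```python
# Hedged sketch of budget forcing for inference-time scaling.
# Assumptions (not from the source): a token-level generator interface,
# "</think>" as the reasoning stop token, and "Wait" as the continuation cue.

def budget_force(generate_step, budget, stop_token="</think>", cue="Wait"):
    """Run a token generator under a fixed reasoning-token budget.

    generate_step(tokens) -> next token (hypothetical model interface).
    If the model emits the stop token before the budget is spent, a
    continuation cue is injected instead, forcing further reasoning.
    """
    tokens = []
    while len(tokens) < budget:
        tok = generate_step(tokens)
        if tok == stop_token:
            # Model tried to stop early: replace the stop with a cue.
            tokens.append(cue)
        else:
            tokens.append(tok)
    tokens.append(stop_token)  # close the reasoning block at the budget
    return tokens


if __name__ == "__main__":
    # Toy "model": scripted tokens that try to stop after three steps.
    script = iter(["a", "b", "</think>", "c", "d", "</think>", "e"])
    out = budget_force(lambda toks: next(script), budget=5)
    print(out)  # ['a', 'b', 'Wait', 'c', 'd', '</think>']
```

Larger budgets correspond to more inference-time compute; the paper's finding is that whether this extra reasoning helps or hurts robustness depends on whether the resulting trace is visible to an adversary.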
PDF · July 23, 2025