
Does More Inference-Time Compute Really Help Robustness?

July 21, 2025
Authors: Tong Wu, Chong Xiang, Jiachen T. Wang, Weichen Yu, Chawin Sitawarin, Vikash Sehwag, Prateek Mittal
cs.AI

Abstract

Recently, Zaremba et al. demonstrated that increasing inference-time computation improves robustness in large proprietary reasoning LLMs. In this paper, we first show that smaller-scale, open-source models (e.g., DeepSeek R1, Qwen3, Phi-reasoning) can also benefit from inference-time scaling using a simple budget forcing strategy. More importantly, we reveal and critically examine an implicit assumption in prior work: intermediate reasoning steps are hidden from adversaries. By relaxing this assumption, we identify an important security risk, intuitively motivated and empirically verified as an inverse scaling law: if intermediate reasoning steps become explicitly accessible, increased inference-time computation consistently reduces model robustness. Finally, we discuss practical scenarios where models with hidden reasoning chains are still vulnerable to attacks, such as models with tool-integrated reasoning and advanced reasoning extraction attacks. Our findings collectively demonstrate that the robustness benefits of inference-time scaling depend heavily on the adversarial setting and deployment context. We urge practitioners to carefully weigh these subtle trade-offs before applying inference-time scaling in security-sensitive, real-world applications.
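The "simple budget forcing strategy" mentioned above can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: the `generate_step` interface, the `</think>` stop token, and the `"Wait"` continuation cue are hypothetical stand-ins for the idea of capping a model's reasoning trace at a fixed token budget and injecting a cue whenever the model tries to stop early.

```python
# Hedged sketch of budget forcing for inference-time scaling.
# Assumptions (not from the source): a token-level generator interface,
# "</think>" as the reasoning stop token, and "Wait" as the continuation cue.

def budget_force(generate_step, budget, stop_token="</think>", cue="Wait"):
    """Run a token generator under a fixed reasoning-token budget.

    generate_step(tokens) -> next token (hypothetical model interface).
    If the model emits the stop token before the budget is spent, a
    continuation cue is injected instead, forcing further reasoning.
    """
    tokens = []
    while len(tokens) < budget:
        tok = generate_step(tokens)
        if tok == stop_token:
            # Model tried to stop early: replace the stop with a cue.
            tokens.append(cue)
        else:
            tokens.append(tok)
    tokens.append(stop_token)  # close the reasoning block at the budget
    return tokens


if __name__ == "__main__":
    # Toy "model": scripted tokens that try to stop after three steps.
    script = iter(["a", "b", "</think>", "c", "d", "</think>", "e"])
    out = budget_force(lambda toks: next(script), budget=5)
    print(out)  # ['a', 'b', 'Wait', 'c', 'd', '</think>']
```

Larger budgets correspond to more inference-time compute; the paper's finding is that whether this extra reasoning helps or hurts robustness depends on whether the resulting trace is visible to an adversary.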
PDF · July 23, 2025