Does More Inference-Time Compute Really Help Robustness?

July 21, 2025
Authors: Tong Wu, Chong Xiang, Jiachen T. Wang, Weichen Yu, Chawin Sitawarin, Vikash Sehwag, Prateek Mittal
cs.AI

Abstract

Recently, Zaremba et al. demonstrated that increasing inference-time computation improves robustness in large proprietary reasoning LLMs. In this paper, we first show that smaller-scale, open-source models (e.g., DeepSeek R1, Qwen3, Phi-reasoning) can also benefit from inference-time scaling using a simple budget forcing strategy. More importantly, we reveal and critically examine an implicit assumption in prior work: intermediate reasoning steps are hidden from adversaries. By relaxing this assumption, we identify an important security risk, intuitively motivated and empirically verified as an inverse scaling law: if intermediate reasoning steps become explicitly accessible, increased inference-time computation consistently reduces model robustness. Finally, we discuss practical scenarios where models with hidden reasoning chains are still vulnerable to attacks, such as models with tool-integrated reasoning and advanced reasoning extraction attacks. Our findings collectively demonstrate that the robustness benefits of inference-time scaling depend heavily on the adversarial setting and deployment context. We urge practitioners to carefully weigh these subtle trade-offs before applying inference-time scaling in security-sensitive, real-world applications.
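The abstract does not spell out the budget forcing strategy, but a minimal sketch of one common implementation for open reasoning models is shown below: suppress the model's end-of-thinking marker and append a continuation cue until a minimum reasoning-token budget is spent. The model name, the "</think>" marker, and the " Wait," cue are illustrative assumptions, not the paper's exact setup.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model choice; any open reasoning model that emits an explicit
# end-of-thinking marker (assumed here to be "</think>") would work similarly.
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)

def generate_with_budget(prompt: str, min_thinking_tokens: int = 512) -> str:
    """Keep the model reasoning until at least `min_thinking_tokens` tokens
    have been spent, then let it produce the final answer."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    # Assumes "</think>" maps to a single token; we take the last id as a fallback.
    end_think = tokenizer.encode("</think>", add_special_tokens=False)[-1]
    spent = 0
    while spent < min_thinking_tokens:
        out = model.generate(
            ids,
            max_new_tokens=min_thinking_tokens - spent,
            eos_token_id=end_think,  # stop when the model tries to exit reasoning
            do_sample=False,
        )
        spent += out.shape[1] - ids.shape[1]
        ids = out
        if spent < min_thinking_tokens:
            # Budget not yet met: drop the end-of-thinking marker (if emitted)
            # and append a continuation cue to force the reasoning to continue.
            if ids[0, -1].item() == end_think:
                ids = ids[:, :-1]
            cue = tokenizer(" Wait,", return_tensors="pt",
                            add_special_tokens=False).input_ids
            ids = torch.cat([ids, cue], dim=-1)
    # Close the reasoning block and generate the final answer.
    close = tokenizer("</think>", return_tensors="pt",
                      add_special_tokens=False).input_ids
    ids = torch.cat([ids, close], dim=-1)
    final = model.generate(ids, max_new_tokens=256, do_sample=False)
    return tokenizer.decode(final[0], skip_special_tokens=True)

Raising min_thinking_tokens is what "increasing inference-time compute" means under this sketch; the paper's finding is that whether this helps or hurts robustness depends on whether an adversary can read the extended reasoning chain.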