Does More Inference-Time Compute Really Help Robustness?

Abstract

Recently, Zaremba et al. demonstrated that increasing inference-time computation improves robustness in large proprietary reasoning LLMs. In this paper, we first show that smaller-scale, open-source models (e.g., DeepSeek R1, Qwen3, Phi-reasoning) can also benefit from inference-time scaling using a simple budget forcing strategy. More importantly, we reveal and critically examine an implicit assumption in prior work: that intermediate reasoning steps are hidden from adversaries. By relaxing this assumption, we identify an important security risk, intuitively motivated and empirically verified as an inverse scaling law: if intermediate reasoning steps become explicitly accessible, increased inference-time computation consistently reduces model robustness. Finally, we discuss practical scenarios in which models with hidden reasoning chains remain vulnerable to attack, such as tool-integrated reasoning and advanced reasoning extraction attacks. Our findings collectively demonstrate that the robustness benefits of inference-time scaling depend heavily on the adversarial setting and deployment context. We urge practitioners to carefully weigh these subtle trade-offs before applying inference-time scaling in security-sensitive, real-world applications.
