Rethinking Optimal Verification Granularity for Compute-Efficient Test-Time Scaling

Abstract

Test-time scaling (TTS) has proven effective in enhancing the reasoning capabilities of large language models (LLMs). Verification plays a key role in TTS, influencing both (1) reasoning performance and (2) compute efficiency through its quality and its computational cost. In this work, we challenge the conventional paradigms of verification and make the first attempt toward systematically investigating the impact of verification granularity, that is, how frequently the verifier is invoked during generation, beyond verifying only the final output or individual generation steps. To this end, we introduce Variable Granularity Search (VG-Search), a unified algorithm that generalizes beam search and Best-of-N sampling via a tunable granularity parameter g. Extensive experiments with VG-Search under varying compute budgets, generator-verifier configurations, and task attributes reveal that dynamically selecting g can improve compute efficiency and scaling behavior. Building on these findings, we propose adaptive VG-Search strategies that achieve accuracy gains of up to 3.1% over beam search and 3.6% over Best-of-N, while reducing FLOPs by over 52%. We will open-source the code to support future research.
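The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a granularity parameter g could interpolate between the two baselines. It is not the authors' implementation. The names `vg_search_sketch`, `propose`, `score`, `beam_width`, and `branch` are hypothetical, and the pruning rule (keep the top `beam_width` paths, then branch each one) is an assumption made for illustration.

```python
import random
from typing import Callable, List

Step = str
Path = List[Step]


def vg_search_sketch(
    propose: Callable[[Path, random.Random], Step],  # hypothetical generator: next step for a path
    score: Callable[[Path], float],                  # hypothetical verifier: higher is better
    total_steps: int,
    g: int,
    beam_width: int,
    branch: int,
    seed: int = 0,
) -> Path:
    """Toy variable-granularity search: verify and prune only every g steps."""
    rng = random.Random(seed)
    # Assumption: the pool always holds beam_width * branch candidates,
    # so the generation budget is the same for every value of g.
    candidates: List[Path] = [[] for _ in range(beam_width * branch)]
    for t in range(1, total_steps + 1):
        # Extend every candidate by one generation step.
        candidates = [p + [propose(p, rng)] for p in candidates]
        # Invoke the verifier only at multiples of g (the final step is handled below).
        if t % g == 0 and t < total_steps:
            kept = sorted(candidates, key=score, reverse=True)[:beam_width]
            # Branch each surviving path to refill the candidate pool.
            candidates = [list(p) for p in kept for _ in range(branch)]
    # A final verification pass selects the single best complete path.
    return max(candidates, key=score)
```

Under these assumptions, `g = 1` prunes after every step and behaves like a step-level beam search, while any `g >= total_steps` skips intermediate pruning entirely and reduces to Best-of-N with N = `beam_width * branch`. Intermediate values trade verifier calls against how early weak paths are discarded.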
