Kinetics: Rethinking Test-Time Scaling Laws

June 5, 2025
Authors: Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng, Yang Zhou, Emma Strubell, Beidi Chen
cs.AI

Abstract

We rethink test-time scaling laws from a practical efficiency perspective, revealing that the effectiveness of smaller models is significantly overestimated. Prior work, grounded in compute-optimality, overlooks critical memory access bottlenecks introduced by inference-time strategies (e.g., Best-of-N, long CoTs). Our holistic analysis, spanning models from 0.6B to 32B parameters, reveals a new Kinetics Scaling Law that better guides resource allocation by incorporating both computation and memory access costs. The Kinetics Scaling Law suggests that test-time compute is more effective when applied to models above a certain size threshold than to smaller ones. A key reason is that, in test-time scaling (TTS), attention, rather than parameter count, emerges as the dominant cost factor. Motivated by this, we propose a new scaling paradigm centered on sparse attention, which lowers per-token cost and enables longer generations and more parallel samples within the same resource budget. Empirically, we show that sparse attention models consistently outperform dense counterparts, achieving gains of over 60 points in low-cost regimes and over 5 points in high-cost regimes for problem-solving accuracy on AIME, encompassing evaluations on state-of-the-art MoEs. These results suggest that sparse attention is essential for realizing the full potential of test-time scaling because, unlike training, where parameter scaling saturates, test-time accuracy continues to improve through increased generation. The code is available at https://github.com/Infini-AI-Lab/Kinetics.
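To make the central claim concrete, the following is a minimal sketch (not the paper's actual cost model) of why attention, driven by KV-cache memory access, can overtake parameter cost as generations grow long, and why sparse attention helps. The function name `per_token_cost`, the constants, and the `sparse_keep_ratio` parameter are illustrative assumptions for this sketch, not quantities taken from the paper.

```python
# Illustrative sketch only: a toy per-token decoding cost with two terms,
# one for streaming model parameters and one for reading the KV cache.
# All constants are assumed for illustration, not drawn from the paper.

def per_token_cost(params_b, gen_len, kv_units_per_token=1.0e5,
                   sparse_keep_ratio=1.0):
    """Rough per-token cost in arbitrary units.

    params_b           : model size in billions of parameters
    gen_len            : tokens generated so far (KV-cache length)
    kv_units_per_token : assumed KV-cache access cost per cached token
    sparse_keep_ratio  : fraction of the KV cache actually attended to
                         (1.0 = dense attention, <1.0 = sparse attention)
    """
    # Parameter term: roughly linear in model size, independent of length.
    param_cost = params_b * 1e9
    # Attention term: grows with generation length; sparse attention
    # reduces it by only touching a fraction of the cached tokens.
    attention_cost = gen_len * kv_units_per_token * sparse_keep_ratio
    return param_cost + attention_cost


if __name__ == "__main__":
    for gen_len in (1_000, 10_000, 100_000):
        dense = per_token_cost(params_b=14, gen_len=gen_len)
        sparse = per_token_cost(params_b=14, gen_len=gen_len,
                                sparse_keep_ratio=0.1)
        print(f"len={gen_len:>7}: dense={dense:.2e}  sparse={sparse:.2e}")
```

Under these assumed constants, the attention term dominates once generations reach tens of thousands of tokens, which mirrors the abstract's argument that lowering per-token attention cost frees budget for longer generations or more parallel samples.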