ChatPaper.ai

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

May 8, 2026
作者: Tong Zheng, Haolin Liu, Chengsong Huang, Huiwen Bao, Sheng Zhang, Rui Liu, Runpeng Dai, Ruibo Chen, Chenxi Liu, Tianyi Xiong, Xidong Wu, Hongming Zhang, Heng Huang
cs.AI

Abstract

Test-time scaling (TTS) has become an effective approach for improving large language model performance by allocating additional computation during inference. However, existing TTS strategies are largely hand-crafted: researchers manually design reasoning patterns and tune heuristics by intuition, leaving much of the computation-allocation space unexplored. We propose an environment-driven framework, AutoTTS, that changes what researchers design: from individual TTS heuristics to environments where TTS strategies can be discovered automatically. The key to AutoTTS lies in environment construction: the discovery environment must make the control space tractable and provide cheap, frequent feedback for TTS search. As a concrete instantiation, we formulate width-depth TTS as controller synthesis over pre-collected reasoning trajectories and probe signals, where controllers decide when to branch, continue, probe, prune, or stop, and can be evaluated cheaply without repeated LLM calls. We further introduce a beta parameterization to make the search tractable, and fine-grained execution-trace feedback that improves discovery efficiency by helping the agent diagnose why a TTS program fails. Experiments on mathematical reasoning benchmarks show that the discovered strategies improve the overall accuracy-cost tradeoff over strong manually designed baselines. The discovered strategies generalize to held-out benchmarks and model scales, while the entire discovery process costs only $39.90 and 160 minutes. Our data and code will be open-sourced at https://github.com/zhengkid/AutoTTS.
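To make the controller-synthesis formulation concrete, here is a minimal sketch of a width-depth controller replayed over cached trajectories. Everything in it is illustrative: the fixed thresholds, the `(probe_score, answer_is_correct)` step encoding, and the function names are assumptions for exposition, not the paper's discovered strategies or actual API. The key idea it demonstrates is that once trajectories and probe signals are pre-collected, a candidate controller can be scored without issuing any new LLM calls.

```python
from enum import Enum


class Action(Enum):
    BRANCH = "branch"      # spawn an extra candidate path (width)
    CONTINUE = "continue"  # extend the current path one step (depth)
    PRUNE = "prune"        # abandon this path
    STOP = "stop"          # commit to this path's answer


def controller(score: float, width: int, max_width: int = 3) -> Action:
    """Toy rule with hand-picked thresholds; AutoTTS searches for such rules."""
    if score >= 0.9:
        return Action.STOP
    if score < 0.2:
        return Action.PRUNE
    if score < 0.5 and width < max_width:
        return Action.BRANCH
    return Action.CONTINUE


def replay(trajectories):
    """Evaluate a controller by replaying pre-collected trajectories.

    Each trajectory is a list of (probe_score, answer_is_correct) pairs,
    cached from earlier LLM runs, so evaluation costs no new model calls.
    Returns (answer_correct_or_None, number_of_steps_consumed)."""
    pool = list(trajectories)     # BRANCH pulls a fresh cached path
    active = [(pool.pop(0), 0)]   # (trajectory, step index)
    cost, answer = 0, None
    while active and answer is None:
        traj, i = active.pop(0)
        if i >= len(traj):
            continue              # cached path exhausted
        score, correct = traj[i]
        cost += 1
        act = controller(score, width=len(active) + 1)
        if act is Action.STOP:
            answer = correct
        elif act is Action.PRUNE:
            pass                  # drop this path
        else:
            active.append((traj, i + 1))
            if act is Action.BRANCH and pool:
                active.append((pool.pop(0), 0))
    return answer, cost
```

Because `replay` is pure bookkeeping over cached data, an agent can score many candidate controllers per second, which is exactly the cheap, frequent feedback the environment is built to provide; the execution trace (which actions fired at which scores) is what the fine-grained feedback would expose for diagnosis.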