LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

Abstract

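Test-time scaling (TTS) has become an effective approach for improving large language model performance by allocating additional computation at inference time. However, existing TTS strategies are largely hand-crafted: researchers manually design reasoning patterns and tune heuristics by intuition, leaving much of the computation-allocation space unexplored. We propose AutoTTS, an environment-driven framework that changes what researchers design: from individual TTS heuristics to environments in which TTS strategies can be discovered automatically. The key to AutoTTS lies in environment construction: the discovery environment must make the control space tractable and provide cheap, frequent feedback for TTS search. As a concrete instantiation, we formulate width-depth TTS as controller synthesis over pre-collected reasoning trajectories and probe signals, where controllers decide when to branch, continue, probe, prune, or stop, and can be evaluated cheaply without repeated LLM calls. We further introduce beta parameterization to make the search tractable, and fine-grained execution-trace feedback that improves discovery efficiency by helping the agent diagnose why a TTS program fails. Experiments on mathematical reasoning benchmarks show that the discovered strategies improve the overall accuracy-cost tradeoff over strong manually designed baselines. The discovered strategies generalize to held-out benchmarks and model scales, and the entire discovery process costs only $39.9 and 160 minutes. Data and code will be released at https://github.com/zhengkid/AutoTTS.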
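To make the idea of evaluating a controller "without repeated LLM calls" concrete, here is a minimal sketch of offline replay. It is not the AutoTTS implementation: the `Trajectory` record, the `replay_controller` and `evaluate` functions, and the parameters `max_width` and `agree_needed` are all hypothetical names chosen for illustration, and the stopping rule (stop once enough branches agree) is only one simple policy a discovered controller might resemble. Pruning and beta parameterization are omitted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trajectory:
    # Hypothetical record: probe_answers[t] is the answer a probe would read
    # out after t + 1 reasoning chunks. Collected once, ahead of time.
    probe_answers: list[str]


def replay_controller(trajs, max_width, agree_needed):
    """Replay cached trajectories; return (answer, chunks consumed)."""
    # Branch: open up to max_width cached trajectories.
    active = list(range(min(max_width, len(trajs))))
    latest = {}  # most recent probed answer per branch
    cost = 0     # cost is counted in reasoning chunks, an assumed unit
    depth = 0
    while active:
        still_active = []
        for i in active:
            if depth < len(trajs[i].probe_answers):
                cost += 1                                   # continue one chunk
                latest[i] = trajs[i].probe_answers[depth]   # probe
                still_active.append(i)
        active = still_active
        if latest:
            answer, votes = Counter(latest.values()).most_common(1)[0]
            if votes >= agree_needed:
                return answer, cost                         # stop early
        depth += 1
    # Every branch is exhausted: fall back to the plurality answer.
    answer = Counter(latest.values()).most_common(1)[0][0] if latest else ""
    return answer, cost


def evaluate(dataset, max_width, agree_needed):
    """Return (accuracy, mean cost) over (trajectories, gold answer) pairs."""
    correct = 0
    total_cost = 0
    for trajs, gold in dataset:
        answer, cost = replay_controller(trajs, max_width, agree_needed)
        correct += answer == gold
        total_cost += cost
    return correct / len(dataset), total_cost / len(dataset)


# Toy cached data for one problem whose gold answer is "7".
trajs = [
    Trajectory(["3", "7", "7"]),
    Trajectory(["5", "7"]),
    Trajectory(["7", "7", "7"]),
]
print(replay_controller(trajs, max_width=3, agree_needed=2))  # ('7', 6)
print(replay_controller(trajs, max_width=1, agree_needed=1))  # ('3', 1)
```

Because every candidate controller reads from the same cached trajectories, comparing settings such as the two above costs only a replay loop, which is what makes frequent feedback during search affordable.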


