LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

May 8, 2026
Authors: Tong Zheng, Haolin Liu, Chengsong Huang, Huiwen Bao, Sheng Zhang, Rui Liu, Runpeng Dai, Ruibo Chen, Chenxi Liu, Tianyi Xiong, Xidong Wu, Hongming Zhang, Heng Huang
cs.AI

Abstract

Test-time scaling (TTS) has become an effective approach for improving large language model (LLM) performance by allocating additional computation during inference. However, existing TTS strategies are largely hand-crafted: researchers manually design reasoning patterns and tune heuristics by intuition, leaving much of the computation-allocation space unexplored. We propose an environment-driven framework, AutoTTS, that changes what researchers design: from individual TTS heuristics to environments in which TTS strategies can be discovered automatically. The key to AutoTTS lies in environment construction: the discovery environment must make the control space tractable and provide cheap, frequent feedback for TTS search. As a concrete instantiation, we formulate width-depth TTS as controller synthesis over pre-collected reasoning trajectories and probe signals, where controllers decide when to branch, continue, probe, prune, or stop, and can be evaluated cheaply without repeated LLM calls. We further introduce beta parameterization to make the search tractable, and fine-grained execution-trace feedback that improves discovery efficiency by helping the agent diagnose why a TTS program fails. Experiments on mathematical reasoning benchmarks show that the discovered strategies improve the overall accuracy-cost tradeoff over strong manually designed baselines. The discovered strategies generalize to held-out benchmarks and model scales, while the entire discovery process costs only $39.9 and 160 minutes. Our data and code will be open-sourced at https://github.com/zhengkid/AutoTTS.
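
To make the idea of evaluating a controller over pre-collected trajectories concrete, here is a minimal sketch in Python. It is not the AutoTTS implementation: the names `Trajectory`, `run_controller`, and `evaluate`, the fixed per-chunk token cost, and the agreement-based stopping rule are all assumptions made for illustration. The policy is also deliberately simplified, covering only the continue, probe, and stop decisions at a fixed width, with no branching or pruning. What it does show is that, once probe answers are cached, a candidate policy can be scored for accuracy and token cost by replay alone, with no further LLM calls.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One cached reasoning trajectory (hypothetical format).

    probe_answers[t] is the answer a probe would return after t + 1 chunks.
    """
    probe_answers: list[str]
    chunk_tokens: int = 100  # assumed fixed token cost per reasoning chunk


def run_controller(trajs, max_width, agree_k):
    """Replay a simple policy over cached trajectories (non-empty list).

    Assumed policy: keep the first max_width trajectories, advance each by
    one chunk per round, probe them, and stop once agree_k probes agree.
    Returns (answer, tokens_spent).
    """
    active = trajs[:max_width]
    depth = max(len(t.probe_answers) for t in active)
    tokens = 0
    votes = Counter()
    for step in range(depth):
        votes = Counter()
        for t in active:
            if step < len(t.probe_answers):
                tokens += t.chunk_tokens           # pay for one more chunk
                votes[t.probe_answers[step]] += 1
            else:
                votes[t.probe_answers[-1]] += 1    # finished: free final vote
        answer, count = votes.most_common(1)[0]
        if count >= agree_k:                       # "stop" decision
            return answer, tokens
    return votes.most_common(1)[0][0], tokens      # budget exhausted


def evaluate(dataset, max_width, agree_k):
    """Score a policy on (trajectories, gold_answer) pairs by replay only.

    Returns (accuracy, mean_tokens).
    """
    correct, total_tokens = 0, 0
    for trajs, gold in dataset:
        answer, tokens = run_controller(trajs, max_width, agree_k)
        correct += int(answer == gold)
        total_tokens += tokens
    return correct / len(dataset), total_tokens / len(dataset)


# Toy data: three cached trajectories for one problem whose answer is "7".
toy = [
    Trajectory(["3", "7", "7"]),
    Trajectory(["7", "7", "7"]),
    Trajectory(["5", "5", "7"]),
]
dataset = [(toy, "7")]
```

On this toy data, `evaluate(dataset, max_width=3, agree_k=2)` replays two rounds over three trajectories, while `evaluate(dataset, max_width=1, agree_k=1)` stops after a single chunk of the first trajectory. Because each replay is cheap, a search procedure could sweep many such settings; how AutoTTS actually parameterizes and searches that space is described in the paper, not here.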