

AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks

July 26, 2025
Authors: Fali Wang, Hui Liu, Zhenwei Dai, Jingying Zeng, Zhiwei Zhang, Zongyu Wu, Chen Luo, Zhen Li, Xianfeng Tang, Qi He, Suhang Wang
cs.AI

Abstract

Test-time scaling (TTS) enhances the performance of large language models (LLMs) by allocating additional compute resources during inference. However, existing research primarily investigates TTS in single-stage tasks, while many real-world problems are multi-stage complex tasks, composed of a sequence of heterogeneous subtasks, each requiring an LLM with specific capabilities. Therefore, we study a novel problem: test-time compute-optimal scaling in multi-stage complex tasks, which aims to select suitable models and allocate budgets per subtask to maximize overall performance. TTS in multi-stage tasks introduces two fundamental challenges: (i) the combinatorial search space of model and budget allocations, combined with the high cost of inference, makes brute-force search impractical; and (ii) the optimal model and budget allocations across subtasks are interdependent, increasing the complexity of the compute-optimal search. To address this gap, we conduct extensive pilot experiments on four tasks across six datasets, deriving three empirical insights that characterize the behavior of LLMs in multi-stage complex tasks. Informed by these insights, we propose AgentTTS, an LLM-agent-based framework that autonomously searches for compute-optimal allocations through iterative, feedback-driven interactions with the execution environment. Experimental results demonstrate that AgentTTS significantly outperforms traditional and other LLM-based baselines in search efficiency, and shows improved robustness to varying training set sizes as well as enhanced interpretability.
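To see why the abstract calls brute-force search impractical, consider enumerating the joint search space. The sketch below is a hypothetical illustration, not the paper's implementation: the model names, budget values, and subtask count are invented for demonstration. Each allocation assigns one (model, budget) pair to every subtask, so the space grows exponentially in the number of subtasks, and each candidate would require costly inference to evaluate.

```python
from itertools import product

# Illustrative (hypothetical) candidates; real settings come from the paper's setup.
models = ["model-a", "model-b", "model-c"]  # candidate LLMs per subtask
budgets = [1, 2, 4, 8, 16]                  # e.g., number of sampled generations
num_subtasks = 3                            # stages in the multi-stage task

# One allocation = a (model, budget) pair chosen for every subtask.
per_subtask_choices = list(product(models, budgets))
all_allocations = list(product(per_subtask_choices, repeat=num_subtasks))

# |models| * |budgets| choices per subtask, raised to the number of subtasks.
assert len(all_allocations) == (len(models) * len(budgets)) ** num_subtasks
print(len(all_allocations))  # 15^3 = 3375 candidate allocations
```

Even in this toy setting, 3,375 candidate allocations would each need full pipeline inference to score, which is why AgentTTS instead searches the space iteratively using feedback from the execution environment.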