What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models

March 31, 2025
Authors: Qiyuan Zhang, Fuyuan Lyu, Zexu Sun, Lei Wang, Weixu Zhang, Zhihan Guo, Yufei Wang, Irwin King, Xue Liu, Chen Ma
cs.AI

Abstract

As enthusiasm for scaling computation (data and parameters) in the pretraining era gradually diminished, test-time scaling (TTS), also referred to as "test-time computing," has emerged as a prominent research focus. Recent studies demonstrate that TTS can further elicit the problem-solving capabilities of large language models (LLMs), enabling significant breakthroughs not only in specialized reasoning tasks, such as mathematics and coding, but also in general tasks like open-ended Q&A. However, despite the explosion of recent efforts in this area, there remains an urgent need for a comprehensive survey offering a systematic understanding. To fill this gap, we propose a unified, multidimensional framework structured along four core dimensions of TTS research: what to scale, how to scale, where to scale, and how well to scale. Building upon this taxonomy, we conduct an extensive review of methods, application scenarios, and assessment aspects, and present an organized decomposition that highlights the unique functional roles of individual techniques within the broader TTS landscape. From this analysis, we distill the major developmental trajectories of TTS to date and offer hands-on guidelines for practical deployment. Furthermore, we identify several open challenges and offer insights into promising future directions, including further scaling, clarifying the functional essence of techniques, generalizing to more tasks, and more attribution analysis.
