

ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

August 26, 2025
Authors: Qianyu He, Siyu Yuan, Xuefeng Li, Mingxuan Wang, Jiangjie Chen
cs.AI

Abstract

Large language models (LLMs) with chain-of-thought reasoning have demonstrated remarkable problem-solving capabilities, but controlling their computational effort remains a significant challenge for practical deployment. Recent proprietary systems like OpenAI's gpt-oss series have introduced discrete operational modes for intuitive reasoning control, but the open-source community has largely failed to achieve such capabilities. In this paper, we introduce ThinkDial, the first open-recipe end-to-end framework that successfully implements gpt-oss-style controllable reasoning through discrete operational modes. Our system enables seamless switching between three distinct reasoning regimes: High mode (full reasoning capability), Medium mode (50 percent token reduction with <10 percent performance degradation), and Low mode (75 percent token reduction with <15 percent performance degradation). We achieve this through an end-to-end training paradigm that integrates budget-mode control throughout the entire pipeline: budget-mode supervised fine-tuning that embeds controllable reasoning capabilities directly into the learning process, and two-phase budget-aware reinforcement learning with adaptive reward shaping. Extensive experiments demonstrate that ThinkDial achieves target compression-performance trade-offs with clear response length reductions while maintaining performance thresholds. The framework also exhibits strong generalization capabilities on out-of-distribution tasks.
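The abstract states each mode's target as a pair: a token reduction and a maximum performance drop. The minimal sketch below encodes those stated targets and checks measured results against them. It is not the authors' evaluation code. `ModeTarget`, `MODE_TARGETS` and `meets_target` are hypothetical names. The sketch also assumes three things the abstract does not confirm:

- both quantities are measured relative to High mode;
- "performance degradation" means a relative drop in score;
- the token reduction is a minimum to reach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeTarget:
    """Hypothetical encoding of one mode's stated trade-off, relative to High mode."""
    name: str
    token_reduction: float   # minimum fraction of High-mode tokens saved (0.5 = 50%)
    max_degradation: float   # relative performance drop must stay strictly below this


# Targets as stated in the abstract; High mode is the full-reasoning reference.
MODE_TARGETS = {
    "medium": ModeTarget("medium", token_reduction=0.50, max_degradation=0.10),
    "low": ModeTarget("low", token_reduction=0.75, max_degradation=0.15),
}


def meets_target(mode: str, high_tokens: float, high_score: float,
                 mode_tokens: float, mode_score: float) -> bool:
    """Return True if a mode's measured tokens/score satisfy its stated target.

    Assumption (not from the paper): reduction and degradation are both
    computed relative to High mode's average tokens and score.
    """
    if high_tokens <= 0 or high_score <= 0:
        raise ValueError("High-mode tokens and score must be positive")
    target = MODE_TARGETS[mode]
    reduction = 1.0 - mode_tokens / high_tokens
    degradation = (high_score - mode_score) / high_score
    return reduction >= target.token_reduction and degradation < target.max_degradation
```

Suppose High mode averages 8000 tokens at 0.80 accuracy. A Medium run averaging 3900 tokens at 0.74 accuracy would qualify: it saves about 51% of tokens for a 7.5% relative drop. A Low run averaging 2400 tokens would not qualify, because it saves only 70% of tokens.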
August 27, 2025