Think Only When You Need with Large Hybrid-Reasoning Models

May 20, 2025
Authors: Lingjie Jiang, Xun Wu, Shaohan Huang, Qingxiu Dong, Zewen Chi, Li Dong, Xingxing Zhang, Tengchao Lv, Lei Cui, Furu Wei
cs.AI

Abstract

Recent Large Reasoning Models (LRMs) have shown substantially improved reasoning capabilities over traditional Large Language Models (LLMs) by incorporating extended thinking processes prior to producing final responses. However, excessively lengthy thinking introduces substantial overhead in token consumption and latency, which is particularly unnecessary for simple queries. In this work, we introduce Large Hybrid-Reasoning Models (LHRMs), the first kind of model capable of adaptively determining whether to perform thinking based on the contextual information of user queries. To achieve this, we propose a two-stage training pipeline comprising Hybrid Fine-Tuning (HFT) as a cold start, followed by online reinforcement learning with the proposed Hybrid Group Policy Optimization (HGPO) to implicitly learn to select the appropriate thinking mode. Furthermore, we introduce a metric called Hybrid Accuracy to quantitatively assess the model's capability for hybrid thinking. Extensive experimental results show that LHRMs can adaptively perform hybrid thinking on queries of varying difficulty and type, outperforming existing LRMs and LLMs in reasoning and general capabilities while significantly improving efficiency. Together, our work advocates for a reconsideration of the appropriate use of extended thinking processes and provides a solid starting point for building hybrid thinking systems.
