

Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training

May 20, 2025
作者: Mengru Wang, Xingyu Chen, Yue Wang, Zhiwei He, Jiahao Xu, Tian Liang, Qiuzhi Liu, Yunzhi Yao, Wenxuan Wang, Ruotian Ma, Haitao Mi, Ningyu Zhang, Zhaopeng Tu, Xiaolong Li, Dong Yu
cs.AI

Abstract

Mixture-of-Experts (MoE) architectures within Large Reasoning Models (LRMs) have achieved impressive reasoning capabilities by selectively activating experts to facilitate structured cognitive processes. Despite notable advances, existing reasoning models often suffer from cognitive inefficiencies like overthinking and underthinking. To address these limitations, we introduce a novel inference-time steering methodology called Reinforcing Cognitive Experts (RICE), designed to improve reasoning performance without additional training or complex heuristics. Leveraging normalized Pointwise Mutual Information (nPMI), we systematically identify specialized experts, termed "cognitive experts," that orchestrate meta-level reasoning operations characterized by tokens like "<think>". Empirical evaluations with leading MoE-based LRMs (DeepSeek-R1 and Qwen3-235B) on rigorous quantitative and scientific reasoning benchmarks demonstrate noticeable and consistent improvements in reasoning accuracy, cognitive efficiency, and cross-domain generalization. Crucially, our lightweight approach substantially outperforms prevalent reasoning-steering techniques, such as prompt design and decoding constraints, while preserving the model's general instruction-following skills. These results highlight reinforcing cognitive experts as a promising, practical, and interpretable direction to enhance cognitive efficiency within advanced reasoning models.
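To make the identification step concrete, the sketch below shows how normalized Pointwise Mutual Information can score the association between a token (such as "<think>") and each routed expert, so that the highest-scoring experts emerge as candidate "cognitive experts". This is a minimal illustration, not the paper's implementation: the `routing_log` of (token, expert_id) pairs is a hypothetical stand-in for whatever routing statistics are collected from the model, and the function names are my own.

```python
import math
from collections import Counter

def npmi(p_joint: float, p_x: float, p_y: float) -> float:
    """Normalized PMI in [-1, 1]; 1 means perfect co-occurrence."""
    if p_joint == 0.0:
        return -1.0
    return math.log(p_joint / (p_x * p_y)) / -math.log(p_joint)

# Hypothetical routing log: (token, expert_id) pairs observed while
# decoding; in practice these would come from the MoE router's choices.
routing_log = [
    ("<think>", 7), ("<think>", 7), ("<think>", 3),
    ("the", 1), ("the", 2), ("answer", 5), ("<think>", 7),
]

n = len(routing_log)
token_counts = Counter(tok for tok, _ in routing_log)
expert_counts = Counter(eid for _, eid in routing_log)
pair_counts = Counter(routing_log)

def expert_token_npmi(token: str, expert_id: int) -> float:
    """nPMI between a token and an expert, estimated from the log."""
    p_joint = pair_counts[(token, expert_id)] / n
    return npmi(p_joint, token_counts[token] / n, expert_counts[expert_id] / n)

# Rank experts by association with "<think>"; top scorers are the
# candidate cognitive experts to reinforce at inference time.
scores = {e: expert_token_npmi("<think>", e) for e in expert_counts}
cognitive_expert = max(scores, key=scores.get)
```

In this toy log, expert 7 co-occurs with "<think>" far more often than chance, so it receives the highest nPMI score; the paper's method then reinforces such experts during decoding rather than retraining the model.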
