Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs

May 25, 2025
Authors: Jaemin Kim, Hangeol Chang, Hyunmin Hwang, Choonghan Kim, Jong Chul Ye
cs.AI

Abstract

Large Language Models (LLMs) have demonstrated remarkable general capabilities, but enhancing skills such as reasoning often demands substantial computational resources and may compromise their generalization. While Parameter-Efficient Fine-Tuning (PEFT) methods offer a more resource-conscious alternative, they typically require retraining for each LLM backbone due to architectural dependencies. To address these challenges, here we propose Universal Reasoner (UniR): a single, lightweight, composable, and plug-and-play reasoning module that can be used with any frozen LLM to endow it with specialized reasoning capabilities. Specifically, UniR decomposes the reward into a standalone reasoning module that is trained independently using predefined rewards, effectively translating trajectory-level signals into token-level guidance. Once trained, UniR can be combined with any frozen LLM at inference time by simply adding its output logits to those of the LLM backbone. This additive structure naturally enables modular composition: multiple UniR modules trained for different tasks can be jointly applied by summing their logits, enabling complex reasoning via composition. Experimental results on mathematical reasoning and machine translation tasks show that UniR significantly outperforms existing baseline fine-tuning methods using the Llama3.2 model. Furthermore, UniR demonstrates strong weak-to-strong generalization: reasoning modules trained on smaller models effectively guide much larger LLMs. This makes UniR a cost-efficient, adaptable, and robust solution for enhancing reasoning in LLMs without compromising their core capabilities. Code is open-sourced at https://github.com/hangeol/UniR
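The logit-level combination described in the abstract can be illustrated with a short sketch. This is a minimal illustration, not the authors' implementation (see the linked repository for that): it assumes Hugging Face transformers causal LMs that share a tokenizer/vocabulary, greedy decoding, and placeholder model identifiers (backbone_id, reasoner_id); the alpha weight is an added assumption for flexibility, since the abstract describes a plain sum (alpha = 1).

# Minimal sketch of UniR-style logit-level guidance at decoding time (assumptions noted above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

backbone_id = "meta-llama/Llama-3.2-3B-Instruct"   # frozen backbone (placeholder identifier)
reasoner_id = "path/to/unir-reasoning-module"      # trained reasoning module (placeholder path)

tokenizer = AutoTokenizer.from_pretrained(backbone_id)
backbone = AutoModelForCausalLM.from_pretrained(backbone_id).eval()
reasoner = AutoModelForCausalLM.from_pretrained(reasoner_id).eval()

@torch.no_grad()
def guided_generate(prompt: str, max_new_tokens: int = 128, alpha: float = 1.0) -> str:
    """Greedy decoding where next-token logits are summed across backbone and reasoner."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits_b = backbone(input_ids=ids).logits[:, -1, :]   # frozen backbone logits
        logits_r = reasoner(input_ids=ids).logits[:, -1, :]   # reasoning-module logits
        # Additive, token-level guidance: both models must share the same vocabulary.
        next_id = (logits_b + alpha * logits_r).argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)

Composing a second task-specific module would amount to adding another logits term inside the loop, consistent with the modular composition the abstract describes.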
