Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs

Abstract

Large Language Models (LLMs) have demonstrated remarkable general capabilities, but enhancing skills such as reasoning often demands substantial computational resources and may compromise their generalization. Parameter-Efficient Fine-Tuning (PEFT) methods offer a more resource-conscious alternative, but they typically require retraining for each LLM backbone because of architectural dependencies. To address these challenges, we propose Universal Reasoner (UniR), a single, lightweight, composable, plug-and-play reasoning module that can be used with any frozen LLM to endow it with specialized reasoning capabilities. Specifically, UniR decomposes the reward into a standalone reasoning module that is trained independently using predefined rewards, effectively translating trajectory-level signals into token-level guidance. Once trained, UniR can be combined with any frozen LLM at inference time by simply adding its output logits to those of the LLM backbone. This additive structure naturally enables modular composition: multiple UniR modules trained for different tasks can be applied jointly by summing their logits, enabling complex reasoning via composition. Experiments on mathematical reasoning and machine translation tasks with the Llama3.2 model show that UniR significantly outperforms existing baseline fine-tuning methods. Furthermore, UniR demonstrates strong weak-to-strong generalization: reasoning modules trained on smaller models effectively guide much larger LLMs. This makes UniR a cost-efficient, adaptable, and robust solution for enhancing reasoning in LLMs without compromising their core capabilities. Code is open-sourced at https://github.com/hangeol/UniR
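The additive combination described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`add_logits`, `greedy_token`), the optional per-module `weights`, and the toy logit values are all hypothetical, and the sketch assumes the backbone and every module emit logits over the same vocabulary so that element-wise addition is defined.

```python
import math


def add_logits(backbone_logits, module_logits_list, weights=None):
    """Sum frozen-backbone logits with the logits of one or more modules.

    `weights` is a hypothetical knob for this sketch (default 1.0 each);
    the abstract only states that the logits are added.
    """
    if weights is None:
        weights = [1.0] * len(module_logits_list)
    combined = list(backbone_logits)
    for w, module_logits in zip(weights, module_logits_list):
        if len(module_logits) != len(combined):
            raise ValueError("all logit vectors must share one vocabulary")
        combined = [c + w * m for c, m in zip(combined, module_logits)]
    return combined


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_token(logits):
    """Index of the highest-scoring token."""
    return max(range(len(logits)), key=logits.__getitem__)


# Toy 4-token vocabulary; every number here is made up for illustration.
backbone = [2.0, 1.0, 0.5, 0.0]            # frozen LLM
math_module = [-1.0, 1.5, 0.0, 0.0]        # hypothetical math reasoning module
translation_module = [0.0, 0.0, 2.5, 0.0]  # hypothetical translation module

# One module guiding the backbone.
single = add_logits(backbone, [math_module])
# Two modules composed by summing their logits with the backbone's.
composed = add_logits(backbone, [math_module, translation_module])

print(greedy_token(backbone), greedy_token(single), greedy_token(composed))
print(softmax(composed))
```

In this toy setting the backbone alone prefers token 0, adding one module shifts the choice to token 1, and composing both modules shifts it to token 2. The backbone's parameters are never touched; only the next-token scores change at inference time.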
