Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic
September 1, 2025
Authors: Mohammad Zbeeb, Hasan Abed Al Kader Hammoud, Bernard Ghanem
cs.AI
Abstract
Large language models often require costly optimization, such as
reinforcement learning, to master complex reasoning tasks. This work
demonstrates that reasoning ability, once learned, can be extracted and
transferred between models as a compact task vector. We source two publicly
available, identically initialized Qwen2.5 models, one trained with
supervised fine-tuning (SFT) and the other with group relative policy
optimization (GRPO) on the same dataset. From these, we extract a reasoning
vector: $v_{\text{reason}} = \theta_{\text{GRPO}} - \theta_{\text{SFT}}$. We
hypothesize that this vector captures the reasoning capability instilled by
reinforcement learning while factoring out shared knowledge from the SFT
process. When added to compatible instruction-tuned models through simple
arithmetic, this vector consistently improves performance across diverse
reasoning benchmarks: GSM8K (+4.9%), HumanEval (+4.3%), SciQ (+1.7%), and
BigBenchHard (+12.3% for the 1.5B model). The performance improvements persist
under adversarial conditions. Conversely, subtracting the vector causes
significant performance degradation (-11.8% on GSM8K), demonstrating the
vector's strong contribution to the model's reasoning abilities. This work
shows how reasoning capabilities, typically developed through expensive
training, can be extracted from existing open-source models and reused through
simple tensor arithmetic, offering a practical way to enhance models by
recycling prior computational investments.
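
For concreteness, here is a minimal sketch in Python (PyTorch + Hugging Face transformers) of the task arithmetic the abstract describes. The checkpoint names and the scaling coefficient `alpha` are illustrative assumptions, not details taken from the paper; the abstract itself describes plain addition and subtraction, i.e. `alpha = +1` and `alpha = -1`.

```python
# A minimal sketch of the reasoning-vector arithmetic described above, assuming
# three parameter-compatible checkpoints. The SFT and GRPO checkpoint names are
# hypothetical placeholders, not the exact models used in the paper.
import torch
from transformers import AutoModelForCausalLM

SFT_CKPT = "org/qwen2.5-1.5b-sft"           # hypothetical SFT checkpoint
GRPO_CKPT = "org/qwen2.5-1.5b-grpo"         # hypothetical GRPO checkpoint
TARGET_CKPT = "Qwen/Qwen2.5-1.5B-Instruct"  # compatible instruction-tuned model

sft = AutoModelForCausalLM.from_pretrained(SFT_CKPT, torch_dtype=torch.float32)
grpo = AutoModelForCausalLM.from_pretrained(GRPO_CKPT, torch_dtype=torch.float32)
target = AutoModelForCausalLM.from_pretrained(TARGET_CKPT, torch_dtype=torch.float32)

# v_reason = theta_GRPO - theta_SFT, computed tensor by tensor.
sft_state, grpo_state = sft.state_dict(), grpo.state_dict()
v_reason = {name: grpo_state[name] - sft_state[name] for name in grpo_state}

# theta_target + alpha * v_reason. alpha is an assumption for illustration:
# alpha = 1.0 is the plain addition from the abstract, alpha = -1.0 the
# subtraction ablation that degrades GSM8K.
alpha = 1.0
with torch.no_grad():
    for name, param in target.named_parameters():
        if name in v_reason:
            param.add_(alpha * v_reason[name].to(param.dtype))

target.save_pretrained("qwen2.5-instruct-plus-reasoning")
```

Because the SFT and GRPO checkpoints share an identical initialization, the subtraction cancels the weights they have in common and leaves mostly what reinforcement learning changed; the resulting vector can only be applied to models with the same architecture and parameter layout, which is what "compatible" means above.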