Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic
September 1, 2025
Authors: Mohammad Zbeeb, Hasan Abed Al Kader Hammoud, Bernard Ghanem
cs.AI
Abstract
Large language models often require costly optimization, such as
reinforcement learning, to master complex reasoning tasks. This work
demonstrates that reasoning ability, once learned, can be extracted and
transferred between models as a compact task vector. We source two publicly
available, identically initialized Qwen2.5 models, one fine-tuned with
supervised fine-tuning (SFT) and the other with group relative policy
optimization (GRPO) on the same dataset. From these, we extract a reasoning
vector: $v_{\text{reason}} = \theta_{\text{GRPO}} - \theta_{\text{SFT}}$. We
hypothesize that this vector captures the reasoning capability instilled by
reinforcement learning while factoring out shared knowledge from the SFT
process. When added to compatible instruction-tuned models through simple
arithmetic, this vector consistently improves performance across diverse
reasoning benchmarks: GSM8K (+4.9%), HumanEval (+4.3%), SciQ (+1.7%), and
BigBenchHard (+12.3% for the 1.5B model). The performance improvements persist
under adversarial conditions. Conversely, subtracting the vector causes
significant performance degradation (-11.8% on GSM8K), demonstrating the
vector's strong contribution to the model's reasoning abilities. This work
shows how reasoning capabilities, typically developed through expensive
training, can be extracted from existing open-source models and reused through
simple tensor arithmetic, offering a practical way to enhance models by
recycling prior computational investments.
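To make the task-arithmetic step concrete, below is a minimal sketch of extracting a reasoning vector and adding it to a compatible instruction-tuned model. It assumes the Hugging Face transformers and PyTorch APIs; the SFT and GRPO checkpoint identifiers are illustrative placeholders, not the exact checkpoints used in the paper, and setting alpha = -1.0 corresponds to the subtraction ablation described above.

import torch
from transformers import AutoModelForCausalLM

def load_params(name: str) -> dict:
    # Load a checkpoint and return its parameter tensors.
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32)
    return model.state_dict()

# Two identically initialized models fine-tuned on the same dataset
# (placeholder IDs; substitute the actual SFT and GRPO checkpoints).
sft_params  = load_params("org/qwen2.5-1.5b-sft")
grpo_params = load_params("org/qwen2.5-1.5b-grpo")

# Reasoning vector: per-parameter difference theta_GRPO - theta_SFT.
v_reason = {k: grpo_params[k] - sft_params[k] for k in grpo_params}

# Apply the vector to an architecturally compatible instruction-tuned model.
target = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct", torch_dtype=torch.float32
)
alpha = 1.0  # +1.0 adds the reasoning vector; -1.0 subtracts it (ablation)
with torch.no_grad():
    for name, param in target.named_parameters():
        if name in v_reason:
            param.add_(alpha * v_reason[name])

target.save_pretrained("qwen2.5-1.5b-instruct-plus-reason")

The merge is a single element-wise tensor operation per parameter, so it requires no gradient computation or training data, only that the source and target models share the same architecture and parameter names.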