Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation
January 16, 2026
Authors: Pingzhi Tang, Yiding Wang, Muhan Zhang
cs.AI
Abstract
Large Language Models (LLMs) face the "knowledge cutoff" challenge: their frozen parametric memory prevents direct internalization of new information. While Supervised Fine-Tuning (SFT) is commonly used to update model knowledge, it often updates factual content without reliably improving the model's ability to use the newly incorporated information for question answering or decision-making. Reinforcement Learning (RL) is essential for acquiring reasoning skills; however, its high computational cost makes it impractical for efficient online adaptation. We empirically observe that the parameter updates induced by SFT and RL are nearly orthogonal. Based on this observation, we propose Parametric Skill Transfer (PaST), a framework that supports modular skill transfer for efficient and effective knowledge adaptation. By extracting a domain-agnostic Skill Vector from a source domain, we can linearly inject knowledge-manipulation skills into a target model after it has undergone lightweight SFT on new data. Experiments on knowledge-incorporation QA (SQuAD, LooGLE) and agentic tool-use benchmarks (ToolBench) demonstrate the effectiveness of our method. On SQuAD, PaST outperforms the state-of-the-art self-editing SFT baseline by up to 9.9 points. PaST further scales to long-context QA on LooGLE with an 8.0-point absolute accuracy gain, and improves zero-shot ToolBench success rates by 10.3 points on average with consistent gains across tool categories, indicating strong scalability and cross-domain transferability of the Skill Vector.
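The skill-arithmetic idea the abstract describes can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the paper's implementation: `theta_base`, `delta_sft`, `delta_rl`, and the scaling factor `alpha` are illustrative stand-ins, and the random "updates" merely mimic the near-orthogonality the paper observes empirically (random high-dimensional vectors are also nearly orthogonal).

```python
import numpy as np

def flatten(params):
    # Concatenate all parameter tensors into a single flat vector.
    return np.concatenate([p.ravel() for p in params.values()])

def cosine(u, v):
    # Cosine similarity between two flat parameter-update vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy "model": a dict of parameter arrays standing in for LLM weights.
rng = np.random.default_rng(0)
theta_base = {"w": rng.normal(size=(64, 64)), "b": rng.normal(size=64)}

# Hypothetical updates: delta_sft from lightweight SFT on new data,
# delta_rl from RL training on a source domain (both assumed, not real).
delta_sft = {k: 0.01 * rng.normal(size=v.shape) for k, v in theta_base.items()}
delta_rl = {k: 0.01 * rng.normal(size=v.shape) for k, v in theta_base.items()}

# The paper's observation: SFT- and RL-induced updates are nearly
# orthogonal, so they can be combined without interfering much.
cos = cosine(flatten(delta_sft), flatten(delta_rl))

# Treat the RL-induced update as a reusable, domain-agnostic Skill
# Vector and linearly inject it after SFT (alpha is an assumed knob).
alpha = 1.0
theta_target_sft = {k: v + delta_sft[k] for k, v in theta_base.items()}
theta_past = {k: v + alpha * delta_rl[k] for k, v in theta_target_sft.items()}
```

Because the injection is purely additive, the final weights are just base + SFT update + scaled Skill Vector, which is what makes the transfer modular: the Skill Vector can be extracted once and reused across target models.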