

Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation

January 16, 2026
作者: Pingzhi Tang, Yiding Wang, Muhan Zhang
cs.AI

Abstract

Large Language Models (LLMs) face the "knowledge cutoff" challenge, where their frozen parametric memory prevents direct internalization of new information. While Supervised Fine-Tuning (SFT) is commonly used to update model knowledge, it often updates factual content without reliably improving the model's ability to use the newly incorporated information for question answering or decision-making. Reinforcement Learning (RL) is essential for acquiring reasoning skills; however, its high computational cost makes it impractical for efficient online adaptation. We empirically observe that the parameter updates induced by SFT and RL are nearly orthogonal. Based on this observation, we propose Parametric Skill Transfer (PaST), a framework that supports modular skill transfer for efficient and effective knowledge adaptation. By extracting a domain-agnostic Skill Vector from a source domain, we can linearly inject knowledge manipulation skills into a target model after it has undergone lightweight SFT on new data. Experiments on knowledge-incorporation QA (SQuAD, LooGLE) and agentic tool-use benchmarks (ToolBench) demonstrate the effectiveness of our method. On SQuAD, PaST outperforms the state-of-the-art self-editing SFT baseline by up to 9.9 points. PaST further scales to long-context QA on LooGLE with an 8.0-point absolute accuracy gain, and improves zero-shot ToolBench success rates by +10.3 points on average with consistent gains across tool categories, indicating strong scalability and cross-domain transferability of the Skill Vector.
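The core arithmetic the abstract describes can be sketched concisely. Assuming the Skill Vector is the per-parameter delta between an RL-tuned checkpoint and its base model, and "linear injection" means adding a scaled copy of that delta to an SFT-adapted target, a minimal illustration (function names and the scaling coefficient `alpha` are illustrative, not taken from the paper) might look like:

```python
def extract_skill_vector(base_weights, rl_weights):
    """Skill Vector: per-parameter difference between the RL-tuned
    checkpoint and the base model it was trained from.
    Weights are represented here as plain dicts of floats; in practice
    these would be model state dicts of tensors."""
    return {k: rl_weights[k] - base_weights[k] for k in base_weights}

def inject_skill(sft_weights, skill_vector, alpha=1.0):
    """Linearly add the Skill Vector to a target model that has already
    undergone lightweight SFT on new-domain data. `alpha` is a
    hypothetical scaling knob for the injection strength."""
    return {k: sft_weights[k] + alpha * skill_vector[k] for k in sft_weights}

# Toy example: a one-parameter "model".
base = {"w": 1.0}          # original base checkpoint
rl = {"w": 3.0}            # base + RL training on the source domain
sft = {"w": 10.0}          # base + lightweight SFT on new target-domain data

sv = extract_skill_vector(base, rl)   # {"w": 2.0}
merged = inject_skill(sft, sv)        # {"w": 12.0}
```

The near-orthogonality observation in the abstract is what makes this plausible: if SFT and RL move the weights in roughly independent directions, adding the RL delta on top of an SFT-updated model injects reasoning skill without overwriting the newly incorporated knowledge.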