EtCon: Edit-then-Consolidate for Reliable Knowledge Editing
December 4, 2025
Authors: Ruilin Li, Yibin Wang, Wenhong Zhu, Chenglin Li, Jinghao Zhang, Chenliang Li, Junchi Yan, Jiaqi Wang
cs.AI
Abstract
Knowledge editing aims to update specific facts in large language models (LLMs) without full retraining. Prior efforts tune the knowledge layers of LLMs and have proven effective at making selective edits. However, a significant gap exists between their performance in controlled, teacher-forcing evaluations and their real-world effectiveness in lifelong learning scenarios, which greatly limits their practical applicability. Our empirical analysis reveals two recurring issues behind this gap: (1) most traditional methods cause the edited model to overfit to the new fact, thereby degrading pre-trained capabilities; (2) a knowledge consolidation stage is critically absent, leaving new facts insufficiently integrated into LLMs' inference-time behavior under autoregressive generation and creating a mismatch between parametric knowledge and actual generation behavior. To this end, we propose Edit-then-Consolidate, a novel knowledge editing paradigm that aims to bridge the gap between theoretical knowledge editing methods and their real-world applicability. Specifically, (1) our framework mitigates overfitting with Targeted Proximal Supervised Fine-Tuning (TPSFT), which localizes the edit and applies a trust-region objective to limit policy drift; (2) a subsequent consolidation stage based on Group Relative Policy Optimization (GRPO) aligns the edited knowledge with the model's chain-of-thought (CoT) inference policy by optimizing trajectory-level behavior under comprehensive reward signals. Extensive experiments demonstrate that our framework consistently improves editing reliability and generalization under real-world evaluations, while better preserving locality and pre-trained capabilities.
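The abstract outlines the two training stages but not their concrete objectives. The PyTorch sketch below is a minimal illustration of how each stage could look, assuming a PPO-style clipped trust-region loss restricted to the edit tokens for TPSFT and the standard group-normalized advantage for the GRPO consolidation stage; the function names, tensor shapes, clipping scheme, and reward design are illustrative assumptions, not details taken from the paper.

```python
import torch


def tpsft_loss(logits, ref_logits, target_ids, edit_mask, clip_eps=0.2):
    """Trust-region-style supervised loss on the edit target (assumed form).

    logits / ref_logits: (B, T, V) scores from the trainable model and the
        frozen pre-edit model, already shifted so position t scores target_ids[t].
    target_ids: (B, T) token ids of the prompt plus the new-fact continuation.
    edit_mask: (B, T) float mask, 1.0 on tokens of the edited fact and 0.0
        elsewhere, which localizes the update to the edit region.
    """
    logp = torch.log_softmax(logits, dim=-1).gather(
        -1, target_ids.unsqueeze(-1)).squeeze(-1)
    with torch.no_grad():
        ref_logp = torch.log_softmax(ref_logits, dim=-1).gather(
            -1, target_ids.unsqueeze(-1)).squeeze(-1)

    # PPO-style clipped importance ratio: the target-token likelihood is pushed
    # up only while the edited policy stays within a trust region around the
    # pre-edit policy, limiting policy drift / overfitting to the new fact.
    ratio = torch.exp(logp - ref_logp)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    per_token = -torch.min(ratio, clipped)
    return (per_token * edit_mask).sum() / edit_mask.sum().clamp(min=1.0)


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages used by GRPO-style consolidation.

    rewards: (G,) scalar rewards for G CoT trajectories sampled from the same
        edited prompt (e.g., correctness of the final answer plus auxiliary
        signals); each trajectory is scored relative to its own group.
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)
```

In a full pipeline, the consolidation stage would sample a group of CoT rollouts per edited prompt, score them with the chosen reward signals, and weight each trajectory's token log-probabilities by these advantages in a clipped policy-gradient update; that sampling and update machinery is omitted here for brevity.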