CrispEdit: Low-Curvature Projections for Scalable Non-Destructive LLM Editing

Abstract

A central challenge in large language model (LLM) editing is capability preservation: methods that successfully change the targeted behavior can quietly game the editing proxy and corrupt general capabilities, producing degenerate behaviors reminiscent of proxy or reward hacking. We present CrispEdit, a scalable and principled second-order editing algorithm that treats capability preservation as an explicit constraint, unifying and generalizing several existing editing approaches. CrispEdit formulates editing as constrained optimization and enforces the constraint by projecting edit updates onto the low-curvature subspace of the capability-loss landscape. At the crux of CrispEdit is expressing the capability constraint via a Bregman divergence, whose quadratic form yields the Gauss-Newton Hessian exactly, even when the base model has not been trained to convergence. We make this second-order procedure efficient at LLM scale using Kronecker-factored approximate curvature (K-FAC) and a novel matrix-free projector that exploits the Kronecker structure to avoid constructing massive projection matrices. Across standard model-editing benchmarks, CrispEdit achieves high edit success while keeping capability degradation below 1% on average across datasets, a significant improvement over prior editors.
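To make the low-curvature projection step concrete, here is a minimal NumPy sketch for a single linear layer y = W a. It assumes the standard K-FAC form in which the Gauss-Newton block of vec(W) (column-major) is approximated by kron(A, G), with A the second-moment matrix of the layer inputs and G that of the output gradients on capability data. Because the eigenvectors of kron(A, G) are Kronecker products of the factor eigenvectors and its eigenvalues are pairwise products, the projection can be applied by rotating the update into the factor eigenbases, masking, and rotating back, without ever forming the (d_out·d_in)² projection matrix. The function names, the fixed threshold `tau`, and the hard keep/drop mask are illustrative assumptions, not CrispEdit's actual implementation or threshold rule.

```python
import numpy as np


def kfac_factors(acts: np.ndarray, grads: np.ndarray):
    """Return K-FAC factors (A, G) for one linear layer y = W @ a.

    acts:  (n, d_in)  layer inputs collected on capability data
    grads: (n, d_out) gradients of a capability loss w.r.t. the layer outputs
    The curvature block for vec(W) (column-major) is approximated by kron(A, G).
    """
    n = acts.shape[0]
    A = acts.T @ acts / n    # (d_in, d_in) input second-moment matrix
    G = grads.T @ grads / n  # (d_out, d_out) output-gradient second-moment matrix
    return A, G


def make_low_curvature_projector(A: np.ndarray, G: np.ndarray, tau: float):
    """Matrix-free projector onto eigendirections of kron(A, G) with curvature <= tau.

    Only the two small factors are eigendecomposed; eigenvalues of kron(A, G)
    are the products lam_G[j] * lam_A[i] of the factor eigenvalues.
    """
    lam_A, U_A = np.linalg.eigh(A)
    lam_G, U_G = np.linalg.eigh(G)
    curvature = np.outer(lam_G, lam_A)         # curvature[j, i] = lam_G[j] * lam_A[i]
    keep = (curvature <= tau).astype(A.dtype)  # 1 marks a low-curvature direction

    def project(delta_W: np.ndarray) -> np.ndarray:
        coords = U_G.T @ delta_W @ U_A         # coordinates in the Kronecker eigenbasis
        return U_G @ (keep * coords) @ U_A.T   # drop high-curvature parts, map back

    return project


def curvature_of(delta_W: np.ndarray, A: np.ndarray, G: np.ndarray) -> float:
    """Quadratic form vec(dW)^T kron(A, G) vec(dW) = tr(dW^T G dW A)."""
    return float(np.trace(delta_W.T @ G @ delta_W @ A))


# Usage on synthetic data (stand-ins for real activations and gradients).
rng = np.random.default_rng(0)
acts = rng.normal(size=(256, 8)) @ np.diag(np.linspace(0.1, 3.0, 8))
grads = rng.normal(size=(256, 6))
A, G = kfac_factors(acts, grads)
project = make_low_curvature_projector(A, G, tau=0.5)

edit_grad = rng.normal(size=(6, 8))  # hypothetical gradient of an edit loss
step = project(edit_grad)            # update restricted to low-curvature directions
print(curvature_of(edit_grad, A, G), curvature_of(step, A, G))
```

The masked step is then what a gradient-based editor would apply, e.g. W ← W − η·step for some step size η. Using a hard mask keeps the operator an exact orthogonal projector in the K-FAC metric's eigenbasis; a soft, eigenvalue-dependent rescaling would be a natural alternative but is not shown here.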
