CrispEdit: Low-Curvature Projections for Scalable Non-Destructive LLM Editing

Abstract

A central challenge in large language model (LLM) editing is capability preservation: methods that successfully change targeted behavior can quietly game the editing proxy and corrupt general capabilities, producing degenerate behaviors reminiscent of proxy/reward hacking. We present CrispEdit, a scalable and principled second-order editing algorithm that treats capability preservation as an explicit constraint, unifying and generalizing several existing editing approaches. CrispEdit formulates editing as a constrained optimization problem and enforces the constraint by projecting edit updates onto the low-curvature subspace of the capability-loss landscape. At the crux of CrispEdit is expressing the capability constraint via a Bregman divergence, whose quadratic form yields the Gauss-Newton Hessian exactly, even when the base model is not trained to convergence. We make this second-order procedure efficient at LLM scale using Kronecker-factored approximate curvature (K-FAC) and a novel matrix-free projector that exploits Kronecker structure to avoid constructing massive projection matrices. Across standard model-editing benchmarks, CrispEdit achieves high edit success while keeping capability degradation below 1% on average across datasets, significantly improving over prior editors.
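
To make the projection step concrete, the following is a minimal NumPy sketch of a matrix-free low-curvature projection for a single linear layer. It is an illustration under stated assumptions, not the authors' implementation. It assumes that the curvature of the capability loss for a weight matrix of shape (d_out, d_in) is approximated by the Kronecker product of an output-gradient covariance G and an input-activation covariance A, as in standard K-FAC. The function name `kfac_low_curvature_project`, the `keep_fraction` quantile threshold, and the synthetic data are all hypothetical choices made for this example; the abstract does not specify how the low-curvature subspace is selected.

```python
import numpy as np


def kfac_low_curvature_project(delta_w, A, G, keep_fraction=0.5):
    """Project an update onto low-curvature eigen-directions of kron(G, A).

    delta_w: (d_out, d_in) candidate edit update for one linear layer.
    A:       (d_in, d_in) input-activation covariance (K-FAC factor).
    G:       (d_out, d_out) output-gradient covariance (K-FAC factor).
    keep_fraction: hypothetical knob; fraction of eigen-directions
                   (lowest curvature first) that the update may use.
    """
    lam_a, U_a = np.linalg.eigh(A)
    lam_g, U_g = np.linalg.eigh(G)
    # Eigenvalues of kron(G, A) are all pairwise products lam_g[i] * lam_a[j].
    curvature = np.outer(lam_g, lam_a)
    threshold = np.quantile(curvature, keep_fraction)
    mask = curvature <= threshold
    # Rotate into the Kronecker eigenbasis, zero out the high-curvature
    # coefficients, rotate back. The (d_out*d_in)^2 projector is never built.
    coeffs = U_g.T @ delta_w @ U_a
    return U_g @ (coeffs * mask) @ U_a.T


def quad(dw, A, G):
    """Quadratic form vec(dw)^T kron(G, A) vec(dw), computed matrix-free."""
    return float(np.sum(dw * (G @ dw @ A)))


# Synthetic stand-ins for statistics collected on capability data (assumption).
rng = np.random.default_rng(0)
d_in, d_out, n = 6, 4, 256
acts = rng.normal(size=(n, d_in)) * np.linspace(0.1, 3.0, d_in)
grads = rng.normal(size=(n, d_out)) * np.linspace(0.1, 3.0, d_out)
A = acts.T @ acts / n
G = grads.T @ grads / n

delta_w = rng.normal(size=(d_out, d_in))
projected = kfac_low_curvature_project(delta_w, A, G, keep_fraction=0.5)

print("curvature penalty before:", quad(delta_w, A, G))
print("curvature penalty after: ", quad(projected, A, G))
```

The cost is dominated by two small eigendecompositions and a few matrix products of the layer's own size, which is why this kind of projector scales where an explicit projection matrix over all d_out * d_in parameters would not. In this sketch the projected update keeps only the components of the edit that the Kronecker-factored curvature estimate regards as cheap for the capability loss.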
