REPAIR: Robust Editing via Progressive Adaptive Intervention and Reintegration

Abstract

Post-training for large language models (LLMs) is constrained by the high cost of acquiring new knowledge or correcting errors and by the unintended side effects that frequently arise from retraining. To address these issues, we introduce REPAIR (Robust Editing via Progressive Adaptive Intervention and Reintegration), a lifelong editing framework designed to support precise and low-cost model updates while preserving non-target knowledge. REPAIR mitigates the instability and conflicts of large-scale sequential edits through a closed-loop feedback mechanism coupled with dynamic memory management. Furthermore, by incorporating frequent knowledge fusion and enforcing strong locality guards, REPAIR effectively addresses the shortcomings of traditional distribution-agnostic approaches that often overlook unintended ripple effects. Our experiments demonstrate that REPAIR boosts editing accuracy by 10%-30% across multiple model families and significantly reduces knowledge forgetting. This work introduces a robust framework for developing reliable, scalable, and continually evolving LLMs.
