REPAIR: Robust Editing via Progressive Adaptive Intervention and Reintegration

Abstract

Post-training for large language models (LLMs) is constrained by the high cost of acquiring new knowledge or correcting errors, and by the unintended side effects that retraining frequently causes. To address these issues, we introduce REPAIR (Robust Editing via Progressive Adaptive Intervention and Reintegration), a lifelong editing framework that supports precise, low-cost model updates while preserving non-target knowledge. REPAIR mitigates the instability and conflicts of large-scale sequential edits through a closed-loop feedback mechanism coupled with dynamic memory management. In addition, by incorporating frequent knowledge fusion and enforcing strong locality guards, REPAIR addresses a shortcoming of traditional distribution-agnostic approaches, which often overlook unintended ripple effects. Our experiments show that REPAIR improves editing accuracy by 10%-30% across multiple model families and significantly reduces knowledge forgetting. This work provides a robust framework for developing reliable, scalable, and continually evolving LLMs.
