ReLearn: Unlearning via Learning for Large Language Models

Abstract

Current unlearning methods for large language models usually rely on reverse optimization to reduce target token probabilities. However, this paradigm disrupts the prediction of subsequent tokens, degrading model performance and linguistic coherence. Moreover, existing evaluation metrics overemphasize contextual forgetting while inadequately assessing response fluency and relevance. To address these challenges, we propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning, along with a comprehensive evaluation framework. This framework introduces the Knowledge Forgetting Rate (KFR) and Knowledge Retention Rate (KRR) to measure knowledge-level preservation, and the Linguistic Score (LS) to evaluate generation quality. Our experiments show that ReLearn achieves targeted forgetting while preserving high-quality output. Through mechanistic analysis, we further demonstrate how reverse optimization disrupts coherent text generation, while ReLearn preserves this essential capability. Code is available at https://github.com/zjunlp/unlearn.
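As a minimal illustration of the reverse optimization the abstract critiques (commonly implemented as gradient ascent on the negative log-likelihood of forget-set tokens), the toy sketch below applies the update at a single token position over a hypothetical four-token vocabulary. The vocabulary, logits, and learning rate are illustrative assumptions rather than values from the paper, and real unlearning methods update model parameters, not the logits directly.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logits, target):
    """Negative log-likelihood of the target token (the usual fine-tuning loss)."""
    return -math.log(softmax(logits)[target])


def reverse_optimization_step(logits, target, lr=1.0):
    """One gradient-ASCENT step on the NLL of `target` (toy, logit-space only).

    d NLL / d logit_j = softmax_j - 1[j == target]. Ascending this gradient
    lowers the target logit by lr * (1 - p_target) and raises every other
    logit by lr * p_j, so the target's probability can only go down.
    """
    probs = softmax(logits)
    grad = [p - (1.0 if j == target else 0.0) for j, p in enumerate(probs)]
    return [x + lr * g for x, g in zip(logits, grad)]


if __name__ == "__main__":
    # Hypothetical 4-token vocabulary; token 0 stands in for the fact to forget.
    logits = [2.0, 1.0, 0.5, 0.0]
    target = 0
    for step in range(5):
        p = softmax(logits)[target]
        print(f"step {step}: p(target)={p:.3f}, NLL={nll(logits, target):.3f}")
        logits = reverse_optimization_step(logits, target, lr=1.0)
```

Note that the update only says which token to move away from, not what to produce instead: the freed probability mass flows to whatever alternatives are already likely. By contrast, the abstract describes ReLearn as fine-tuning on augmented data, which supplies a positive learning target rather than a purely negative one.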
