ZJUKLAB at SemEval-2025 Task 4: Unlearning via Model Merging

Abstract

This paper presents the ZJUKLAB team's submission for SemEval-2025 Task 4: Unlearning Sensitive Content from Large Language Models. The task aims to selectively erase sensitive knowledge from large language models while avoiding both over-forgetting and under-forgetting. We propose an unlearning system that leverages Model Merging (specifically TIES-Merging), combining two specialized models into a more balanced unlearned model. Our system achieves competitive results, ranking second among 26 teams, with an online score of 0.944 for Task Aggregate and 0.487 for overall Aggregate. We also conduct local experiments and analyze the unlearning process in detail, examining performance trajectories, loss dynamics, and weight perspectives, along with several supplementary experiments, to understand the effectiveness of our method. Furthermore, we analyze the shortcomings of our method and of the evaluation metrics, showing that MIA scores and ROUGE-based metrics alone are insufficient to fully evaluate successful unlearning. Finally, we argue that future research needs more comprehensive evaluation methodologies and a rethinking of unlearning objectives. Code is available at https://github.com/zjunlp/unlearn/tree/main/semeval25.
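For readers unfamiliar with TIES-Merging, the following is a minimal NumPy sketch of the generic procedure (trim each task vector, elect a per-parameter sign, then average only the entries that agree with that sign). It is not the submission's actual code: the function names `trim` and `ties_merge`, the `density` and `lam` parameters, and the toy weight arrays are all assumptions made for illustration, and real models would apply this per parameter tensor.

```python
import numpy as np


def trim(task_vector, density):
    """Keep roughly the `density` fraction of largest-magnitude entries; zero the rest."""
    flat = np.abs(task_vector).ravel()
    k = int(round(density * flat.size))
    if k <= 0:
        return np.zeros_like(task_vector)
    if k >= flat.size:
        return task_vector.copy()
    # Magnitude of the k-th largest entry (ties may keep slightly more than k).
    threshold = np.partition(flat, flat.size - k)[flat.size - k]
    return np.where(np.abs(task_vector) >= threshold, task_vector, 0.0)


def ties_merge(base, finetuned, density=0.2, lam=1.0):
    """Merge fine-tuned weights into `base` with a TIES-style procedure (illustrative)."""
    # 1. Task vectors (fine-tuned minus base), trimmed to their largest entries.
    task_vectors = np.stack([trim(w - base, density) for w in finetuned])
    # 2. Elect one sign per parameter from the summed trimmed task vectors.
    elected = np.sign(task_vectors.sum(axis=0))
    # 3. Disjoint merge: average only non-zero entries that match the elected sign.
    agree = (np.sign(task_vectors) == elected) & (task_vectors != 0)
    counts = agree.sum(axis=0)
    summed = np.where(agree, task_vectors, 0.0).sum(axis=0)
    merged = np.divide(summed, counts, out=np.zeros_like(summed), where=counts > 0)
    # 4. Add the scaled merged task vector back onto the base weights.
    return base + lam * merged


# Toy example with hypothetical weights for two specialized models.
base = np.zeros(4)
model_a = np.array([1.0, -2.0, 0.1, 3.0])
model_b = np.array([4.0, 1.0, -0.2, -2.0])
merged_weights = ties_merge(base, [model_a, model_b], density=0.5)
```

In the toy example the last parameter has conflicting signs after trimming (3.0 versus -2.0); the elected sign is positive, so only the 3.0 entry contributes to the merged value.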
