Practical Unlearning for Large Language Models

Abstract

Large language models (LLMs) have demonstrated impressive performance across many domains and tasks, but their security issues have become increasingly severe. Machine unlearning (MU) has emerged as a promising way to address these issues: it removes the influence of undesired data on the target model without compromising the model's utility elsewhere. MU typically assumes full access to the original training data in order to preserve utility, which is difficult to achieve in LLM unlearning. Existing LLM unlearning methods often assume access to the data most affected by unlearning the undesired data. However, this assumption underestimates the entanglement among LLM capabilities and ignores restrictions on data access that arise for various reasons. Moreover, these methods do not sufficiently account for the fact that unlearning requests in real-world scenarios emerge continuously. To overcome these challenges and achieve practical LLM unlearning, we propose the O3 framework. O3 includes an Out-Of-Distribution (OOD) detector that measures the similarity between an input and the unlearning data, and an Orthogonal low-rank adapter (LoRA) for continuously unlearning requested data. The OOD detector is trained with a novel contrastive entropy loss and uses a local-global layer-aggregated scoring mechanism. The orthogonal LoRA achieves parameter disentanglement among continual unlearning requests. During inference, O3 decides whether, and to what extent, to load the unlearning LoRA based on the OOD detector's predictions. Notably, O3's effectiveness does not rely on any retained data. We conducted extensive experiments comparing O3 with state-of-the-art LLM unlearning methods across three tasks and seven datasets. The results indicate that O3 consistently achieves the best trade-off between unlearning effectiveness and utility preservation, especially when facing continuous unlearning requests.
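
The two mechanisms named above, orthogonality between adapters and OOD-gated loading of the unlearning LoRA, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the sigmoid gate, the default `threshold` and `sharpness` values, and the convention that a higher OOD score means the input is less similar to the unlearning data are all assumptions made for illustration. The penalty shown is one common way to push two adapters toward orthogonal subspaces; the abstract does not specify the exact form used in O3.

```python
import math


def matmul_transpose(a, b):
    """Return a @ b^T for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]


def orthogonality_penalty(a_prev, a_new):
    """Sum of squared entries of a_prev @ a_new^T.

    Assumed form of the penalty: it is zero exactly when every row of
    a_new is orthogonal to every row of a_prev.
    """
    product = matmul_transpose(a_prev, a_new)
    return sum(v * v for row in product for v in row)


def gate_weight(ood_score, threshold=0.5, sharpness=10.0):
    """Map an OOD score to a LoRA loading weight in (0, 1).

    Assumed convention: a low score means the input resembles the
    unlearning data, so the weight approaches 1; a high score means
    the input is unrelated, so the weight approaches 0.
    """
    return 1.0 / (1.0 + math.exp(sharpness * (ood_score - threshold)))


def gated_output(base_out, lora_delta, weight):
    """Blend the base model output with the unlearning LoRA's contribution."""
    return [b + weight * d for b, d in zip(base_out, lora_delta)]
```

For example, `orthogonality_penalty([[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]])` evaluates to `0.0` because the two rows are orthogonal, and `gate_weight` returns a value close to 1 for an input scored well below the threshold, so the adapter's contribution is applied almost in full.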
