Model Editing with Canonical Examples

Abstract

We introduce model editing with canonical examples, a setting in which (1) a single learning example is provided per desired behavior, (2) evaluation is performed exclusively out-of-distribution, and (3) deviation from an initial model is strictly limited. A canonical example is a simple instance of good behavior (e.g., The capital of Mauritius is Port Louis) or bad behavior (e.g., An aspect of researchers is coldhearted). The evaluation set contains more complex examples of each behavior (such as a paragraph in which the capital of Mauritius is called for). We create three datasets and modify three more for model editing with canonical examples, covering knowledge-intensive improvements, social bias mitigation, and syntactic edge cases. In our experiments on Pythia language models, we find that LoRA outperforms full finetuning and MEMIT. We then turn to the Backpack language model architecture because it is intended to enable targeted improvement. The Backpack defines a large bank of sense vectors, a decomposition of the different uses of each word, which are weighted and summed to form the output logits of the model. We propose sense finetuning, which selects and finetunes a few (approximately 10) sense vectors for each canonical example, and find that it outperforms other finetuning methods (e.g., 4.8% improvement vs. 0.3%). Finally, we improve GPT-J-6B by an inference-time ensemble with just the changes from sense finetuning of a 35x smaller Backpack, in one setting outperforming editing GPT-J itself (4.1% vs. 1.0%).
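
The final sentence describes applying only the changes from sense finetuning of a small Backpack to a larger model at inference time. The sketch below shows one plausible way such an ensemble could work, assuming the change is added in log-probability space and the result is renormalized. The combination rule, the function names, and the `beta` weight are illustrative assumptions; the abstract does not specify them.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def ensemble_log_probs(large_logits, small_pre_logits, small_ft_logits, beta=1.0):
    """Shift a large model's next-token distribution by a small model's edit.

    Assumed rule (hypothetical, not taken from the abstract):
        log p  is proportional to
        log p_large + beta * (log p_small_finetuned - log p_small_pretrained)
    All three inputs are logits over the same vocabulary.
    """
    lp_large = log_softmax(large_logits)
    lp_pre = log_softmax(small_pre_logits)
    lp_ft = log_softmax(small_ft_logits)
    combined = [
        l + beta * (f - p) for l, f, p in zip(lp_large, lp_ft, lp_pre)
    ]
    # Renormalize so the result is a valid log-probability distribution.
    return log_softmax(combined)


if __name__ == "__main__":
    # Toy three-token vocabulary; the numbers are made up for illustration.
    large = [2.0, 1.0, 0.0]
    small_pre = [0.5, 0.5, 0.5]
    small_ft = [3.0, 0.5, 0.5]  # finetuning raised the score of token 0
    print(ensemble_log_probs(large, small_pre, small_ft))
```

Under this assumed rule, a small model whose finetuned and pretrained distributions agree leaves the large model's distribution unchanged, so only the edit itself is transferred.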
