Model Editing with Canonical Examples

Abstract

We introduce model editing with canonical examples, a setting in which (1) a single learning example is provided per desired behavior, (2) evaluation is performed exclusively out-of-distribution, and (3) deviation from an initial model is strictly limited. A canonical example is a simple instance of good behavior (e.g., "The capital of Mauritius is Port Louis") or bad behavior (e.g., "An aspect of researchers is coldhearted"). The evaluation set contains more complex examples of each behavior, such as a paragraph in which the capital of Mauritius is called for. We create three datasets and modify three more for model editing with canonical examples, covering knowledge-intensive improvements, social bias mitigation, and syntactic edge cases. In our experiments on Pythia language models, we find that LoRA outperforms full finetuning and MEMIT. We then turn to the Backpack language model architecture because it is intended to enable targeted improvement. The Backpack defines a large bank of sense vectors (a decomposition of the different uses of each word), which are weighted and summed to form the output logits of the model. We propose sense finetuning, which selects and finetunes a few (approximately 10) sense vectors for each canonical example, and find that it outperforms other finetuning methods (e.g., a 4.8% improvement vs. 0.3%). Finally, we improve GPT-J-6B by an inference-time ensemble with just the changes from sense finetuning of a 35x smaller Backpack, in one setting outperforming editing GPT-J itself (4.1% vs. 1.0%).
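The inference-time ensemble mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not give the combination rule, so the sketch assumes one plausible form: the large model's next-token log-probabilities are shifted by the difference between the finetuned and pretrained small model's log-probabilities, then renormalized. The function names, the `beta` scaling factor, and the toy three-token vocabulary are all hypothetical.

```python
import math


def log_softmax(logits):
    """Normalize a list of scores into log-probabilities (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def ensemble_log_probs(large_logp, small_ft_logp, small_pre_logp, beta=1.0):
    """Assumed rule: add the small model's finetuning change to the large model.

    combined[i] = large[i] + beta * (small_ft[i] - small_pre[i]), renormalized.
    `beta` is a hypothetical knob controlling how strongly the change is applied.
    """
    combined = [
        l + beta * (f - p)
        for l, f, p in zip(large_logp, small_ft_logp, small_pre_logp)
    ]
    return log_softmax(combined)


# Toy next-token distributions over a 3-token vocabulary (made-up numbers).
large = log_softmax([2.0, 1.0, 0.0])      # large model, unedited
small_pre = log_softmax([1.0, 1.0, 1.0])  # small model before finetuning
small_ft = log_softmax([1.0, 3.0, 1.0])   # small model after finetuning favors token 1

edited = ensemble_log_probs(large, small_ft, small_pre)
unchanged = ensemble_log_probs(large, small_pre, small_pre)
```

Under this assumed rule, a token that the small model's finetuning promoted also gains probability in the large model, and when finetuning changes nothing, the large model's distribution is returned unchanged.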
