Can Models Learn Skill Composition from Examples?

Abstract

As large language models (LLMs) become increasingly advanced, their ability to exhibit compositional generalization -- the capacity to combine learned skills in novel ways not encountered during training -- has garnered significant attention. This type of generalization, particularly in scenarios beyond training data, is also of great interest in the study of AI safety and alignment. A recent study introduced the SKILL-MIX evaluation, where models are tasked with composing a short paragraph demonstrating the use of a specified k-tuple of language skills. While small models struggled with composing even with k=3, larger models like GPT-4 performed reasonably well with k=5 and 6. In this paper, we employ a setup akin to SKILL-MIX to evaluate the capacity of smaller models to learn compositional generalization from examples. Utilizing a diverse set of language skills -- including rhetorical, literary, reasoning, theory of mind, and common sense -- GPT-4 was used to generate text samples that exhibit random subsets of k skills. Subsequent fine-tuning of 7B and 13B parameter models on these combined skill texts, for increasing values of k, revealed the following findings: (1) Training on combinations of k=2 and 3 skills results in noticeable improvements in the ability to compose texts with k=4 and 5 skills, despite models never having seen such examples during training. (2) When skill categories are split into training and held-out groups, models significantly improve at composing texts with held-out skills during testing despite having only seen training skills during fine-tuning, illustrating the efficacy of the training approach even with previously unseen skills. This study also suggests that incorporating skill-rich (potentially synthetic) text into training can substantially enhance the compositional capabilities of models.
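
The data setup described above (random subsets of k skills, with skill categories split into training and held-out groups) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' pipeline: the five category names come from the abstract, but the individual skill names, the choice of which categories are held out, and the helper functions `split_skills` and `sample_skill_tuples` are all hypothetical placeholders.

```python
import random

# Category names are taken from the abstract; the skill names inside each
# category are hypothetical placeholders, not the paper's actual skill list.
SKILLS_BY_CATEGORY = {
    "rhetorical": ["rhetorical_skill_a", "rhetorical_skill_b"],
    "literary": ["literary_skill_a", "literary_skill_b"],
    "reasoning": ["reasoning_skill_a", "reasoning_skill_b"],
    "theory_of_mind": ["tom_skill_a", "tom_skill_b"],
    "common_sense": ["common_sense_skill_a", "common_sense_skill_b"],
}


def split_skills(skills_by_category, held_out_categories):
    """Split skills into training and held-out lists by whole category."""
    train, held_out = [], []
    for category, skills in skills_by_category.items():
        target = held_out if category in held_out_categories else train
        target.extend(skills)
    return train, held_out


def sample_skill_tuples(skills, k, n, seed=0):
    """Draw n random k-subsets of distinct skills (assumed uniform sampling)."""
    rng = random.Random(seed)
    return [tuple(sorted(rng.sample(skills, k))) for _ in range(n)]


# Assumption: which categories are held out is chosen arbitrarily here.
train_skills, held_out_skills = split_skills(
    SKILLS_BY_CATEGORY, {"theory_of_mind", "common_sense"}
)

# Fine-tuning prompts would use small k on training skills only ...
train_tuples = sample_skill_tuples(train_skills, k=3, n=5)
# ... while evaluation probes larger k and/or the held-out skills.
eval_tuples = sample_skill_tuples(held_out_skills, k=4, n=5, seed=1)
```

Splitting by whole category, rather than by individual skill, keeps the held-out skills unseen during fine-tuning, which is the condition the second finding depends on.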
