

Can Models Learn Skill Composition from Examples?

September 29, 2024
Authors: Haoyu Zhao, Simran Kaur, Dingli Yu, Anirudh Goyal, Sanjeev Arora
cs.AI

Abstract

As large language models (LLMs) become increasingly advanced, their ability to exhibit compositional generalization -- the capacity to combine learned skills in novel ways not encountered during training -- has garnered significant attention. This type of generalization, particularly in scenarios beyond training data, is also of great interest in the study of AI safety and alignment. A recent study introduced the SKILL-MIX evaluation, where models are tasked with composing a short paragraph demonstrating the use of a specified k-tuple of language skills. While small models struggled with composing even with k=3, larger models like GPT-4 performed reasonably well with k=5 and 6. In this paper, we employ a setup akin to SKILL-MIX to evaluate the capacity of smaller models to learn compositional generalization from examples. Utilizing a diverse set of language skills -- including rhetorical, literary, reasoning, theory of mind, and common sense -- GPT-4 was used to generate text samples that exhibit random subsets of k skills. Subsequent fine-tuning of 7B and 13B parameter models on these combined skill texts, for increasing values of k, revealed the following findings: (1) Training on combinations of k=2 and 3 skills results in noticeable improvements in the ability to compose texts with k=4 and 5 skills, despite models never having seen such examples during training. (2) When skill categories are split into training and held-out groups, models significantly improve at composing texts with held-out skills during testing despite having only seen training skills during fine-tuning, illustrating the efficacy of the training approach even with previously unseen skills. This study also suggests that incorporating skill-rich (potentially synthetic) text into training can substantially enhance the compositional capabilities of models.
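To make the data-generation recipe concrete, the sketch below illustrates the sampling step described in the abstract: splitting skill categories into training and held-out groups, drawing random k-subsets of training skills, and building a prompt for a teacher model (GPT-4 in the paper) to produce one skill-composed training example. The skill names, category split, and prompt wording are illustrative assumptions, not the authors' released SKILL-MIX pipeline.

```python
import random

# Hypothetical skill pool; the paper's actual SKILL-MIX skill list is not
# reproduced here -- these names are placeholders for illustration only.
SKILLS = {
    "rhetorical": ["metaphor", "hyperbole", "rhetorical question"],
    "literary": ["foreshadowing", "irony"],
    "reasoning": ["modus ponens", "proof by contradiction"],
    "theory_of_mind": ["false belief", "perspective taking"],
    "common_sense": ["object permanence", "cause and effect"],
}


def split_skills(held_out_categories):
    """Split skill categories into training and held-out groups (assumption: split is by category)."""
    train = [s for c, skills in SKILLS.items() if c not in held_out_categories for s in skills]
    held_out = [s for c, skills in SKILLS.items() if c in held_out_categories for s in skills]
    return train, held_out


def sample_k_tuple(skill_pool, k, rng=random):
    """Draw a random k-subset of skills, as in SKILL-MIX-style data generation."""
    return rng.sample(skill_pool, k)


def make_generation_prompt(skills, topic="a trip to the market"):
    """Hypothetical prompt template for the teacher model that generates one training paragraph."""
    skill_list = ", ".join(skills)
    return (
        f"Write a short paragraph about {topic} that illustrates all of the following "
        f"language skills: {skill_list}. Make the use of each skill identifiable."
    )


if __name__ == "__main__":
    train_skills, held_out_skills = split_skills(held_out_categories={"theory_of_mind"})
    # Fine-tuning data uses small k (2 or 3); evaluation then probes larger k (4 or 5)
    # and compositions built from the held-out skills.
    for k in (2, 3):
        combo = sample_k_tuple(train_skills, k)
        print(make_generation_prompt(combo))
```

The 7B/13B models are then fine-tuned on the teacher-generated paragraphs, and evaluated on larger k and on the held-out skill pool, which is where the compositional-generalization findings (1) and (2) above come from.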
