

Can Models Learn Skill Composition from Examples?

September 29, 2024
Authors: Haoyu Zhao, Simran Kaur, Dingli Yu, Anirudh Goyal, Sanjeev Arora
cs.AI

Abstract

As large language models (LLMs) become increasingly advanced, their ability to exhibit compositional generalization -- the capacity to combine learned skills in novel ways not encountered during training -- has garnered significant attention. This type of generalization, particularly in scenarios beyond training data, is also of great interest in the study of AI safety and alignment. A recent study introduced the SKILL-MIX evaluation, where models are tasked with composing a short paragraph demonstrating the use of a specified k-tuple of language skills. While small models struggled with composing even with k=3, larger models like GPT-4 performed reasonably well with k=5 and 6. In this paper, we employ a setup akin to SKILL-MIX to evaluate the capacity of smaller models to learn compositional generalization from examples. Utilizing a diverse set of language skills -- including rhetorical, literary, reasoning, theory of mind, and common sense -- GPT-4 was used to generate text samples that exhibit random subsets of k skills. Subsequent fine-tuning of 7B and 13B parameter models on these combined skill texts, for increasing values of k, revealed the following findings: (1) Training on combinations of k=2 and 3 skills results in noticeable improvements in the ability to compose texts with k=4 and 5 skills, despite models never having seen such examples during training. (2) When skill categories are split into training and held-out groups, models significantly improve at composing texts with held-out skills during testing despite having only seen training skills during fine-tuning, illustrating the efficacy of the training approach even with previously unseen skills. This study also suggests that incorporating skill-rich (potentially synthetic) text into training can substantially enhance the compositional capabilities of models.
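To make the data-generation recipe concrete, the sketch below illustrates the sampling step described in the abstract: splitting skill categories into training and held-out groups, drawing random k-subsets of training skills, and building a prompt for a teacher model (GPT-4 in the paper) to produce one skill-composed training example. The skill names, category split, and prompt wording are illustrative assumptions, not the authors' released SKILL-MIX pipeline.

```python
import random

# Hypothetical skill pool; the paper's actual SKILL-MIX skill list is not
# reproduced here -- these names are placeholders for illustration only.
SKILLS = {
    "rhetorical": ["metaphor", "hyperbole", "rhetorical question"],
    "literary": ["foreshadowing", "irony"],
    "reasoning": ["modus ponens", "proof by contradiction"],
    "theory_of_mind": ["false belief", "perspective taking"],
    "common_sense": ["object permanence", "cause and effect"],
}


def split_skills(held_out_categories):
    """Split skill categories into training and held-out groups (assumption: split is by category)."""
    train = [s for c, skills in SKILLS.items() if c not in held_out_categories for s in skills]
    held_out = [s for c, skills in SKILLS.items() if c in held_out_categories for s in skills]
    return train, held_out


def sample_k_tuple(skill_pool, k, rng=random):
    """Draw a random k-subset of skills, as in SKILL-MIX-style data generation."""
    return rng.sample(skill_pool, k)


def make_generation_prompt(skills, topic="a trip to the market"):
    """Hypothetical prompt template for the teacher model that generates one training paragraph."""
    skill_list = ", ".join(skills)
    return (
        f"Write a short paragraph about {topic} that illustrates all of the following "
        f"language skills: {skill_list}. Make the use of each skill identifiable."
    )


if __name__ == "__main__":
    train_skills, held_out_skills = split_skills(held_out_categories={"theory_of_mind"})
    # Fine-tuning data uses small k (2 or 3); evaluation then probes larger k (4 or 5)
    # and compositions built from the held-out skills.
    for k in (2, 3):
        combo = sample_k_tuple(train_skills, k)
        print(make_generation_prompt(combo))
```

The 7B/13B models are then fine-tuned on the teacher-generated paragraphs, and evaluated on larger k and on the held-out skill pool, which is where the compositional-generalization findings (1) and (2) above come from.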
