Meta-training with Demonstration Retrieval for Efficient Few-shot Learning

June 30, 2023
Authors: Aaron Mueller, Kanika Narang, Lambert Mathias, Qifan Wang, Hamed Firooz
cs.AI

Abstract

Large language models show impressive results on few-shot NLP tasks. However, these models are memory- and computation-intensive. Meta-training allows one to leverage smaller models for few-shot generalization in a domain-general and task-agnostic manner; however, these methods alone result in models that may not have sufficient parameterization or knowledge to adapt quickly to a large variety of tasks. To overcome this issue, we propose meta-training with demonstration retrieval, in which we use a dense passage retriever to retrieve labeled demonstrations semantically similar to each example, providing more varied supervision. By separating external knowledge from model parameters, we can use meta-training to train parameter-efficient models that generalize well across a larger variety of tasks. We construct a meta-training set from UnifiedQA and CrossFit, and propose a demonstration bank based on UnifiedQA tasks. To our knowledge, our work is the first to combine retrieval with meta-training, to use DPR models to retrieve demonstrations, and to leverage demonstrations from many tasks simultaneously, rather than randomly sampling demonstrations from the training set of the target task. Our approach outperforms a variety of targeted parameter-efficient and retrieval-augmented few-shot methods on QA, NLI, and text classification tasks (including SQuAD, QNLI, and TREC), and can be meta-trained and fine-tuned quickly on a single GPU.
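To make the retrieval step concrete, the following is a minimal sketch of DPR-style demonstration retrieval using the publicly available Hugging Face DPR checkpoints. The demonstration bank contents, the top-k value, and the helper function name are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of DPR-style demonstration retrieval (illustrative only;
# the paper's demonstration bank and retriever setup may differ).
import torch
from transformers import (
    DPRContextEncoder, DPRContextEncoderTokenizer,
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
)

ctx_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
ctx_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")

# Hypothetical demonstration bank: labeled examples flattened to text.
demo_bank = [
    "question: What is the capital of France? answer: Paris",
    "premise: A man is sleeping. hypothesis: A man is awake. label: contradiction",
    "text: What is the fastest bird? label: entity question",
]

# Pre-encode the demonstration bank once with the context encoder.
with torch.no_grad():
    ctx_inputs = ctx_tok(demo_bank, padding=True, truncation=True, return_tensors="pt")
    demo_embs = ctx_enc(**ctx_inputs).pooler_output  # (num_demos, hidden_dim)

def retrieve_demonstrations(example_text: str, k: int = 2) -> list[str]:
    """Return the k demonstrations most similar to the input example."""
    with torch.no_grad():
        q_inputs = q_tok(example_text, truncation=True, return_tensors="pt")
        q_emb = q_enc(**q_inputs).pooler_output  # (1, hidden_dim)
    scores = q_emb @ demo_embs.T                 # inner-product similarity
    top_idx = torch.topk(scores, k=k, dim=1).indices[0].tolist()
    return [demo_bank[i] for i in top_idx]

# The retrieved demonstrations would then be concatenated with the input
# before being fed to the meta-trained reader model.
print(retrieve_demonstrations("question: Which city is the capital of Germany?"))
```

Here similarity is the inner product of the query and demonstration embeddings, which is how DPR scores passages; how the retrieved demonstrations are fused with the input during meta-training and fine-tuning is specified in the paper itself.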