

Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models

August 28, 2024
Authors: Yuncheng Yang, Yulei Qin, Tong Wu, Zihan Xu, Gang Li, Pengcheng Guo, Hang Shao, Yucheng Shi, Ke Li, Xing Sun, Jie Yang, Yun Gu
cs.AI

Abstract

Cultivating expertise in large language models (LLMs) for tasks in specific areas often requires special-purpose tuning that calibrates behavior toward the expected, stable outputs. To avoid the huge cost of manually preparing instruction datasets and of training runs that can take hundreds of hours, exploiting open knowledge, including the wealth of publicly available low-rank adaptation (LoRA) models and instruction datasets, serves as a good starting point. However, existing methods for model and data selection focus on general-purpose capabilities while neglecting the knowledge gap exposed in domain-specific deployment. In the present study, we propose to bridge this gap by introducing a few human-annotated samples (i.e., K-shot) to advance the task expertise of LLMs with open knowledge. Specifically, we develop an efficient and scalable pipeline that produces task experts at low cost, in which the K-shot data intervene in selecting the most promising expert candidates and the task-relevant instructions. A mixture-of-experts (MoE) system is built to make the best use of the individual yet complementary knowledge of multiple experts. We unveil two keys to the success of an MoE system: 1) adherence to K-shot, and 2) insistence on diversity. For the former, we ensure that the selected models truly possess problem-solving ability on K-shot rather than being blind guessers; moreover, during data selection, instructions that share task-relevant contexts with K-shot are prioritized. For the latter, we highlight the diversity of the constituent experts and of the fine-tuning instructions throughout the model and data selection process. Extensive experimental results confirm the superiority of our approach over existing methods in utilizing open knowledge across various tasks. Code and models will be released later.
