Fine-tuning Large Language Models with Human-inspired Learning Strategies in Medical Question Answering
August 15, 2024
Authors: Yushi Yang, Andrew M. Bean, Robert McCraith, Adam Mahdi
cs.AI
Abstract
Training Large Language Models (LLMs) incurs substantial data-related costs,
motivating the development of data-efficient training methods through optimised
data ordering and selection. Human-inspired learning strategies, such as
curriculum learning, offer possibilities for efficient training by organising
data according to common human learning practices. Despite evidence that
fine-tuning with curriculum learning improves the performance of LLMs for
natural language understanding tasks, its effectiveness is typically assessed
using a single model. In this work, we extend previous research by evaluating
both curriculum-based and non-curriculum-based learning strategies across
multiple LLMs, using human-defined and automated data labels for medical
question answering. Our results indicate a moderate impact of using
human-inspired learning strategies for fine-tuning LLMs, with maximum accuracy
gains of 1.77% per model and 1.81% per dataset. Crucially, we demonstrate that
the effectiveness of these strategies varies significantly across different
model-dataset combinations, emphasising that the benefits of a specific
human-inspired strategy for fine-tuning LLMs do not generalise. Additionally,
we find evidence that curriculum learning using LLM-defined question difficulty
outperforms human-defined difficulty, highlighting the potential of using
model-generated measures for optimal curriculum design.
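
As a rough illustration of the curriculum-based ordering described in the abstract (not the authors' implementation), the sketch below sorts fine-tuning examples from easiest to hardest using a per-example difficulty score, which could be human-defined or LLM-defined. The field names (`question`, `answer`, `difficulty`) and the `curriculum_order` helper are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: curriculum-style data ordering for fine-tuning.
# Assumes each example already carries a difficulty score
# (e.g. assigned by human annotators or by an LLM).
from typing import TypedDict


class Example(TypedDict):
    question: str
    answer: str
    difficulty: float  # lower = easier; scoring method is an assumption


def curriculum_order(examples: list[Example]) -> list[Example]:
    """Return examples ordered from easiest to hardest (curriculum learning)."""
    return sorted(examples, key=lambda ex: ex["difficulty"])


# Usage: reorder the data, then pass it to a standard fine-tuning loop.
data: list[Example] = [
    {"question": "Q1", "answer": "A1", "difficulty": 0.8},
    {"question": "Q2", "answer": "A2", "difficulty": 0.2},
]
for ex in curriculum_order(data):
    print(ex["question"], ex["difficulty"])
```

Non-curriculum baselines in this setting typically correspond to leaving the data in its original or randomly shuffled order rather than sorting by difficulty.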