Fine-tuning Large Language Models with Human-inspired Learning Strategies in Medical Question Answering

Abstract

Training Large Language Models (LLMs) incurs substantial data-related costs, motivating the development of data-efficient training methods through optimised data ordering and selection. Human-inspired learning strategies, such as curriculum learning, offer possibilities for efficient training by organising data according to common human learning practices. Despite evidence that fine-tuning with curriculum learning improves the performance of LLMs for natural language understanding tasks, its effectiveness is typically assessed using a single model. In this work, we extend previous research by evaluating both curriculum-based and non-curriculum-based learning strategies across multiple LLMs, using human-defined and automated data labels for medical question answering. Our results indicate a moderate impact of using human-inspired learning strategies for fine-tuning LLMs, with maximum accuracy gains of 1.77% per model and 1.81% per dataset. Crucially, we demonstrate that the effectiveness of these strategies varies significantly across different model-dataset combinations, emphasising that the benefits of a specific human-inspired strategy for fine-tuning LLMs do not generalise. Additionally, we find evidence that curriculum learning using LLM-defined question difficulty outperforms human-defined difficulty, highlighting the potential of using model-generated measures for optimal curriculum design.

Fine-tuning Large Language Models with Human-inspired Learning Strategies in Medical Question Answering

摘要

Support