Fine-tuning Large Language Models with Human-inspired Learning Strategies in Medical Question Answering

August 15, 2024
Authors: Yushi Yang, Andrew M. Bean, Robert McCraith, Adam Mahdi
cs.AI

Abstract

Training Large Language Models (LLMs) incurs substantial data-related costs, motivating the development of data-efficient training methods through optimised data ordering and selection. Human-inspired learning strategies, such as curriculum learning, offer possibilities for efficient training by organising data according to common human learning practices. Despite evidence that fine-tuning with curriculum learning improves the performance of LLMs for natural language understanding tasks, its effectiveness is typically assessed using a single model. In this work, we extend previous research by evaluating both curriculum-based and non-curriculum-based learning strategies across multiple LLMs, using human-defined and automated data labels for medical question answering. Our results indicate a moderate impact of using human-inspired learning strategies for fine-tuning LLMs, with maximum accuracy gains of 1.77% per model and 1.81% per dataset. Crucially, we demonstrate that the effectiveness of these strategies varies significantly across different model-dataset combinations, emphasising that the benefits of a specific human-inspired strategy for fine-tuning LLMs do not generalise. Additionally, we find evidence that curriculum learning using LLM-defined question difficulty outperforms human-defined difficulty, highlighting the potential of using model-generated measures for optimal curriculum design.
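
As a minimal sketch of the data-ordering idea behind curriculum learning, the snippet below sorts fine-tuning examples from easiest to hardest. It assumes each question already carries a numeric difficulty score, which could come from human labels or from an LLM. The names `Question`, `difficulty`, and `curriculum_order` are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
from typing import List, NamedTuple


class Question(NamedTuple):
    """A fine-tuning example with a hypothetical difficulty score."""
    text: str
    difficulty: float  # assumption: higher means harder


def curriculum_order(questions: List[Question]) -> List[Question]:
    """Return the questions sorted easiest-first.

    Python's sorted() is stable, so questions with equal difficulty
    keep their original relative order.
    """
    return sorted(questions, key=lambda q: q.difficulty)


# Toy data for illustration only; the scores are made up.
questions = [
    Question("Q-hard", 0.9),
    Question("Q-easy", 0.1),
    Question("Q-medium", 0.5),
]

ordered = curriculum_order(questions)
for q in ordered:
    print(f"{q.difficulty:.1f}  {q.text}")
```

A non-curriculum baseline under the same assumptions would shuffle `questions` randomly instead of sorting them, which makes the two orderings straightforward to compare on the same data.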
