Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs
September 23, 2024
Authors: Clément Christophe, Tathagata Raha, Svetlana Maslenkova, Muhammad Umar Salman, Praveen K Kanithi, Marco AF Pimentel, Shadab Khan
cs.AI
Abstract
Large Language Models (LLMs) have demonstrated significant potential in
transforming clinical applications. In this study, we investigate the efficacy
of four techniques in adapting LLMs for clinical use-cases: continuous
pretraining, instruct fine-tuning, NEFTune, and prompt engineering. We employ
these methods on Mistral 7B and Mixtral 8x7B models, leveraging a large-scale
clinical pretraining dataset of 50 billion tokens and an instruct fine-tuning
dataset of 500 million tokens. Our evaluation across various clinical tasks
reveals the impact of each technique. While continuous pretraining beyond 250
billion tokens yields marginal improvements on its own, it establishes a strong
foundation for instruct fine-tuning. Notably, NEFTune, designed primarily to
enhance generation quality, surprisingly demonstrates additional gains on our
benchmark. Complex prompt engineering methods further enhance performance.
These findings show the importance of tailoring fine-tuning strategies and
exploring innovative techniques to optimize LLM performance in the clinical
domain.
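Of the four techniques the abstract names, NEFTune is the most mechanical: during fine-tuning, uniform noise is added to the token embeddings, with magnitude scaled by α/√(L·d) for sequence length L and embedding dimension d. As a rough illustration of that scaling rule (not the authors' implementation — real use would perturb a framework's embedding tensors in the training loop), a minimal pure-Python sketch might look like:

```python
import math
import random

def neftune_noise(embeddings, alpha=5.0):
    """Apply NEFTune-style uniform noise to token embeddings.

    embeddings: a seq_len x dim list of lists of floats, standing in
        for one sequence's embedding matrix.
    alpha: the NEFTune noise hyperparameter (the original paper sweeps
        values such as 5, 10, and 45).

    Each component is perturbed by Uniform(-s, s) with
    s = alpha / sqrt(seq_len * dim), so longer sequences and wider
    embeddings receive proportionally smaller per-component noise.
    """
    seq_len = len(embeddings)
    dim = len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    return [
        [x + random.uniform(-scale, scale) for x in row]
        for row in embeddings
    ]
```

Because the noise is only injected at training time, inference is unchanged; the technique's appeal is that a one-line perturbation can improve instruction-tuned generation quality, which is consistent with the additional benchmark gains the abstract reports.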