Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs
September 23, 2024
Authors: Clément Christophe, Tathagata Raha, Svetlana Maslenkova, Muhammad Umar Salman, Praveen K Kanithi, Marco AF Pimentel, Shadab Khan
cs.AI
Abstract
Large Language Models (LLMs) have demonstrated significant potential in
transforming clinical applications. In this study, we investigate the efficacy
of four techniques in adapting LLMs for clinical use-cases: continuous
pretraining, instruct fine-tuning, NEFTune, and prompt engineering. We employ
these methods on Mistral 7B and Mixtral 8x7B models, leveraging a large-scale
clinical pretraining dataset of 50 billion tokens and an instruct fine-tuning
dataset of 500 million tokens. Our evaluation across various clinical tasks
reveals the impact of each technique. While continuous pretraining beyond 250
billion tokens yields marginal improvements on its own, it establishes a strong
foundation for instruct fine-tuning. Notably, NEFTune, designed primarily to
enhance generation quality, surprisingly demonstrates additional gains on our
benchmark. Complex prompt engineering methods further enhance performance.
These findings show the importance of tailoring fine-tuning strategies and
exploring innovative techniques to optimize LLM performance in the clinical
domain.
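Of the four techniques the abstract names, NEFTune is the most mechanical: during fine-tuning, uniform noise is added to the token embeddings, with magnitude scaled by α/√(L·d) for sequence length L and embedding dimension d. As a rough illustration of that scaling rule (not the authors' implementation — real use would perturb a framework's embedding tensors in the training loop), a minimal pure-Python sketch might look like:

```python
import math
import random

def neftune_noise(embeddings, alpha=5.0):
    """Apply NEFTune-style uniform noise to token embeddings.

    embeddings: a seq_len x dim list of lists of floats, standing in
        for one sequence's embedding matrix.
    alpha: the NEFTune noise hyperparameter (the original paper sweeps
        values such as 5, 10, and 45).

    Each component is perturbed by Uniform(-s, s) with
    s = alpha / sqrt(seq_len * dim), so longer sequences and wider
    embeddings receive proportionally smaller per-component noise.
    """
    seq_len = len(embeddings)
    dim = len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    return [
        [x + random.uniform(-scale, scale) for x in row]
        for row in embeddings
    ]
```

Because the noise is only injected at training time, inference is unchanged; the technique's appeal is that a one-line perturbation can improve instruction-tuned generation quality, which is consistent with the additional benchmark gains the abstract reports.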