Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs

Abstract

Large Language Models (LLMs) have demonstrated significant potential to transform clinical applications. In this study, we investigate the efficacy of four techniques for adapting LLMs to clinical use cases: continuous pretraining, instruct fine-tuning, NEFTune, and prompt engineering. We apply these methods to the Mistral 7B and Mixtral 8x7B models, leveraging a large-scale clinical pretraining dataset of 50 billion tokens and an instruct fine-tuning dataset of 500 million tokens. Our evaluation across various clinical tasks reveals the impact of each technique. While continuous pretraining beyond 250 billion tokens yields only marginal improvements on its own, it establishes a strong foundation for instruct fine-tuning. Notably, NEFTune, designed primarily to enhance generation quality, surprisingly yields additional gains on our benchmark. Complex prompt engineering methods further enhance performance. These findings show the importance of tailoring fine-tuning strategies and exploring innovative techniques to optimize LLM performance in the clinical domain.
