Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events

Abstract


Large language models (LLMs), such as GPT-4, have demonstrated remarkable capabilities across a wide range of tasks, including health applications. In this paper, we study how LLMs can be used to scale biomedical knowledge curation. We find that while LLMs already possess decent competence in structuring biomedical text, distilling them into a task-specific student model through self-supervised learning yields substantial gains over out-of-the-box LLMs, with additional advantages in cost, efficiency, and white-box model access. We conduct a case study on adverse drug event (ADE) extraction, an important area for improving care. On standard ADE extraction evaluation, a PubMedBERT model distilled from GPT-3.5 attained accuracy comparable to supervised state-of-the-art models without using any labeled data. Despite being over 1,000 times smaller, the distilled model outperformed its teacher GPT-3.5 by over 6 absolute points in F1 and GPT-4 by over 5 absolute points. Ablation studies on the choice of distillation model (e.g., PubMedBERT vs. BioGPT) and on ADE extraction architecture shed light on best practices for biomedical knowledge extraction. Distillation attained similar gains on other standard biomedical knowledge extraction tasks, such as gene-disease associations and protected health information, further illustrating the promise of this approach.
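The abstract describes distillation only at a high level. As a hedged illustration, the Python sketch below shows two pieces under stated assumptions: the self-supervised step, in which a teacher labels unlabeled sentences with (drug, adverse event) pairs that then serve as training targets for a smaller student, and a relation-level F1 score of the kind used to compare models. `ADERelation`, `teacher_extract`, `build_pseudo_labeled_corpus`, `micro_f1`, and the toy lexicon are all hypothetical; a real teacher would be a prompted LLM such as GPT-3.5, and the actual student fine-tuning (e.g., of PubMedBERT) is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ADERelation:
    """Hypothetical relation format: one (drug, adverse event) pair."""
    drug: str
    event: str


# Hypothetical toy lexicon standing in for an LLM's knowledge, so the sketch runs offline.
DRUGS = {"methotrexate", "ibuprofen"}
EVENTS = {"hepatotoxicity", "gastric bleeding"}


def teacher_extract(sentence: str) -> set[ADERelation]:
    """Hypothetical stand-in for prompting a teacher LLM and parsing its answer.

    Pairs every lexicon drug with every lexicon event found in the sentence.
    """
    text = sentence.lower()
    found_drugs = [d for d in DRUGS if d in text]
    found_events = [e for e in EVENTS if e in text]
    return {ADERelation(d, e) for d in found_drugs for e in found_events}


def build_pseudo_labeled_corpus(unlabeled: list[str]) -> list[tuple[str, set[ADERelation]]]:
    """Self-supervision: the teacher's outputs become the student's training labels."""
    return [(sentence, teacher_extract(sentence)) for sentence in unlabeled]


def micro_f1(predictions: list[set[ADERelation]], golds: list[set[ADERelation]]) -> float:
    """Micro-averaged F1 over exact-match relation pairs, pooled across all sentences."""
    tp = sum(len(p & g) for p, g in zip(predictions, golds))
    n_pred = sum(len(p) for p in predictions)
    n_gold = sum(len(g) for g in golds)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Micro-averaging here is itself an assumption about the evaluation protocol. On a 0 to 1 scale, one "absolute point" of F1 corresponds to a difference of 0.01.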
