Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology

Abstract

Clinical trial matching is a key process in health delivery and discovery. In practice, it is plagued by overwhelming unstructured data and unscalable manual processing. In this paper, we conduct a systematic study on scaling clinical trial matching using large language models (LLMs), with oncology as the focus area. Our study is grounded in a clinical trial matching system currently in test deployment at a large U.S. health network. Initial findings are promising: out of the box, cutting-edge LLMs such as GPT-4 can already structure elaborate eligibility criteria of clinical trials and extract complex matching logic (e.g., nested AND/OR/NOT). While still far from perfect, LLMs substantially outperform prior strong baselines and may serve as a preliminary solution to help triage patient-trial candidates with humans in the loop. Our study also reveals a few significant growth areas for applying LLMs to end-to-end clinical trial matching, such as context limitation and accuracy, especially in structuring patient information from longitudinal medical records.
