Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology

Abstract

Clinical trial matching is a key process in health delivery and discovery. In practice, it is plagued by overwhelming volumes of unstructured data and by manual processing that does not scale. In this paper, we conduct a systematic study of scaling clinical trial matching using large language models (LLMs), with oncology as the focus area. Our study is grounded in a clinical trial matching system currently in test deployment at a large U.S. health network. Initial findings are promising: out of the box, cutting-edge LLMs such as GPT-4 can already structure elaborate eligibility criteria of clinical trials and extract complex matching logic (e.g., nested AND/OR/NOT). While still far from perfect, LLMs substantially outperform prior strong baselines and may serve as a preliminary solution to help triage patient-trial candidates with humans in the loop. Our study also reveals a few significant growth areas for applying LLMs to end-to-end clinical trial matching, such as context limitation and accuracy, especially in structuring patient information from longitudinal medical records.
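To make "nested AND/OR/NOT" matching logic concrete, the following is a minimal sketch of how structured eligibility criteria might be represented and checked against structured patient attributes. It is an illustration only, not the system described in the paper: the class names (`Attr`, `And`, `Or`, `Not`), the `matches` function, the attribute keys, and the example trial and patients are all hypothetical. The sketch also assumes that any attribute missing from the patient record counts as absent, a simplification that a real system would likely need to revisit.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Attr:
    """Leaf criterion: the patient record must contain this value (hypothetical schema)."""
    name: str
    value: str


@dataclass(frozen=True)
class And:
    clauses: tuple


@dataclass(frozen=True)
class Or:
    clauses: tuple


@dataclass(frozen=True)
class Not:
    clause: "Criterion"


Criterion = Union[Attr, And, Or, Not]


def matches(criterion: Criterion, patient: dict) -> bool:
    """Recursively evaluate a nested criterion tree against a patient record.

    `patient` maps an attribute name to a set of values. Missing attributes
    are treated as empty sets (closed-world assumption, made for this sketch).
    """
    if isinstance(criterion, Attr):
        return criterion.value in patient.get(criterion.name, set())
    if isinstance(criterion, And):
        return all(matches(c, patient) for c in criterion.clauses)
    if isinstance(criterion, Or):
        return any(matches(c, patient) for c in criterion.clauses)
    if isinstance(criterion, Not):
        return not matches(criterion.clause, patient)
    raise TypeError(f"Unsupported criterion type: {type(criterion).__name__}")


# Hypothetical trial: NSCLC AND (EGFR mutation OR ALK fusion) AND NOT prior osimertinib.
trial = And((
    Attr("disease", "NSCLC"),
    Or((
        Attr("biomarker", "EGFR mutation"),
        Attr("biomarker", "ALK fusion"),
    )),
    Not(Attr("prior_therapy", "osimertinib")),
))

# Hypothetical structured patient records.
patient_a = {
    "disease": {"NSCLC"},
    "biomarker": {"EGFR mutation"},
    "prior_therapy": {"carboplatin"},
}
patient_b = {
    "disease": {"NSCLC"},
    "biomarker": {"EGFR mutation"},
    "prior_therapy": {"osimertinib"},
}

print(matches(trial, patient_a))
print(matches(trial, patient_b))
```

In this sketch, `patient_a` satisfies every clause, while `patient_b` is excluded by the NOT clause. Separating the criterion tree from the evaluator means the tree could, in principle, be produced by an LLM and then reviewed by a human before any matching is run.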

