Training Large Language Models to Predict Clinical Events

Abstract

Longitudinal clinical notes contain rich evidence of how patients evolve over time, but converting this signal into training supervision for clinical prediction remains challenging. We extend Foresight Learning to clinical prediction by converting time-ordered MIMIC-III notes into examples consisting of past patient context, a natural-language question about a possible future event, and a label resolved from later documentation. This process yields 6,900 prediction examples from 702 admissions across medications, procedures, organ support, microbiology, and mortality. A small LoRA adapter trained on these examples improves over the prompted base model, reducing expected calibration error from 0.1269 to 0.0398 and Brier score from 0.199 to 0.145, while slightly outperforming GPT-5 point estimates on held-out questions. The approach enables reusable clinical prediction supervision from longitudinal notes without hand-engineered structured features or endpoint-specific classifiers.
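The two metrics reported above can be made concrete with a short sketch. The Python below is an illustration, not the authors' code. `ForecastExample` is a hypothetical container that mirrors the three parts of each example described in the abstract: past patient context, a natural-language question, and a label resolved from later notes. `expected_calibration_error` assumes the common equal-width, 10-bin variant, which compares the mean predicted probability with the observed event frequency in each bin. The abstract does not state which binning scheme the paper uses.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ForecastExample:
    """Hypothetical container for one prediction example (field names are illustrative)."""
    context: str   # patient notes written before the prediction time
    question: str  # natural-language question about a possible future event
    label: int     # 1 if later documentation confirms the event, else 0


def _check(probs: Sequence[float], labels: Sequence[int]) -> None:
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    _check(probs, labels)
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Equal-width binned ECE (assumed variant).

    For each bin, take |observed event frequency - mean predicted probability|,
    then average these gaps weighted by the fraction of examples in the bin.
    """
    _check(probs, labels)
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_p = sum(p for p, _ in b) / len(b)
        freq = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(freq - mean_p)
    return ece


if __name__ == "__main__":
    # Placeholder examples for illustration only; these are not MIMIC-III data.
    examples = [
        ForecastExample("<earlier notes>", "Will vasopressors be started later in this admission?", 1),
        ForecastExample("<earlier notes>", "Will a blood culture later grow an organism?", 1),
        ForecastExample("<earlier notes>", "Will the patient be intubated later in this admission?", 0),
        ForecastExample("<earlier notes>", "Will the patient die during this admission?", 0),
    ]
    probs = [0.9, 0.8, 0.3, 0.1]  # hypothetical model forecasts
    labels = [ex.label for ex in examples]
    print(round(brier_score(probs, labels), 4))                 # 0.0375
    print(round(expected_calibration_error(probs, labels), 4))  # 0.175
```

Lower is better for both metrics. The Brier score combines calibration with how sharply the forecasts separate events from non-events. ECE isolates calibration alone, which is why the two can move by different amounts.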