Training Large Language Models to Predict Clinical Events
May 12, 2026
Authors: Benjamin Turtel, Paul Wilczewski, Kris Skotheim
cs.AI
Abstract
Longitudinal clinical notes contain rich evidence of how patients evolve over time, but converting this signal into training supervision for clinical prediction remains challenging. We extend Foresight Learning to clinical prediction by converting time-ordered MIMIC-III notes into examples consisting of past patient context, a natural-language question about a possible future event, and a label resolved from later documentation. This process yields 6,900 prediction examples from 702 admissions across medications, procedures, organ support, microbiology, and mortality. A small LoRA adapter trained on these examples improves over the prompted base model, reducing expected calibration error from 0.1269 to 0.0398 and Brier score from 0.199 to 0.145, while slightly outperforming GPT-5 point estimates on held-out questions. The approach enables reusable clinical prediction supervision from longitudinal notes without hand-engineered structured features or endpoint-specific classifiers.
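For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how the Brier score and expected calibration error (ECE) are commonly computed for binary forecasts. The abstract does not state the binning scheme used for ECE, so the 10 equal-width bins here are an assumption, and the function names and toy inputs are illustrative rather than taken from the paper.

```python
from typing import List, Sequence, Tuple


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """ECE with equal-width probability bins (assumed scheme, not from the paper).

    Each bin contributes |mean predicted probability - observed event rate|,
    weighted by the fraction of examples that fall in the bin.
    """
    n = len(probs)
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Clamp so that p == 1.0 lands in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        event_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(mean_prob - event_rate)
    return ece


if __name__ == "__main__":
    # Toy forecasts for four hypothetical resolved questions.
    toy_probs = [0.9, 0.8, 0.2, 0.1]
    toy_labels = [1, 1, 0, 0]
    print("Brier:", brier_score(toy_probs, toy_labels))
    print("ECE:", expected_calibration_error(toy_probs, toy_labels))
```

Lower is better for both metrics: the Brier score rewards probabilities close to the realized outcome, while ECE measures how closely stated probabilities match observed event frequencies.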