TRIAGE: Dialectical Reasoning for Interpretable Risk Prediction on Irregularly Sampled Medical Time Series with Large Language Models

Abstract

Clinical early warning systems built on electronic health records, in which clinical observations are recorded as irregularly sampled medical time series (ISMTS), must deliver both calibrated risk scores for patient triage and interpretable rationales that clinicians can verify. Large language models (LLMs) have been explored for this task, yet they collapse graded clinical risk into overconfident binary predictions. This risk polarization undermines both calibration and cross-patient comparability. To address this, we propose TRIAGE, a framework that trains an LLM to reason dialectically over competing clinical outcomes by eliciting outcome-specific rationales. This dialectical formulation mitigates risk polarization, enabling a single LLM to yield continuous risk scores grounded in explicit clinical reasoning. Evaluated on three ISMTS benchmarks, TRIAGE improves AUPRC by 3.3% on average and reduces calibration error by 81% compared with competitive baselines. An LLM-as-a-judge assessment further shows that our rationales surpass the baseline's post-hoc explanations by 20% in clinical reasoning quality. The source code is available at https://github.com/HyeongWon-Jang/TRIAGE.
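
To make the link between risk polarization and calibration concrete, the following is a minimal sketch on synthetic data, not the TRIAGE method or its evaluation code. It assumes an equal-width-bin expected calibration error (ECE) as the calibration metric, which the abstract does not specify, and all names (`expected_calibration_error`, `true_risk`, `graded`, `polarized`) are hypothetical. It compares scores that track the underlying risk with scores pushed to near 0 or near 1.

```python
import random


def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE: size-weighted mean of |mean score - event rate| per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # a score of exactly 1.0 goes in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        mean_score = sum(p for p, _ in b) / len(b)
        event_rate = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(mean_score - event_rate)
    return ece


# Synthetic cohort (assumption): each patient has a latent risk in [0, 1],
# and the outcome is drawn as a Bernoulli sample with that risk.
rng = random.Random(0)
true_risk = [rng.random() for _ in range(5000)]
labels = [1 if rng.random() < r else 0 for r in true_risk]

# Graded scores follow the latent risk; polarized scores collapse it
# into overconfident near-binary predictions.
graded = true_risk
polarized = [0.99 if r >= 0.5 else 0.01 for r in true_risk]

ece_graded = expected_calibration_error(graded, labels)
ece_polarized = expected_calibration_error(polarized, labels)
print(f"ECE graded:    {ece_graded:.3f}")
print(f"ECE polarized: {ece_polarized:.3f}")
```

In this toy setting both score sets rank patients on the same side of 0.5, yet the polarized scores carry a much larger calibration gap and assign identical scores to patients whose risks differ, which illustrates the loss of cross-patient comparability described above.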