When Documents Disagree: Measuring Institutional Variation in Transplant Guidance with Retrieval-Augmented Language Models
March 23, 2026
Authors: Yubo Li, Ramayya Krishnan, Rema Padman
cs.AI
Abstract
Patient education materials for solid-organ transplantation vary substantially across U.S. centers, yet no systematic method exists to quantify this heterogeneity at scale. We introduce a framework that grounds the same patient questions in different centers' handbooks using retrieval-augmented language models and compares the resulting answers using a five-label consistency taxonomy. Applied to 102 handbooks from 23 centers and 1,115 benchmark questions, the framework quantifies heterogeneity across four dimensions: question, topic, organ, and center. We find that 20.8% of non-absent pairwise comparisons exhibit clinically meaningful divergence, concentrated in condition monitoring and lifestyle topics. Coverage gaps are even more prominent: 96.2% of question-handbook pairs lack relevant content, with reproductive health at 95.1% absence. Center-level divergence profiles are stable and interpretable, suggesting that heterogeneity reflects systematic institutional differences, likely due to patient diversity. These findings expose an information gap in transplant patient education materials and show how document-grounded medical question answering can highlight opportunities for content improvement.
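The two headline quantities in the abstract, the absence rate over question-handbook pairs and the divergence rate over non-absent pairwise comparisons, can be illustrated with a minimal sketch. This is not the authors' implementation. The label names ("consistent", "divergent", "absent"), the function names, and the toy `judge` are assumptions made for illustration; the abstract states only that the taxonomy has five labels and that absent comparisons are excluded from the divergence rate. In the described framework the judge would be a language-model comparison of two handbook-grounded answers.

```python
from itertools import combinations

# Hypothetical label names: the source says only that there are five labels.
ABSENT = "absent"
DIVERGENT_LABELS = {"divergent"}  # assumed set of "clinically meaningful" labels


def compare_question(answers_by_handbook, judge):
    """Label every pair of handbooks for one question.

    answers_by_handbook maps handbook id -> grounded answer, or None when the
    handbook has no relevant content. judge(a, b) returns a label string.
    """
    labels = {}
    for h1, h2 in combinations(sorted(answers_by_handbook), 2):
        a1, a2 = answers_by_handbook[h1], answers_by_handbook[h2]
        if a1 is None or a2 is None:
            labels[(h1, h2)] = ABSENT  # nothing to compare
        else:
            labels[(h1, h2)] = judge(a1, a2)
    return labels


def absence_rate(answers_by_handbook):
    """Fraction of question-handbook pairs with no relevant content."""
    if not answers_by_handbook:
        return 0.0
    missing = sum(1 for a in answers_by_handbook.values() if a is None)
    return missing / len(answers_by_handbook)


def divergence_rate(pair_labels):
    """Fraction of non-absent pairwise comparisons labeled as divergent."""
    non_absent = [lab for lab in pair_labels if lab != ABSENT]
    if not non_absent:
        return 0.0
    divergent = sum(1 for lab in non_absent if lab in DIVERGENT_LABELS)
    return divergent / len(non_absent)


def toy_judge(a, b):
    """Stand-in for the language-model comparison: exact string match only."""
    return "consistent" if a == b else "divergent"
```

Aggregating `divergence_rate` by question, topic, organ, or center would then give the four views of heterogeneity that the abstract names; how the authors actually aggregate is not specified here.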