When Documents Disagree: Measuring Institutional Variation in Transplant Guidance with Retrieval-Augmented Language Models

Abstract

Patient education materials for solid-organ transplantation vary substantially across U.S. centers, yet no systematic method exists to quantify this heterogeneity at scale. We introduce a framework that uses retrieval-augmented language models to answer the same patient questions from each center's handbooks and compares the resulting answers using a five-label consistency taxonomy. Applied to 102 handbooks from 23 centers and 1,115 benchmark questions, the framework quantifies heterogeneity across four dimensions: question, topic, organ, and center. We find that 20.8% of non-absent pairwise comparisons exhibit clinically meaningful divergence, concentrated in condition monitoring and lifestyle topics. Coverage gaps are even more prominent: 96.2% of question-handbook pairs lack relevant content, with reproductive health at 95.1% absence. Center-level divergence profiles are stable and interpretable, and the heterogeneity reflects systematic institutional differences, likely due to patient diversity. These findings expose an information gap in transplant patient education materials and suggest that document-grounded medical question answering can highlight opportunities for content improvement.
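The two headline quantities in the abstract (the share of question-handbook pairs with no relevant content, and the share of non-absent pairwise comparisons that diverge) can be illustrated with a minimal sketch. This is not the authors' implementation: the label names `absent` and `divergent`, the function names, and the input layouts are assumptions made for illustration, since the abstract states only that the taxonomy has five labels.

```python
from itertools import combinations

# Hypothetical label names: the abstract does not name the five labels.
ABSENT = "absent"
DIVERGENT = "divergent"


def absence_rate(answers):
    """Share of (question, handbook) pairs with no relevant content.

    `answers` maps (question_id, handbook_id) to the grounded answer text,
    or to None when the handbook does not cover the question (assumed layout).
    """
    return sum(a is None for a in answers.values()) / len(answers)


def divergence_rate(pair_labels):
    """Share of non-absent pairwise comparisons labelled as divergent."""
    non_absent = [label for label in pair_labels if label != ABSENT]
    if not non_absent:
        return 0.0  # no comparable pairs, so nothing can diverge
    return sum(label == DIVERGENT for label in non_absent) / len(non_absent)


def handbook_pairs(handbook_ids):
    """All unordered handbook pairs to compare for a single question."""
    return list(combinations(sorted(handbook_ids), 2))
```

Note that the two rates use different denominators: the absence rate is taken over question-handbook pairs, whereas the divergence rate is taken only over pairwise comparisons in which both answers exist.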
