Lunguage: A Benchmark for Structured and Sequential Chest X-ray Interpretation
May 27, 2025
Authors: Jong Hak Moon, Geon Choi, Paloma Rabaey, Min Gwan Kim, Hyuk Gi Hong, Jung-Oh Lee, Hangyul Yoon, Eun Woo Doe, Jiyoun Kim, Harshita Sharma, Daniel C. Castro, Javier Alvarez-Valle, Edward Choi
cs.AI
Abstract
Radiology reports convey detailed clinical observations and capture
diagnostic reasoning that evolves over time. However, existing evaluation
methods are limited to single-report settings and rely on coarse metrics that
fail to capture fine-grained clinical semantics and temporal dependencies. We
introduce LUNGUAGE, a benchmark dataset for structured radiology report
generation that supports both single-report evaluation and longitudinal
patient-level assessment across multiple studies. It contains 1,473 annotated
chest X-ray reports, each reviewed by experts; 80 of them also carry
expert-reviewed longitudinal annotations that capture disease progression and
inter-study intervals. Using this benchmark, we develop a
two-stage framework that transforms generated reports into fine-grained,
schema-aligned structured representations, enabling longitudinal
interpretation. We also propose LUNGUAGESCORE, an interpretable metric that
compares structured outputs at the entity, relation, and attribute level while
modeling temporal consistency across patient timelines. These contributions
establish the first benchmark dataset, structuring framework, and evaluation
metric for sequential radiology reporting, with empirical results demonstrating
that LUNGUAGESCORE effectively supports structured report evaluation. The code
is available at: https://github.com/SuperSupermoon/Lunguage