

CheXGenBench: A Unified Benchmark For Fidelity, Privacy and Utility of Synthetic Chest Radiographs

May 15, 2025
Authors: Raman Dutt, Pedro Sanchez, Yongchen Yao, Steven McDonagh, Sotirios A. Tsaftaris, Timothy Hospedales
cs.AI

Abstract

We introduce CheXGenBench, a rigorous and multifaceted evaluation framework for synthetic chest radiograph generation that simultaneously assesses fidelity, privacy risks, and clinical utility across state-of-the-art text-to-image generative models. Despite rapid advancements in generative AI for real-world imagery, medical domain evaluations have been hindered by methodological inconsistencies, outdated architectural comparisons, and disconnected assessment criteria that rarely address the practical clinical value of synthetic samples. CheXGenBench overcomes these limitations through standardised data partitioning and a unified evaluation protocol comprising over 20 quantitative metrics that systematically analyse generation quality, potential privacy vulnerabilities, and downstream clinical applicability across 11 leading text-to-image architectures. Our results reveal critical inefficiencies in the existing evaluation protocols, particularly in assessing generative fidelity, leading to inconsistent and uninformative comparisons. Our framework establishes a standardised benchmark for the medical AI community, enabling objective and reproducible comparisons while facilitating seamless integration of both existing and future generative models. Additionally, we release a high-quality synthetic dataset, SynthCheX-75K, comprising 75K radiographs generated by Sana 0.6B, the top-performing model in our benchmark, to support further research in this critical domain. Through CheXGenBench, we establish a new state of the art and release our framework, models, and SynthCheX-75K dataset at https://raman1121.github.io/CheXGenBench/
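The abstract describes a single protocol applied uniformly to every model on a fixed data split, so that new generators can be plugged in and compared on equal footing. Below is a minimal sketch of that idea, assuming a generator is any callable from a text prompt to an image (here just a flat list of pixel values) and that each metric compares generated samples with real ones. `BenchmarkHarness`, `mean_abs_diff`, and `min_nn_distance` are hypothetical names; the two metrics are toy stand-ins for one fidelity and one privacy measure, not the benchmark's 20+ metrics, and none of this is CheXGenBench's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

# Hypothetical types: a generator maps a text prompt to an image-like vector,
# and a metric scores generated samples against the held-out real samples.
Image = Sequence[float]
Generator = Callable[[str], Image]
Metric = Callable[[List[Image], List[Image]], float]


@dataclass
class BenchmarkHarness:
    """Toy harness: every model is scored on the same fixed split with the same metrics."""
    prompts: List[str]
    real_images: List[Image]
    metrics: Dict[str, Dict[str, Metric]] = field(default_factory=dict)

    def register(self, axis: str, name: str, fn: Metric) -> None:
        # Group metrics by evaluation axis, e.g. "fidelity", "privacy", "utility".
        self.metrics.setdefault(axis, {})[name] = fn

    def evaluate(self, generator: Generator) -> Dict[str, Dict[str, float]]:
        # Generate one sample per prompt from the shared test split, then apply every metric.
        fakes = [generator(p) for p in self.prompts]
        return {
            axis: {name: fn(fakes, self.real_images) for name, fn in group.items()}
            for axis, group in self.metrics.items()
        }


def mean_abs_diff(fakes: List[Image], reals: List[Image]) -> float:
    # Toy fidelity proxy: mean per-pixel absolute difference to the paired real image.
    total = sum(abs(f - r) for fi, ri in zip(fakes, reals) for f, r in zip(fi, ri))
    count = sum(len(fi) for fi in fakes)
    return total / count


def min_nn_distance(fakes: List[Image], reals: List[Image]) -> float:
    # Toy privacy proxy: smallest L1 distance from any synthetic sample to any real one.
    # A value near zero suggests a real image has been reproduced.
    return min(sum(abs(f - r) for f, r in zip(fi, ri)) for fi in fakes for ri in reals)


if __name__ == "__main__":
    harness = BenchmarkHarness(
        prompts=["no finding", "cardiomegaly"],
        real_images=[[0.0, 0.0], [1.0, 1.0]],
    )
    harness.register("fidelity", "mean_abs_diff", mean_abs_diff)
    harness.register("privacy", "min_nn_distance", min_nn_distance)

    def copycat(prompt: str) -> Image:
        # A "generator" that simply returns the real image for each prompt.
        return [0.0, 0.0] if prompt == "no finding" else [1.0, 1.0]

    print(harness.evaluate(copycat))
    # {'fidelity': {'mean_abs_diff': 0.0}, 'privacy': {'min_nn_distance': 0.0}}
```

Scoring fidelity and privacy together is what exposes the copycat generator: it matches the real images perfectly, but its nearest-neighbour distance of zero shows it has merely reproduced them. Any new model only needs to satisfy the `Generator` signature to be evaluated with the identical metric suite.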
May 19, 2025