CheXGenBench: A Unified Benchmark For Fidelity, Privacy and Utility of Synthetic Chest Radiographs

May 15, 2025
Authors: Raman Dutt, Pedro Sanchez, Yongchen Yao, Steven McDonagh, Sotirios A. Tsaftaris, Timothy Hospedales
cs.AI

Abstract

We introduce CheXGenBench, a rigorous and multifaceted evaluation framework for synthetic chest radiograph generation that simultaneously assesses fidelity, privacy risks, and clinical utility across state-of-the-art text-to-image generative models. Despite rapid advancements in generative AI for real-world imagery, medical domain evaluations have been hindered by methodological inconsistencies, outdated architectural comparisons, and disconnected assessment criteria that rarely address the practical clinical value of synthetic samples. CheXGenBench overcomes these limitations through standardised data partitioning and a unified evaluation protocol comprising over 20 quantitative metrics that systematically analyse generation quality, potential privacy vulnerabilities, and downstream clinical applicability across 11 leading text-to-image architectures. Our results reveal critical inefficiencies in the existing evaluation protocols, particularly in assessing generative fidelity, leading to inconsistent and uninformative comparisons. Our framework establishes a standardised benchmark for the medical AI community, enabling objective and reproducible comparisons while facilitating seamless integration of both existing and future generative models. Additionally, we release a high-quality, synthetic dataset, SynthCheX-75K, comprising 75K radiographs generated by the top-performing model (Sana 0.6B) in our benchmark to support further research in this critical domain. Through CheXGenBench, we establish a new state-of-the-art and release our framework, models, and SynthCheX-75K dataset at https://raman1121.github.io/CheXGenBench/
