CheXGenBench: A Unified Benchmark For Fidelity, Privacy and Utility of Synthetic Chest Radiographs

Abstract

We introduce CheXGenBench, a rigorous and multifaceted evaluation framework for synthetic chest radiograph generation that simultaneously assesses fidelity, privacy risks, and clinical utility across state-of-the-art text-to-image generative models. Despite rapid advancements in generative AI for real-world imagery, medical domain evaluations have been hindered by methodological inconsistencies, outdated architectural comparisons, and disconnected assessment criteria that rarely address the practical clinical value of synthetic samples. CheXGenBench overcomes these limitations through standardised data partitioning and a unified evaluation protocol comprising over 20 quantitative metrics that systematically analyse generation quality, potential privacy vulnerabilities, and downstream clinical applicability across 11 leading text-to-image architectures. Our results reveal critical inefficiencies in existing evaluation protocols, particularly in assessing generative fidelity, which lead to inconsistent and uninformative comparisons. Our framework establishes a standardised benchmark for the medical AI community, enabling objective and reproducible comparisons while facilitating seamless integration of both existing and future generative models. Additionally, we release a high-quality synthetic dataset, SynthCheX-75K, comprising 75K radiographs generated by the top-performing model in our benchmark (Sana 0.6B), to support further research in this critical domain. Through CheXGenBench, we establish a new state of the art and release our framework, models, and the SynthCheX-75K dataset at https://raman1121.github.io/CheXGenBench/
