U-Bench: A Comprehensive Understanding of U-Net through 100-Variant Benchmarking

Abstract

Over the past decade, U-Net has been the dominant architecture in medical image segmentation, leading to the development of thousands of U-shaped variants. Despite its widespread adoption, there is still no comprehensive benchmark that systematically evaluates their performance and utility, largely because existing evaluations lack sufficient statistical validation and give limited consideration to efficiency and generalization across diverse datasets. To bridge this gap, we present U-Bench, the first large-scale, statistically rigorous benchmark, which evaluates 100 U-Net variants across 28 datasets and 10 imaging modalities. Our contributions are threefold: (1) Comprehensive Evaluation: U-Bench evaluates models along three key dimensions: statistical robustness, zero-shot generalization, and computational efficiency. We introduce a novel metric, U-Score, which jointly captures the performance-efficiency trade-off and offers a deployment-oriented perspective on model progress. (2) Systematic Analysis and Model Selection Guidance: We summarize key findings from the large-scale evaluation and systematically analyze the impact of dataset characteristics and architectural paradigms on model performance. Based on these insights, we propose a model advisor agent that guides researchers in selecting the most suitable models for specific datasets and tasks. (3) Public Availability: We provide all code, models, protocols, and weights, enabling the community to reproduce our results and extend the benchmark with future methods. In summary, U-Bench not only exposes gaps in previous evaluations but also establishes a foundation for fair, reproducible, and practically relevant benchmarking of U-Net-based segmentation models over the next decade. The project page is available at https://fenghetan9.github.io/ubench, and the code is available at https://github.com/FengheTan9/U-Bench.
