U-Bench: A Comprehensive Understanding of U-Net through 100-Variant Benchmarking
October 8, 2025
Authors: Fenghe Tang, Chengqi Dong, Wenxin Ma, Zikang Xu, Heqin Zhu, Zihang Jiang, Rongsheng Wang, Yuhao Wang, Chenxu Wu, Shaohua Kevin Zhou
cs.AI
Abstract
Over the past decade, U-Net has been the dominant architecture in medical
image segmentation, leading to the development of thousands of U-shaped
variants. Despite its widespread adoption, there is still no comprehensive
benchmark to systematically evaluate their performance and utility, largely
because of insufficient statistical validation and limited consideration of
efficiency and generalization across diverse datasets. To bridge this gap, we
present U-Bench, the first large-scale, statistically rigorous benchmark that
evaluates 100 U-Net variants across 28 datasets and 10 imaging modalities. Our
contributions are threefold: (1) Comprehensive Evaluation: U-Bench evaluates
models along three key dimensions: statistical robustness, zero-shot
generalization, and computational efficiency. We introduce a novel metric,
U-Score, which jointly captures the performance-efficiency trade-off, offering
a deployment-oriented perspective on model progress. (2) Systematic Analysis
and Model Selection Guidance: We summarize key findings from the large-scale
evaluation and systematically analyze the impact of dataset characteristics and
architectural paradigms on model performance. Based on these insights, we
propose a model advisor agent to guide researchers in selecting the most
suitable models for specific datasets and tasks. (3) Public Availability: We
provide all code, models, protocols, and weights, enabling the community to
reproduce our results and extend the benchmark with future methods. In summary,
U-Bench not only exposes gaps in previous evaluations but also establishes a
foundation for fair, reproducible, and practically relevant benchmarking in the
next decade of U-Net-based segmentation models. The project can be accessed at:
https://fenghetan9.github.io/ubench. Code is available at:
https://github.com/FengheTan9/U-Bench.
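The abstract describes U-Score as a metric that jointly captures the performance-efficiency trade-off. The actual U-Score definition is given in the paper; the sketch below is only a hypothetical illustration of the general idea of aggregating segmentation accuracy with normalized model cost. The function name, weighting scheme, and normalization constants are all assumptions for illustration, not the paper's formula.

```python
# Hypothetical sketch of a performance-efficiency trade-off score.
# NOT the U-Score formula from the paper: the weighting (alpha) and the
# normalization caps for parameters/FLOPs are illustrative choices only.

def tradeoff_score(dice, params_m, flops_g, alpha=0.5,
                   max_params_m=100.0, max_flops_g=500.0):
    """Combine accuracy (Dice, in [0, 1]) with normalized efficiency.

    params_m: parameter count in millions; flops_g: inference FLOPs in
    GFLOPs. Costs are mapped to [0, 1] efficiency terms and averaged,
    then mixed with accuracy via the weight alpha.
    """
    eff_params = max(0.0, 1.0 - params_m / max_params_m)
    eff_flops = max(0.0, 1.0 - flops_g / max_flops_g)
    efficiency = 0.5 * (eff_params + eff_flops)
    return alpha * dice + (1.0 - alpha) * efficiency

# At equal Dice, a compact model outscores a heavy one:
light = tradeoff_score(dice=0.85, params_m=8.0, flops_g=20.0)
heavy = tradeoff_score(dice=0.85, params_m=60.0, flops_g=300.0)
assert light > heavy
```

Any deployment-oriented score of this shape rewards models that hold accuracy while cutting compute, which is the perspective the abstract attributes to U-Score.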