U-Bench: A Comprehensive Understanding of U-Net through 100-Variant Benchmarking

Abstract

Over the past decade, U-Net has been the dominant architecture in medical image segmentation, leading to the development of thousands of U-shaped variants. Despite the widespread adoption of these variants, there is still no comprehensive benchmark that systematically evaluates their performance and utility, largely because prior evaluations lack sufficient statistical validation and give limited consideration to efficiency and generalization across diverse datasets. To bridge this gap, we present U-Bench, the first large-scale, statistically rigorous benchmark that evaluates 100 U-Net variants across 28 datasets and 10 imaging modalities. Our contributions are threefold: (1) Comprehensive Evaluation: U-Bench evaluates models along three key dimensions: statistical robustness, zero-shot generalization, and computational efficiency. We introduce a novel metric, U-Score, which jointly captures the performance-efficiency trade-off, offering a deployment-oriented perspective on model progress. (2) Systematic Analysis and Model Selection Guidance: We summarize key findings from the large-scale evaluation and systematically analyze the impact of dataset characteristics and architectural paradigms on model performance. Based on these insights, we propose a model advisor agent to guide researchers in selecting the most suitable models for specific datasets and tasks. (3) Public Availability: We provide all code, models, protocols, and weights, enabling the community to reproduce our results and extend the benchmark with future methods. In summary, U-Bench not only exposes gaps in previous evaluations but also establishes a foundation for fair, reproducible, and practically relevant benchmarking in the next decade of U-Net-based segmentation models. The project can be accessed at https://fenghetan9.github.io/ubench, and the code is available at https://github.com/FengheTan9/U-Bench.
