Assessing Pancreatic Ductal Adenocarcinoma Vascular Invasion: the PDACVI Benchmark
April 30, 2026
Authors: M. Riera-Marín, O. K. Sikha, J. Rodríguez-Comas, M. S. May, T. Kirscher, X. Coubez, P. Meyer, S. Faisan, Z. Pan, X. Zhou, X. Liang, C. Hémon, V. Boussot, J.-L. Dillenseger, J.-C. Nunes, K.-C. Kahl, C. Lüth, J. Traub, P.-H. Conze, M. M. Duh, A. Aubanell, R. de Figueiredo Cardoso, S. Egger-Hackenschmidt, J. García-López, M. A. González-Ballester, A. Galdran
cs.AI
Abstract
Surgical resection remains the only potentially curative treatment for pancreatic ductal adenocarcinoma (PDAC), and eligibility depends on accurate assessment of vascular invasion (VI), i.e., tumor extension into adjacent critical vessels. Despite its importance for preoperative staging and surgical planning, computational VI assessment remains underexplored. Two major challenges are the lack of public datasets and the diagnostic ambiguity at the tumor-vessel interface, which leads to substantial inter-rater variability even among expert radiologists. To address these limitations, we introduce the CURVAS-PDACVI Dataset and Challenge, an open benchmark for uncertainty-aware AI in PDAC staging based on a densely annotated dataset with five independent expert annotations per scan. We also propose a multi-metric evaluation framework that extends beyond spatial overlap to include probabilistic calibration and VI assessment. Evaluation of six state-of-the-art methods shows that strong global volumetric overlap does not necessarily translate into reliable performance at clinically critical tumor-vessel interfaces. In particular, methods optimized for binary segmentation perform competitively on average overlap metrics, but often degrade in high-complexity cases with low expert consensus, either collapsing in volume or overextending at uncertain boundaries. In contrast, methods that model inter-rater disagreement produce better-calibrated probabilistic maps and show greater robustness in these ambiguous cases. The benchmark highlights the limitations of volumetric accuracy as a proxy for localized surgical utility, motivating uncertainty-aware probabilistic models for preoperative decision-making.
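As a rough illustration of why overlap and calibration can diverge when several experts annotate the same scan, the sketch below computes a Dice score against a majority-vote mask and an expected calibration error against the fraction of annotators marking each voxel. This is a toy example under stated assumptions, not the benchmark's actual metric definitions: the function names, the 0.5 majority threshold, the ten-bin calibration scheme, and the four-voxel "scan" are all hypothetical choices made for illustration.

```python
import numpy as np

def consensus_map(masks):
    """Average K binary masks of shape (K, ...) into a per-voxel agreement fraction."""
    return np.asarray(masks, dtype=float).mean(axis=0)

def dice(pred, target, eps=1e-8):
    """Dice overlap between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def expected_calibration_error(prob, soft_label, n_bins=10):
    """Bin voxels by predicted probability and compare, per bin, the mean
    prediction with the mean annotator agreement (assumed binning scheme)."""
    prob = np.asarray(prob, dtype=float).ravel()
    soft = np.asarray(soft_label, dtype=float).ravel()
    bins = np.minimum((prob * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            ece += in_bin.mean() * abs(prob[in_bin].mean() - soft[in_bin].mean())
    return ece

# Toy data: five hypothetical annotators, four voxels; voxels 1 and 2 are ambiguous.
masks = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
])
consensus = consensus_map(masks)   # agreement fraction per voxel
majority = consensus >= 0.5        # majority-vote reference mask

hard_pred = majority.astype(float)  # a confident binary prediction
soft_pred = consensus.copy()        # a prediction that mirrors rater disagreement

dice_hard = dice(hard_pred >= 0.5, majority)
dice_soft = dice(soft_pred >= 0.5, majority)
ece_hard = expected_calibration_error(hard_pred, consensus)
ece_soft = expected_calibration_error(soft_pred, consensus)

print(f"Dice hard={dice_hard:.3f} soft={dice_soft:.3f}")
print(f"ECE  hard={ece_hard:.3f} soft={ece_soft:.3f}")
```

In this toy case both predictions reach the same Dice score against the majority mask, while only the binary prediction incurs a calibration error on the two ambiguous voxels, which is the kind of gap that overlap metrics alone would not reveal.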