Assessing Pancreatic Ductal Adenocarcinoma Vascular Invasion: the PDACVI Benchmark

Abstract

Surgical resection remains the only potentially curative treatment for pancreatic ductal adenocarcinoma (PDAC), and eligibility depends on accurate assessment of vascular invasion (VI), i.e., tumor extension into adjacent critical vessels. Despite its importance for preoperative staging and surgical planning, computational VI assessment remains underexplored. The two main obstacles are the lack of public datasets and the diagnostic ambiguity at the tumor-vessel interface, which leads to substantial inter-rater variability even among expert radiologists. To address these limitations, we introduce the CURVAS-PDACVI Dataset and Challenge, an open benchmark for uncertainty-aware AI in PDAC staging, built on a densely annotated dataset with five independent expert annotations per scan. We also propose a multi-metric evaluation framework that extends beyond spatial overlap to include probabilistic calibration and VI assessment. Evaluation of six state-of-the-art methods shows that strong global volumetric overlap does not necessarily translate into reliable performance at clinically critical tumor-vessel interfaces. In particular, methods optimized for binary segmentation are competitive on average overlap metrics but often degrade in high-complexity cases with low expert consensus, either collapsing in volume or overextending at uncertain boundaries. In contrast, methods that model inter-rater disagreement produce better-calibrated probabilistic maps and are more robust in these ambiguous cases. The benchmark highlights the limitations of volumetric accuracy as a proxy for localized surgical utility and motivates uncertainty-aware probabilistic models for preoperative decision-making.
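The contrast between overlap and calibration can be illustrated with a toy example. The sketch below is not the benchmark's evaluation code; it assumes, purely for illustration, that the five rater masks are averaged into a per-voxel agreement map, that Dice is computed against the majority vote, and that calibration is measured with a simple binned expected calibration error against the agreement map. All function names and the six-voxel "scan" are hypothetical.

```python
import numpy as np

def soft_consensus(masks):
    """Average K binary rater masks into a per-voxel agreement map in [0, 1]."""
    return np.mean(np.stack(masks, axis=0).astype(float), axis=0)

def dice(pred_bin, ref_bin, eps=1e-8):
    """Dice overlap between two binary masks."""
    pred_bin = pred_bin.astype(bool)
    ref_bin = ref_bin.astype(bool)
    inter = np.logical_and(pred_bin, ref_bin).sum()
    return (2.0 * inter + eps) / (pred_bin.sum() + ref_bin.sum() + eps)

def expected_calibration_error(prob, target, n_bins=10):
    """Bin-weighted mean of |mean predicted prob - mean target| per bin."""
    prob = prob.ravel()
    target = target.ravel().astype(float)
    # Assign each voxel to an equal-width probability bin; prob == 1 goes to the last bin.
    bin_idx = np.minimum((prob * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_idx == b
        if in_bin.any():
            gap = abs(prob[in_bin].mean() - target[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Hypothetical six-voxel scan annotated by five raters (assumed toy data).
raters = [
    np.array([1, 1, 1, 0, 0, 0]),
    np.array([1, 1, 1, 0, 0, 0]),
    np.array([1, 1, 0, 0, 0, 0]),
    np.array([1, 1, 1, 1, 0, 0]),
    np.array([1, 1, 0, 0, 0, 0]),
]
consensus = soft_consensus(raters)   # per-voxel rater agreement
majority = consensus >= 0.5          # majority-vote reference mask

# Model A outputs hard 0/1 probabilities; model B reproduces rater agreement.
prob_a = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
prob_b = consensus.copy()

dice_a = dice(prob_a >= 0.5, majority)
dice_b = dice(prob_b >= 0.5, majority)
ece_a = expected_calibration_error(prob_a, consensus)
ece_b = expected_calibration_error(prob_b, consensus)

print(f"Dice A={dice_a:.3f}  Dice B={dice_b:.3f}")
print(f"ECE  A={ece_a:.3f}  ECE  B={ece_b:.3f}")
```

In this toy setting both models reach the same Dice score against the majority vote, while only the model that tracks rater agreement has zero calibration error, which is the kind of difference an overlap-only metric cannot reveal.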
