Lost in the Folds: When Cross-Validation Is Not a Deep Ensemble for Uncertainty Estimation

May 18, 2026
Authors: Kirscher Tristan, Bujotzek Markus, Kirchhoff Yannick, Rokuss Maximilian, Isensee Fabian, Kahl Kim-Celine, Kovacs Balint, Maier-Hein Klaus
cs.AI

Abstract

Ensemble disagreement is widely used as a proxy for epistemic uncertainty in medical image segmentation. In practice, many studies form ensembles via K-fold cross-validation (CV), yet refer to them as "deep ensembles" (DEs). Because CV members are trained on different data subsets, their disagreement mixes seed-driven variability with data-exposure effects, which can change how the uncertainty should be interpreted. We audit recent segmentation uncertainty studies and find that mismatches between terminology and implementation are common. We then compare a standard 5-fold CV ensemble to a 5-member DE (fixed training set, different random seeds) under otherwise identical configurations on three multi-rater segmentation datasets spanning three modalities. We evaluate the resulting uncertainty estimates on calibration, failure detection, ambiguity modeling, and robustness under distribution shift. DEs match CV ensembles in segmentation accuracy while improving calibration and failure detection, whereas CV ensembles sometimes correlate more strongly with inter-rater variability on the studied datasets. Thus, ensemble construction should be chosen to match the research question: DEs for reliability-oriented use (e.g., selective referral or failure detection) and CV ensembles as a proxy for ambiguity. We provide a lightweight nnU-Net modification that enables DE training within the default pipeline.
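
The distinction between the two constructions can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' nnU-Net modification: the function names, the dictionary layout, and the choice to train the DE on every available case are assumptions made for this example. It shows that CV members differ in both seed and training data, while DE members differ in seed alone.

```python
import random


def cv_ensemble_members(case_ids, k=5, base_seed=0):
    """K-fold CV ensemble: member i trains on every fold except fold i.

    Each member also gets its own seed, so disagreement between members
    reflects both seed-driven variability and different data exposure.
    (Hypothetical helper, not part of nnU-Net.)
    """
    ids = list(case_ids)
    random.Random(base_seed).shuffle(ids)       # reproducible fold assignment
    folds = [ids[i::k] for i in range(k)]       # k disjoint folds
    members = []
    for i in range(k):
        train = sorted(c for j, fold in enumerate(folds) if j != i for c in fold)
        members.append({"seed": base_seed + i, "train": train})
    return members


def deep_ensemble_members(case_ids, m=5, base_seed=0):
    """Deep ensemble: every member trains on the same fixed set.

    Only the random seed differs between members. Using all cases as the
    fixed training set is an assumption of this sketch.
    (Hypothetical helper, not part of nnU-Net.)
    """
    train = sorted(case_ids)
    return [{"seed": base_seed + i, "train": train} for i in range(m)]


# Small demo on made-up case identifiers.
cases = [f"case_{i:03d}" for i in range(20)]
cv = cv_ensemble_members(cases)
de = deep_ensemble_members(cases)
print("distinct CV training sets:", len({tuple(m["train"]) for m in cv}))
print("distinct DE training sets:", len({tuple(m["train"]) for m in de}))
```

In this sketch the five CV members see five different training sets, whereas the five DE members share one, which is why disagreement among CV members cannot be attributed to seed-driven variability alone.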