GroupEnsemble: Efficient Uncertainty Estimation for DETR-based Object Detection

March 2, 2026
Authors: Yutong Yang, Katarina Popović, Julian Wiederer, Markus Braun, Vasileios Belagiannis, Bin Yang
cs.AI

Abstract

Detection Transformer (DETR) and its variants show strong performance on object detection, a key task for autonomous systems. However, their confidence scores reflect only semantic uncertainty and fail to capture the equally important spatial uncertainty, which leaves the assessment of detection reliability incomplete. Deep Ensembles can address this by providing high-quality spatial uncertainty estimates, but their immense memory consumption makes them impractical for real-world applications. A cheaper alternative, Monte Carlo (MC) Dropout, suffers from high latency because it needs multiple forward passes during inference to estimate uncertainty. To address these limitations, we introduce GroupEnsemble, an efficient and effective uncertainty estimation method for DETR-like models. GroupEnsemble simultaneously predicts multiple individual detection sets by feeding additional diverse groups of object queries to the transformer decoder during inference. Each query group is transformed by the shared decoder in isolation and predicts a complete detection set for the same input. An attention mask is applied to the decoder to prevent inter-group query interactions, so that each group detects independently and the ensemble-based uncertainty estimate is reliable. By leveraging the decoder's inherent parallelism, GroupEnsemble estimates uncertainty in a single forward pass without sequential repetition. We validated our method on autonomous driving scenes and common daily scenes using the Cityscapes and COCO datasets, respectively. The results show that a hybrid approach combining MC Dropout and GroupEnsemble outperforms Deep Ensembles on several metrics at a fraction of the cost. The code is available at https://github.com/yutongy98/GroupEnsemble.
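
The attention mask described in the abstract can be pictured as a block-diagonal pattern over the concatenated object queries. The following is a minimal sketch, not the authors' implementation: the function name `build_group_attention_mask`, its parameters, and the convention that `True` marks a blocked query pair are all assumptions made for illustration.

```python
from typing import List


def build_group_attention_mask(num_groups: int, queries_per_group: int) -> List[List[bool]]:
    """Build a square boolean mask over all concatenated object queries.

    Assumed convention (hypothetical, chosen for this sketch):
    mask[i][j] is True when query i must NOT attend to query j.
    Queries are laid out group by group, so query i belongs to
    group i // queries_per_group.
    """
    total = num_groups * queries_per_group
    mask = []
    for i in range(total):
        row = []
        for j in range(total):
            same_group = (i // queries_per_group) == (j // queries_per_group)
            # Block every pair of queries that sit in different groups.
            row.append(not same_group)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Two groups of three queries each: '.' = allowed, 'x' = blocked.
    for row in build_group_attention_mask(num_groups=2, queries_per_group=3):
        print("".join("x" if blocked else "." for blocked in row))
```

With such a mask in the decoder's query self-attention, each group only sees its own queries, so the groups behave like separate ensemble members that share one set of decoder weights and run in the same forward pass.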