
C-DiffDet+: Fusing Global Scene Context with Generative Denoising for High-Fidelity Object Detection

August 30, 2025
Authors: Abdellah Zakaria Sellam, Ilyes Benaissa, Salah Eddine Bekhouche, Abdenour Hadid, Vito Renó, Cosimo Distante
cs.AI

Abstract

Fine-grained object detection in challenging visual domains, such as vehicle damage assessment, presents a formidable challenge even for human experts to resolve reliably. While DiffusionDet has advanced the state of the art through conditional denoising diffusion, its performance remains limited by local feature conditioning in context-dependent scenarios. We address this fundamental limitation by introducing Context-Aware Fusion (CAF), which leverages cross-attention to integrate global scene context directly with local proposal features. The global context is produced by a separate, dedicated encoder that captures comprehensive environmental information, enabling each object proposal to attend to scene-level understanding and thereby strengthening the generative detection paradigm. Experimental results demonstrate improvements over state-of-the-art models on the CarDD benchmark, setting a new benchmark for context-aware object detection in fine-grained domains.
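
The CAF mechanism described in the abstract can be pictured as a small cross-attention block in which each object proposal queries tokens produced by the global scene encoder, and the attended context is fused back into the local proposal features. The following is a minimal PyTorch sketch of that idea, not the authors' implementation: the module name, tensor shapes, and the residual-plus-LayerNorm fusion are illustrative assumptions.

```python
# Minimal sketch of a Context-Aware Fusion (CAF) block: proposal features
# attend to global scene-context tokens via cross-attention. Names and
# shapes are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class ContextAwareFusion(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Proposals act as queries; global scene tokens act as keys/values.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, proposal_feats: torch.Tensor, scene_tokens: torch.Tensor) -> torch.Tensor:
        # proposal_feats: (B, N_proposals, dim) local per-proposal features
        # scene_tokens:   (B, N_tokens, dim) output of a dedicated global encoder
        ctx, _ = self.cross_attn(proposal_feats, scene_tokens, scene_tokens)
        # Residual fusion keeps the local evidence while injecting scene context.
        return self.norm(proposal_feats + ctx)

# Example: 100 noisy proposals attending to a 7x7 grid of scene tokens.
caf = ContextAwareFusion()
fused = caf(torch.randn(2, 100, 256), torch.randn(2, 49, 256))
print(fused.shape)  # torch.Size([2, 100, 256])
```

In a DiffusionDet-style detector, a block like this would sit in the detection head so that each denoised box proposal is refined with scene-level information before classification and box regression.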