
C-DiffDet+: Fusing Global Scene Context with Generative Denoising for High-Fidelity Object Detection

August 30, 2025
Authors: Abdellah Zakaria Sellam, Ilyes Benaissa, Salah Eddine Bekhouche, Abdenour Hadid, Vito Renó, Cosimo Distante
cs.AI

Abstract

Fine-grained object detection in challenging visual domains, such as vehicle damage assessment, presents a formidable challenge even for human experts to resolve reliably. While DiffusionDet has advanced the state of the art through conditional denoising diffusion, its performance remains limited by local feature conditioning in context-dependent scenarios. We address this fundamental limitation by introducing Context-Aware Fusion (CAF), which leverages cross-attention to integrate global scene context directly with local proposal features. The global context is produced by a separate, dedicated encoder that captures comprehensive environmental information, enabling each object proposal to attend to scene-level understanding and thereby strengthening the generative detection paradigm. Experimental results demonstrate improvements over state-of-the-art models on the CarDD benchmark, setting a new benchmark for context-aware object detection in fine-grained domains.
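
The CAF mechanism described in the abstract can be pictured as a small cross-attention block in which each object proposal queries tokens produced by the global scene encoder, and the attended context is fused back into the local proposal features. The following is a minimal PyTorch sketch of that idea, not the authors' implementation: the module name, tensor shapes, and the residual-plus-LayerNorm fusion are illustrative assumptions.

```python
# Minimal sketch of a Context-Aware Fusion (CAF) block: proposal features
# attend to global scene-context tokens via cross-attention. Names and
# shapes are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class ContextAwareFusion(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Proposals act as queries; global scene tokens act as keys/values.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, proposal_feats: torch.Tensor, scene_tokens: torch.Tensor) -> torch.Tensor:
        # proposal_feats: (B, N_proposals, dim) local per-proposal features
        # scene_tokens:   (B, N_tokens, dim) output of a dedicated global encoder
        ctx, _ = self.cross_attn(proposal_feats, scene_tokens, scene_tokens)
        # Residual fusion keeps the local evidence while injecting scene context.
        return self.norm(proposal_feats + ctx)

# Example: 100 noisy proposals attending to a 7x7 grid of scene tokens.
caf = ContextAwareFusion()
fused = caf(torch.randn(2, 100, 256), torch.randn(2, 49, 256))
print(fused.shape)  # torch.Size([2, 100, 256])
```

In a DiffusionDet-style detector, a block like this would sit in the detection head so that each denoised box proposal is refined with scene-level information before classification and box regression.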