

C-DiffDet+: Fusing Global Scene Context with Generative Denoising for High-Fidelity Object Detection

August 30, 2025
作者: Abdellah Zakaria Sellam, Ilyes Benaissa, Salah Eddine Bekhouche, Abdenour Hadid, Vito Renó, Cosimo Distante
cs.AI

Abstract

Fine-grained object detection in challenging visual domains, such as vehicle damage assessment, presents a formidable challenge even for human experts to resolve reliably. While DiffusionDet has advanced the state of the art through conditional denoising diffusion, its performance remains limited by local feature conditioning in context-dependent scenarios. We address this fundamental limitation by introducing Context-Aware Fusion (CAF), which leverages cross-attention mechanisms to integrate global scene context directly with local proposal features. The global context is generated by a separate dedicated encoder that captures comprehensive environmental information, enabling each object proposal to attend to scene-level understanding; this significantly enhances the generative detection paradigm. Experimental results demonstrate an improvement over state-of-the-art models on the CarDD benchmark, establishing new performance benchmarks for context-aware object detection in fine-grained domains.
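The Context-Aware Fusion idea described in the abstract, local proposal features cross-attending to global scene-context tokens from a dedicated encoder, can be sketched as a small PyTorch module. This is a minimal illustrative sketch: the class name, dimensions, and fusion details are assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class ContextAwareFusion(nn.Module):
    """Illustrative sketch of a CAF-style block (not the paper's code):
    each object proposal queries global scene-context tokens via
    cross-attention, then fuses the result back with a residual."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, proposal_feats: torch.Tensor,
                context_tokens: torch.Tensor) -> torch.Tensor:
        # proposal_feats: (B, N, D) local per-proposal features
        # context_tokens: (B, T, D) global scene tokens from a separate encoder
        attended, _ = self.cross_attn(
            query=proposal_feats, key=context_tokens, value=context_tokens)
        # residual fusion: each proposal is enriched with scene-level context
        return self.norm(proposal_feats + attended)

# Toy shapes: 2 images, 100 proposals, 49 context tokens, 256-dim features
B, N, T, D = 2, 100, 49, 256
fused = ContextAwareFusion(D)(torch.randn(B, N, D), torch.randn(B, T, D))
print(fused.shape)  # torch.Size([2, 100, 256])
```

The key design point is that the proposal features serve as attention queries while the scene tokens serve as keys and values, so the output keeps the per-proposal shape while mixing in global context.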
PDF · September 3, 2025