C-DiffDet+: Fusing Global Scene Context with Generative Denoising for High-Fidelity Object Detection

Abstract

Fine-grained object detection in challenging visual domains, such as vehicle damage assessment, is difficult to perform reliably even for human experts. Although DiffusionDet advanced the state of the art through conditional denoising diffusion, its performance in context-dependent scenarios remains limited by its reliance on local feature conditioning. We address this fundamental limitation with Context-Aware Fusion (CAF), which uses cross-attention to directly integrate global scene context with local proposal features. The global context is produced by a separate, dedicated encoder that captures comprehensive environmental information, so that each object proposal can attend to a scene-level representation. This substantially strengthens the generative detection paradigm. Experiments on the CarDD benchmark show improvements over state-of-the-art models, setting a new performance standard for context-aware object detection in fine-grained domains.
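The fusion idea summarized above, where each local proposal feature attends to tokens from a global scene encoder, can be illustrated with a minimal single-head cross-attention sketch in plain Python. This is not the authors' CAF implementation: the projection matrices `w_q`, `w_k`, `w_v`, the single attention head, the residual addition, the random toy inputs, and the helper names (`cross_attention_fuse`, `random_matrix`) are all illustrative assumptions. Proposals act as queries; the context tokens supply keys and values.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
        for row in a
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(
    proposals: Matrix, context: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix
) -> Tuple[Matrix, Matrix]:
    """Hypothetical CAF-style step: proposals (N x d) attend to context tokens (M x d).

    Returns the fused proposal features (N x d) and the attention weights (N x M).
    """
    q = matmul(proposals, w_q)  # queries come from local proposal features
    k = matmul(context, w_k)    # keys come from global scene tokens
    v = matmul(context, w_v)    # values come from global scene tokens
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))  # N x M similarity scores
    weights = [softmax([s / scale for s in row]) for row in scores]
    attended = matmul(weights, v)     # per-proposal weighted summary of the scene
    # Assumed residual fusion: keep the local feature and add scene context on top.
    fused = [[p + c for p, c in zip(p_row, c_row)] for p_row, c_row in zip(proposals, attended)]
    return fused, weights


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d_model = 8
    proposals = random_matrix(4, d_model, rng)  # 4 toy proposal features
    context = random_matrix(6, d_model, rng)    # 6 toy tokens from a hypothetical scene encoder
    w_q, w_k, w_v = (random_matrix(d_model, d_model, rng) for _ in range(3))
    fused, weights = cross_attention_fuse(proposals, context, w_q, w_k, w_v)
    print(len(fused), len(fused[0]))  # prints "4 8"
```

The residual form leaves each proposal's local feature intact and adds a context-dependent correction, so a proposal whose attended context is uninformative stays close to its original feature. A production detector would more likely use multi-head attention, normalization layers, and learned weights; those details are outside what the abstract specifies.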
