C-DiffDet+: Fusing Global Scene Context with Generative Denoising for High-Fidelity Object Detection

Abstract

Fine-grained object detection in challenging visual domains, such as vehicle damage assessment, is difficult even for human experts to perform reliably. While DiffusionDet has advanced the state of the art through conditional denoising diffusion, its performance in context-dependent scenarios remains limited by its reliance on local feature conditioning. We address this limitation by introducing Context-Aware Fusion (CAF), which uses cross-attention to integrate global scene context directly with local proposal features. The global context is produced by a separate, dedicated encoder that captures comprehensive environmental information, so that each object proposal can attend to a scene-level representation, which strengthens the generative detection paradigm. Experimental results on the CarDD benchmark show an improvement over state-of-the-art models, establishing a new performance reference for context-aware object detection in fine-grained domains.
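The fusion idea described in the abstract, local proposal features attending to global scene context through cross-attention, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name context_aware_fusion, the single attention head, the residual connection, the feature dimension, and the random projection matrices are all assumptions chosen for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def context_aware_fusion(proposals, context, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention fusion.

    proposals: (N, d) local proposal features, used as queries.
    context:   (M, d) global scene tokens from a separate encoder,
               used as keys and values.
    Returns the fused features (N, d) and attention weights (N, M).
    """
    q = proposals @ w_q
    k = context @ w_k
    v = context @ w_v
    # Scaled dot-product attention: each proposal scores every scene token.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    # Residual connection (an assumption): keep the local feature and
    # add the context it attended to.
    fused = proposals + attn @ v
    return fused, attn


rng = np.random.default_rng(0)
d = 16                                   # illustrative feature size
proposals = rng.normal(size=(5, d))      # 5 object proposals
context = rng.normal(size=(49, d))       # e.g. a 7x7 grid of scene tokens
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

fused, attn = context_aware_fusion(proposals, context, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

Each row of attn is a distribution over scene tokens, so every proposal receives its own weighted summary of the global context. A real detector would learn the projections and would likely use multiple heads and normalization layers.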
