MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing

March 31, 2025
Authors: Karim Radouane, Hanane Azzag, Mustapha Lebbah
cs.AI

Abstract

We propose a unified framework that integrates object detection (OD) and visual grounding (VG) for remote sensing (RS) imagery. To support conventional OD and establish an intuitive prior for the VG task, we fine-tune an open-set object detector using referring expression data, framing it as a partially supervised OD task. In the first stage, we construct a graph representation of each image, comprising object queries, class embeddings, and proposal locations. Then, our task-aware architecture processes this graph to perform the VG task. The model consists of: (i) a multi-branch network that integrates spatial, visual, and categorical features to generate task-aware proposals, and (ii) an object reasoning network that assigns probabilities across proposals, followed by a soft selection mechanism for final referring object localization. Our model demonstrates superior performance on the OPT-RSVG and DIOR-RSVG datasets, achieving significant improvements over state-of-the-art methods while retaining classical OD capabilities. The code will be available in our repository: https://github.com/rd20karim/MB-ORES.
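
The abstract does not spell out how the soft selection step is computed. As a rough illustration only, the sketch below assumes it means a softmax over per-proposal scores followed by a probability-weighted average of the candidate boxes. The function names `softmax` and `soft_select`, the box format `(x1, y1, x2, y2)`, and the example scores are all hypothetical and are not taken from the MB-ORES code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_select(boxes, scores):
    """Return a probability-weighted average box and the probabilities.

    boxes  : list of (x1, y1, x2, y2) candidate proposals
    scores : one unnormalized relevance score per proposal
             (assumed here to come from an object reasoning network)
    """
    if not boxes or len(boxes) != len(scores):
        raise ValueError("boxes and scores must be non-empty and equal in length")
    probs = softmax(scores)
    # Weighted average of each coordinate across all proposals.
    box = tuple(sum(p * b[i] for p, b in zip(probs, boxes)) for i in range(4))
    return box, probs


# Illustrative values only: two overlapping proposals with high scores
# and one distant proposal with a low score.
boxes = [
    (10.0, 10.0, 50.0, 50.0),
    (12.0, 8.0, 52.0, 48.0),
    (200.0, 200.0, 240.0, 240.0),
]
scores = [4.0, 4.0, -6.0]
box, probs = soft_select(boxes, scores)
print(box, probs)
```

Under this assumption the output box stays close to the two high-scoring proposals, because the distant proposal receives almost no probability mass. A weighted average is differentiable with respect to the scores, which is one plausible reason to prefer it over a hard argmax during training.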

April 2, 2025