Prompt-Free Universal Region Proposal Network

Abstract

Identifying potential objects is critical for object recognition and analysis across a wide range of computer vision applications. Existing methods typically localize potential objects by relying on exemplar images, predefined categories, or textual descriptions. This reliance on image and text prompts limits their flexibility and restricts their adaptability in real-world scenarios. In this paper, we introduce a novel Prompt-Free Universal Region Proposal Network (PF-RPN), which identifies potential objects without relying on external prompts. First, the Sparse Image-Aware Adapter (SIA) module performs an initial localization of potential objects using a learnable query embedding that is dynamically updated with visual features. Next, the Cascade Self-Prompt (CSP) module identifies the remaining potential objects by using the self-prompted learnable embedding to autonomously aggregate informative visual features in a cascading manner. Finally, the Centerness-Guided Query Selection (CG-QS) module uses a centerness scoring network to help select high-quality query embeddings. Our method can be optimized with limited data (e.g., 5% of the MS COCO dataset) and applied directly, without fine-tuning, to identify potential objects in a variety of domains, such as underwater object detection, industrial defect detection, and remote sensing object detection. Experimental results on 19 datasets validate the effectiveness of our method. Code is available at https://github.com/tangqh03/PF-RPN.
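
The abstract does not define the centerness score or how CG-QS combines it with other signals. As a rough illustration only, the sketch below assumes one common definition of centerness (the one used in FCOS) and a hypothetical selection rule that multiplies a per-query objectness score by its centerness and keeps the top k queries. The function names `centerness` and `select_queries`, the product rule, and the parameter `k` are assumptions for illustration, not the authors' implementation; in PF-RPN the score is predicted by a network rather than computed from ground-truth boxes.

```python
import math


def centerness(point, box):
    """FCOS-style centerness of a point relative to a box (x1, y1, x2, y2).

    Assumption: the paper may use a different definition.
    """
    x, y = point
    x1, y1, x2, y2 = box
    left, right = x - x1, x2 - x
    top, bottom = y - y1, y2 - y
    if min(left, right, top, bottom) <= 0:
        return 0.0  # point lies on or outside the box boundary
    return math.sqrt(
        (min(left, right) / max(left, right))
        * (min(top, bottom) / max(top, bottom))
    )


def select_queries(objectness, centerness_scores, k):
    """Return the indices of the k queries with the highest combined score.

    Hypothetical rule: combined score = objectness * centerness.
    """
    combined = [o * c for o, c in zip(objectness, centerness_scores)]
    order = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    return order[:k]
```

Under this assumed rule, a query with high objectness that sits near the edge of an object is ranked below a slightly less confident query located near the object's center, which is the intuition behind using centerness to filter low-quality queries.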