Prompt-Free Universal Region Proposal Network

Abstract

Identifying potential objects is critical for object recognition and analysis across many computer vision applications. Existing methods typically localize potential objects by relying on exemplar images, predefined categories, or textual descriptions. However, this reliance on image and text prompts often limits flexibility and restricts adaptability in real-world scenarios. In this paper, we introduce a novel Prompt-Free Universal Region Proposal Network (PF-RPN), which identifies potential objects without relying on external prompts. First, the Sparse Image-Aware Adapter (SIA) module performs initial localization of potential objects using a learnable query embedding that is dynamically updated with visual features. Next, the Cascade Self-Prompt (CSP) module identifies the remaining potential objects by leveraging the self-prompted learnable embedding and autonomously aggregates informative visual features in a cascading manner. Finally, the Centerness-Guided Query Selection (CG-QS) module facilitates the selection of high-quality query embeddings using a centerness scoring network. Our method can be optimized with limited data (e.g., 5% of MS COCO data) and applied directly, without fine-tuning, to identify potential objects in various object detection domains, such as underwater object detection, industrial defect detection, and remote sensing image object detection. Experimental results across 19 datasets validate the effectiveness of our method. Code is available at https://github.com/tangqh03/PF-RPN.
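The abstract does not specify how the CG-QS module scores or ranks queries. As a rough illustration only, the sketch below assumes the FCOS-style centerness definition, which is 1.0 at a box center and decays to 0 at its edges, as the quantity a centerness scoring network might learn to predict. It also assumes that queries are ranked by objectness × centerness and that the top k are kept. The names `centerness` and `select_queries` are hypothetical and do not come from the PF-RPN code release, and a closed-form centerness stands in for the learned network.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]              # (x, y) reference location of a query
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) predicted box


def centerness(point: Point, box: Box) -> float:
    """FCOS-style centerness (assumed here, not taken from PF-RPN).

    Returns 1.0 at the box center, decaying toward 0 near the edges,
    and exactly 0.0 when the point is on or outside the box.
    """
    x, y = point
    x1, y1, x2, y2 = box
    left, top, right, bottom = x - x1, y - y1, x2 - x, y2 - y
    if min(left, top, right, bottom) <= 0:
        return 0.0
    return math.sqrt((min(left, right) / max(left, right))
                     * (min(top, bottom) / max(top, bottom)))


def select_queries(objectness: Sequence[float],
                   centerness_scores: Sequence[float],
                   k: int) -> List[int]:
    """Hypothetical selection rule: keep the k queries with the highest
    objectness * centerness product (ties broken by lower index)."""
    combined = [o * c for o, c in zip(objectness, centerness_scores)]
    ranked = sorted(range(len(combined)), key=lambda i: (-combined[i], i))
    return ranked[:k]


if __name__ == "__main__":
    box = (0.0, 0.0, 10.0, 10.0)
    points = [(5.0, 5.0), (2.0, 5.0), (9.0, 9.0)]
    ctr = [centerness(p, box) for p in points]
    print([round(c, 3) for c in ctr])                  # [1.0, 0.5, 0.111]
    print(select_queries([0.6, 0.9, 0.95], ctr, k=2))  # [0, 1]
```

In this toy run, the third query has the highest objectness (0.95), but its reference point sits near a box corner. Its low centerness (about 0.111) therefore drops it below the two better-centered queries. This is the kind of filtering a centerness-guided selection step is meant to provide.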