Prompt-Free Universal Region Proposal Network

Abstract

Identifying potential objects is critical for object recognition and analysis across various computer vision applications. Existing methods typically localize potential objects by relying on exemplar images, predefined categories, or textual descriptions. However, this reliance on image and text prompts often limits their flexibility and adaptability in real-world scenarios. In this paper, we introduce a novel Prompt-Free Universal Region Proposal Network (PF-RPN), which identifies potential objects without relying on external prompts. First, the Sparse Image-Aware Adapter (SIA) module performs initial localization of potential objects using a learnable query embedding that is dynamically updated with visual features. Next, the Cascade Self-Prompt (CSP) module identifies the remaining potential objects by leveraging the self-prompted learnable embedding, autonomously aggregating informative visual features in a cascading manner. Finally, the Centerness-Guided Query Selection (CG-QS) module facilitates the selection of high-quality query embeddings using a centerness scoring network. Our method can be optimized with limited data (e.g., 5% of MS COCO data) and applied directly, without fine-tuning, to various object detection domains, such as underwater object detection, industrial defect detection, and remote sensing image object detection. Experimental results across 19 datasets validate the effectiveness of our method. Code is available at https://github.com/tangqh03/PF-RPN.
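
The three steps described above can be loosely illustrated with a toy sketch. It is not the PF-RPN implementation: the abstract does not specify how the query embedding is updated or how centerness is defined. The sketch assumes a residual, attention-weighted pooling of visual features for the query update, a fixed number of repeated updates for the cascade, and the FCOS-style centerness formula as the score used for selection. All function names (`update_query`, `cascade`, `centerness`, `select_queries`) are hypothetical.

```python
import math


def update_query(query, features):
    """Assumed update rule: add an attention-weighted average of the
    feature vectors to the query (softmax over dot-product similarity)."""
    logits = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, features)) / total
        for d in range(len(query))
    ]
    return [q + p for q, p in zip(query, pooled)]


def cascade(query, features, stages=3):
    """Assumed cascade: re-apply the update so each stage starts from
    the embedding produced by the previous stage."""
    for _ in range(stages):
        query = update_query(query, features)
    return query


def centerness(point, box):
    """FCOS-style centerness of a point in an (x1, y1, x2, y2) box.
    Returns 0.0 for points on or outside the box boundary."""
    x, y = point
    x1, y1, x2, y2 = box
    left, right = x - x1, x2 - x
    top, bottom = y - y1, y2 - y
    if min(left, right, top, bottom) <= 0:
        return 0.0
    horizontal = min(left, right) / max(left, right)
    vertical = min(top, bottom) / max(top, bottom)
    return math.sqrt(horizontal * vertical)


def select_queries(scores, k):
    """Return the indices of the k highest-scoring queries, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]
```

In this sketch a query at the exact center of its box scores 1.0 and the score decays toward the box edges, so keeping the top-k scores favors well-centered proposals. In the paper the score comes from a learned scoring network rather than from known boxes.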