INT: Instance-Specific Negative Mining for Task-Generic Promptable Segmentation
January 30, 2025
Authors: Jian Hu, Zixu Cheng, Shaogang Gong
cs.AI
Abstract
Task-generic promptable image segmentation aims to achieve segmentation of
diverse samples under a single task description by utilizing only one
task-generic prompt. Current methods leverage the generalization capabilities
of Vision-Language Models (VLMs) to infer instance-specific prompts from these
task-generic prompts in order to guide the segmentation process. However, when
VLMs struggle to generalise to some image instances, the instance-specific
prompts they predict are poor. To solve this problem, we introduce
Instance-specific Negative Mining for Task-Generic
Promptable Segmentation (INT). The key idea of INT is to adaptively
reduce the influence of irrelevant (negative) prior knowledge whilst
increasing the use of the most plausible prior knowledge, selected by negative
mining with higher contrast, in order to optimise instance-specific prompt
generation. Specifically, INT consists of two components: (1) instance-specific
prompt generation, which progressively filters out incorrect information in
prompt generation; (2) semantic mask generation, which ensures that the
segmentation of each image instance correctly matches the semantics of the
instance-specific
prompts. INT is validated on six datasets, including camouflaged objects and
medical images, demonstrating its effectiveness, robustness and scalability.