INT: Instance-Specific Negative Mining for Task-Generic Promptable Segmentation

Abstract

Task-generic promptable image segmentation aims to segment diverse samples under a single task description by using only one task-generic prompt. Current methods leverage the generalisation capabilities of Vision-Language Models (VLMs) to infer instance-specific prompts from these task-generic prompts in order to guide the segmentation process. However, when VLMs struggle to generalise to some image instances, the predicted instance-specific prompts are poor. To solve this problem, we introduce Instance-specific Negative Mining for Task-Generic Promptable Segmentation (INT). The key idea of INT is to adaptively reduce the influence of irrelevant (negative) prior knowledge whilst increasing the use of the most plausible prior knowledge, selected by negative mining with higher contrast, in order to optimise instance-specific prompt generation. Specifically, INT consists of two components: (1) instance-specific prompt generation, which progressively filters out incorrect information in prompt generation; (2) semantic mask generation, which ensures that the segmentation of each image instance correctly matches the semantics of the instance-specific prompts. INT is validated on six datasets, including camouflaged objects and medical images, demonstrating its effectiveness, robustness and scalability.
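To make the key idea concrete, the following is a minimal toy sketch of contrast-based negative mining over candidate prompts. It is an illustration under stated assumptions, not the procedure used by INT: the function `mine_prompt`, the `margin` threshold, and the definition of contrast as the drop in a VLM's confidence when a candidate's region is masked are all hypothetical, and the confidence scores are supplied by hand rather than produced by a real model.

```python
from typing import Dict, List, Set, Tuple


def mine_prompt(
    rounds: List[Tuple[Dict[str, float], Dict[str, float]]],
    margin: float = 0.1,
) -> Tuple[str, Set[str]]:
    """Pick the candidate prompt with the highest contrast, mining negatives.

    Each round is a pair (scores_full, scores_masked): hypothetical VLM
    confidences for every candidate on the original image and on a copy with
    that candidate's region masked out. Both dicts are assumed to share keys.
    """
    negatives: Set[str] = set()
    best = ""
    for scores_full, scores_masked in rounds:
        # Contrast (an assumption): how far confidence drops once the
        # candidate's region is hidden. Mined negatives are skipped.
        contrast = {
            name: scores_full[name] - scores_masked[name]
            for name in scores_full
            if name not in negatives
        }
        if not contrast:
            break
        best = max(contrast, key=contrast.get)
        # Low-contrast candidates are treated as irrelevant prior knowledge
        # and excluded from all later rounds.
        negatives |= {
            name for name, c in contrast.items() if c < margin and name != best
        }
    return best, negatives


if __name__ == "__main__":
    toy_rounds = [
        ({"leaf": 0.6, "frog": 0.7, "rock": 0.5},
         {"leaf": 0.58, "frog": 0.2, "rock": 0.3}),
        ({"leaf": 0.9, "frog": 0.6, "rock": 0.5},
         {"leaf": 0.1, "frog": 0.3, "rock": 0.45}),
    ]
    print(mine_prompt(toy_rounds))
```

In this toy run, "leaf" is mined as a negative in the first round and therefore ignored in the second, even though its scores there would otherwise give it the highest contrast. This mirrors, in a very simplified form, the progressive filtering of incorrect information described above.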
