LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation

March 25, 2025
Authors: Vladan Stojnić, Yannis Kalantidis, Jiří Matas, Giorgos Tolias
cs.AI

Abstract

We propose a training-free method for open-vocabulary semantic segmentation using Vision-and-Language Models (VLMs). Our approach enhances the initial per-patch predictions of VLMs through label propagation, which jointly optimizes predictions by incorporating patch-to-patch relationships. Since VLMs are primarily optimized for cross-modal alignment and not for intra-modal similarity, we use a Vision Model (VM) that is observed to better capture these relationships. We address resolution limitations inherent to patch-based encoders by applying label propagation at the pixel level as a refinement step, significantly improving segmentation accuracy near class boundaries. Our method, called LPOSS+, performs inference over the entire image, avoiding window-based processing and thereby capturing contextual interactions across the full image. LPOSS+ achieves state-of-the-art performance among training-free methods, across a diverse set of datasets. Code: https://github.com/vladan-stojnic/LPOSS
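
The abstract does not spell out how the propagation is formulated. As an illustration only, here is a minimal sketch of classic graph-based label propagation in closed form, applied to patch-level class scores. It assumes a k-nearest-neighbour graph built from cosine similarity between vision-model patch features; the function name `label_propagation`, the parameters `k` and `alpha`, and the toy data are hypothetical and are not taken from the LPOSS code base.

```python
import numpy as np


def label_propagation(features, init_scores, k=2, alpha=0.9):
    """Generic label propagation over a kNN graph (illustrative sketch only).

    features:    (n, d) array, one feature vector per patch (assumed to come from a VM).
    init_scores: (n, c) array, initial per-patch class scores (assumed to come from a VLM).
    Returns the (n, c) refined scores Z solving (I - alpha * S) Z = init_scores.
    """
    n = features.shape[0]
    # Cosine similarity between L2-normalised patch features.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-matches from the neighbour search

    # Keep only the k most similar neighbours of each patch.
    nn_idx = np.argsort(-sim, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nn_idx.ravel()
    w = np.zeros((n, n))
    w[rows, cols] = np.clip(sim[rows, cols], 0.0, None)
    w = np.maximum(w, w.T)  # make the affinity matrix symmetric

    # Symmetric normalisation: S = D^{-1/2} W D^{-1/2}.
    deg = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    s = w * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Closed-form solution of the propagation; valid for 0 <= alpha < 1.
    return np.linalg.solve(np.eye(n) - alpha * s, init_scores)


# Toy data: two groups of three similar patches; patch 2 starts with a wrong prediction.
features = np.array([[1.0, 0.0], [0.99, 0.1], [0.98, 0.15],
                     [0.0, 1.0], [0.1, 0.99], [0.15, 0.98]])
init = np.array([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6],
                 [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]])
refined = label_propagation(features, init)
print(init.argmax(axis=1), refined.argmax(axis=1))
```

In this toy case the patch whose initial score disagrees with its visually similar neighbours is pulled towards their class. This is the intuition behind jointly optimizing predictions through patch-to-patch relationships. The pixel-level refinement described in the abstract would need a graph over pixels, which this sketch does not attempt.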
