
LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation

March 25, 2025
Authors: Vladan Stojnić, Yannis Kalantidis, Jiří Matas, Giorgos Tolias
cs.AI

Abstract

We propose a training-free method for open-vocabulary semantic segmentation using Vision-and-Language Models (VLMs). Our approach enhances the initial per-patch predictions of VLMs through label propagation, which jointly optimizes predictions by incorporating patch-to-patch relationships. Since VLMs are primarily optimized for cross-modal alignment and not for intra-modal similarity, we use a Vision Model (VM) that is observed to better capture these relationships. We address resolution limitations inherent to patch-based encoders by applying label propagation at the pixel level as a refinement step, significantly improving segmentation accuracy near class boundaries. Our method, called LPOSS+, performs inference over the entire image, avoiding window-based processing and thereby capturing contextual interactions across the full image. LPOSS+ achieves state-of-the-art performance among training-free methods across a diverse set of datasets. Code: https://github.com/vladan-stojnic/LPOSS
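
To make the idea of refining per-patch predictions through patch-to-patch relationships more concrete, here is a minimal sketch of generic closed-form graph label propagation in NumPy. It is not the authors' implementation (see the linked repository for that). The cosine-similarity k-nearest-neighbour graph, the parameters `k` and `alpha`, and the toy data are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def label_propagation(features, init_scores, k=2, alpha=0.9):
    """Generic graph label propagation (illustrative, not the LPOSS code).

    features:    (n, d) node features, e.g. per-patch embeddings from a vision model
    init_scores: (n, c) initial class scores, e.g. per-patch VLM predictions
    k, alpha:    hypothetical graph size and propagation strength (0 < alpha < 1)
    """
    n = features.shape[0]
    k = min(k, n - 1)

    # Non-negative cosine similarity between all pairs of nodes, no self-loops.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = np.clip(f @ f.T, 0.0, None)
    np.fill_diagonal(sim, 0.0)

    # Keep only the k strongest neighbours per node, then symmetrise.
    weakest = np.argsort(sim, axis=1)[:, : n - k]
    w = sim.copy()
    np.put_along_axis(w, weakest, 0.0, axis=1)
    w = np.maximum(w, w.T)

    # Symmetric normalisation: S = D^{-1/2} W D^{-1/2}.
    deg = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    s = w * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Solve (I - alpha * S) Y = Y0 so that all nodes are refined jointly.
    return np.linalg.solve(np.eye(n) - alpha * s, init_scores)


# Toy data: two groups of three "patches"; patches 1 and 4 start with a wrong label.
features = np.array(
    [[1.0, 0.0], [1.0, 0.1], [1.0, -0.1], [0.0, 1.0], [0.1, 1.0], [-0.1, 1.0]]
)
init_scores = np.array(
    [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2], [0.1, 0.9], [0.6, 0.4], [0.2, 0.8]]
)
refined = label_propagation(features, init_scores)
print(init_scores.argmax(axis=1), refined.argmax(axis=1))
```

In this toy setup the two initially mislabelled patches are pulled toward the label shared by their visually similar neighbours. Applying the same routine with pixels as graph nodes would correspond loosely to the pixel-level refinement step the abstract mentions, although a dense solve like this one would not scale to full-resolution images.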
