SegEarth-OV3: Exploring SAM 3 for Open-Vocabulary Semantic Segmentation in Remote Sensing Images

December 9, 2025
Authors: Kaiyu Li, Shengqi Zhang, Yupeng Deng, Zhi Wang, Deyu Meng, Xiangyong Cao
cs.AI

Abstract

Most existing methods for training-free Open-Vocabulary Semantic Segmentation (OVSS) are based on CLIP. While these approaches have made progress, they often face challenges in precise localization or require complex pipelines to combine separate modules, especially in remote sensing scenarios where numerous dense and small targets are present. Recently, Segment Anything Model 3 (SAM 3) was proposed, unifying segmentation and recognition in a promptable framework. In this paper, we present a preliminary exploration of applying SAM 3 to the remote sensing OVSS task without any training. First, we implement a mask fusion strategy that combines the outputs from SAM 3's semantic segmentation head and the Transformer decoder (instance head), leveraging the strengths of both heads for better land-cover segmentation. Second, we utilize the presence score from the presence head to filter out categories that do not exist in the scene, reducing false positives caused by the vast vocabulary sizes and patch-level processing in geospatial scenes. Experiments on a broad range of remote sensing datasets show that this simple adaptation achieves promising performance, demonstrating the potential of SAM 3 for remote sensing OVSS. Our code is released at https://github.com/earth-insights/SegEarth-OV-3.
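
To make the two adaptations concrete, the minimal Python sketch below shows one way they could be wired together. The function name, the array shapes, the presence_thresh value, and the rule of letting instance masks overwrite the semantic map are illustrative assumptions made for this sketch, not the authors' released implementation (see the linked repository for that).

import numpy as np

def fuse_and_filter(semantic_logits, instance_masks, instance_labels,
                    presence_scores, presence_thresh=0.5):
    """Hypothetical sketch of the two adaptations described above.

    semantic_logits : (C, H, W) per-class scores from the semantic segmentation head
    instance_masks  : (N, H, W) binary masks from the Transformer decoder (instance head)
    instance_labels : (N,) class index assigned to each instance mask
    presence_scores : (C,) presence-head score for each queried category
    """
    # Presence filtering: suppress categories whose presence score is below the
    # threshold so they cannot win the per-pixel argmax (fewer false positives).
    absent = presence_scores < presence_thresh
    logits = semantic_logits.copy()
    logits[absent] = -np.inf

    # Mask fusion, step 1: start from the semantic head's dense prediction.
    fused = logits.argmax(axis=0)  # (H, W) label map

    # Mask fusion, step 2: let instance masks of present classes overwrite the
    # dense map, sharpening small, dense objects.
    for mask, cls in zip(instance_masks, instance_labels):
        if not absent[cls]:
            fused[mask.astype(bool)] = cls

    return fused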