The SAM2-to-SAM3 Gap in the Segment Anything Model Family: Why Prompt-Based Expertise Fails in Concept-Driven Image Segmentation

December 4, 2025
Authors: Ranjan Sapkota, Konstantinos I. Roumeliotis, Manoj Karkee
cs.AI

Abstract

This paper investigates the fundamental discontinuity between the two most recent Segment Anything Models, SAM2 and SAM3. We explain why expertise in the prompt-based segmentation of SAM2 does not transfer to the multimodal, concept-driven paradigm of SAM3. SAM2 operates through spatial prompts (points, boxes, and masks), yielding purely geometric and temporal segmentation. In contrast, SAM3 introduces a unified vision-language architecture capable of open-vocabulary reasoning, semantic grounding, contrastive alignment, and exemplar-based concept understanding. We structure this analysis through five core components: (1) a Conceptual Break Between Prompt-Based and Concept-Based Segmentation, contrasting the spatial prompt semantics of SAM2 with the multimodal fusion and text-conditioned mask generation of SAM3; (2) Architectural Divergence, detailing the pure vision-temporal design of SAM2 versus the integration of vision-language encoders, geometry and exemplar encoders, fusion modules, DETR-style decoders, object queries, and Mixture-of-Experts ambiguity handling in SAM3; (3) Dataset and Annotation Differences, contrasting the SA-V video masks of SAM2 with the multimodal concept-annotated corpora of SAM3; (4) Training and Hyperparameter Distinctions, showing why SAM2 optimization knowledge does not apply to SAM3; and (5) Evaluation, Metrics, and Failure Modes, outlining the transition from geometric IoU metrics to semantic, open-vocabulary evaluation. Together, these analyses establish SAM3 as a new class of segmentation foundation model and chart future directions for the emerging concept-driven segmentation era.
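For readers unfamiliar with the geometric IoU metric mentioned in component (5), the following is a minimal sketch of mask Intersection over Union in plain Python. It is not taken from the paper or from any SAM codebase; the function name `mask_iou`, the nested-list mask representation, and the empty-mask convention are assumptions made for illustration.

```python
from typing import Sequence

# Hypothetical representation: a binary mask as rows of 0/1 values.
Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over Union of two same-sized binary masks."""
    same_rows = len(pred) == len(target)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p_on, t_on = bool(p), bool(t)
            intersection += p_on and t_on  # pixel is set in both masks
            union += p_on or t_on          # pixel is set in at least one mask

    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(mask_iou(pred, target))  # 2 shared pixels / 4 pixels in the union = 0.5
```

A score of this kind measures only pixel overlap with a reference mask. It does not check whether the segmented region matches a named concept, which is the limitation the abstract points to when it describes the move toward semantic, open-vocabulary evaluation.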
December 10, 2025