

UnSAMv2: Self-Supervised Learning Enables Segment Anything at Any Granularity

November 17, 2025
Authors: Junwei Yu, Trevor Darrell, XuDong Wang
cs.AI

Abstract

The Segment Anything Model (SAM) family has become a widely adopted vision foundation model, but its ability to control segmentation granularity remains limited. Users often need to refine results manually, by adding more prompts or selecting from pre-generated masks, to achieve the desired level of detail. This process can be ambiguous, as the same prompt may correspond to several plausible masks, and collecting dense annotations across all granularities is prohibitively expensive, making supervised solutions infeasible. To address this limitation, we introduce UnSAMv2, which enables segment anything at any granularity without human annotations. UnSAMv2 extends the divide-and-conquer strategy of UnSAM by discovering abundant mask-granularity pairs and introducing a novel granularity control embedding that enables precise, continuous control over segmentation scale. Remarkably, with only 6K unlabeled images and 0.02% additional parameters, UnSAMv2 substantially enhances SAM-2, achieving segment anything at any granularity across interactive, whole-image, and video segmentation tasks. Evaluated on over 11 benchmarks, UnSAMv2 improves NoC₉₀ (5.69 → 4.75), 1-IoU (58.0 → 73.1), and AR₁₀₀₀ (49.6 → 68.3), showing that small amounts of unlabeled data with a granularity-aware self-supervised learning method can unlock the potential of vision foundation models.
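
The central architectural idea in the abstract, a lightweight granularity control embedding injected into a SAM-2-style decoder, can be pictured with a minimal PyTorch sketch. Everything below (module names, shapes, and the token-concatenation fusion) is an assumption for illustration only, not the paper's actual implementation.

```python
# Hypothetical sketch: conditioning a SAM-2-style mask decoder on a scalar
# granularity value g in [0, 1] via a small learned embedding. Module names,
# tensor shapes, and the fusion strategy are illustrative assumptions.
import torch
import torch.nn as nn


class GranularityEmbedding(nn.Module):
    """Maps a continuous granularity scalar to a single prompt-sized token."""

    def __init__(self, embed_dim: int = 256, hidden_dim: int = 128):
        super().__init__()
        # Tiny MLP: its parameter count stays negligible next to the backbone.
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, embed_dim),
        )

    def forward(self, granularity: torch.Tensor) -> torch.Tensor:
        # granularity: (B,) in [0, 1]; low ~ coarse whole object, high ~ fine parts.
        return self.mlp(granularity.unsqueeze(-1))  # (B, embed_dim)


def condition_prompt_tokens(prompt_tokens: torch.Tensor,
                            granularity: torch.Tensor,
                            gran_embed: GranularityEmbedding) -> torch.Tensor:
    """Appends the granularity token to the sparse prompt tokens fed to the decoder."""
    g_tok = gran_embed(granularity).unsqueeze(1)      # (B, 1, embed_dim)
    return torch.cat([prompt_tokens, g_tok], dim=1)   # (B, N + 1, embed_dim)


if __name__ == "__main__":
    B, N, D = 2, 3, 256
    tokens = torch.randn(B, N, D)        # e.g. point/box prompt tokens
    g = torch.tensor([0.2, 0.8])         # one coarse and one fine request
    out = condition_prompt_tokens(tokens, g, GranularityEmbedding(D))
    print(out.shape)                     # torch.Size([2, 4, 256])
```

Routing the granularity signal through a single extra token keeps the added parameter count tiny, which is consistent with the abstract's claim of roughly 0.02% additional parameters on top of SAM-2.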