ChatPaper.ai


INSID3: Training-Free In-Context Segmentation with DINOv3

March 30, 2026
作者: Claudia Cuttano, Gabriele Trivigno, Christoph Reich, Daniel Cremers, Carlo Masone, Stefan Roth
cs.AI

Abstract

In-context segmentation (ICS) aims to segment arbitrary concepts, e.g., objects, parts, or personalized instances, given one annotated visual example. Existing work either (i) fine-tunes vision foundation models (VFMs), which improves in-domain results but harms generalization, or (ii) combines multiple frozen VFMs, which preserves generalization but yields architectural complexity and fixed segmentation granularities. We revisit ICS from a minimalist perspective and ask: can a single self-supervised backbone support both semantic matching and segmentation, without any supervision or auxiliary models? We show that scaled-up dense self-supervised features from DINOv3 exhibit strong spatial structure and semantic correspondence. Building on this, we introduce INSID3, a training-free approach that, given an in-context example, segments concepts at varying granularities from frozen DINOv3 features alone. INSID3 achieves state-of-the-art results across one-shot semantic, part, and personalized segmentation, outperforming previous work by +7.5% mIoU while using 3× fewer parameters and no mask or category-level supervision. Code is available at https://github.com/visinf/INSID3.
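The core idea the abstract describes, matching dense features of a query image against an annotated reference, can be sketched in a few lines. This is a hedged illustration, not the actual INSID3 pipeline: the feature matrices stand in for frozen DINOv3 patch embeddings (here replaced by synthetic data), and the mean-prototype matching and the `threshold` value are simplifying assumptions of this sketch.

```python
import numpy as np

def cosine_match(ref_feats, ref_mask, query_feats, threshold=0.0):
    """Minimal in-context matching sketch on frozen dense features.

    ref_feats:   (N, D) patch features of the annotated reference image
    ref_mask:    (N,) boolean mask marking reference patches of the concept
    query_feats: (M, D) patch features of the query image
    Returns a boolean (M,) mask of query patches assigned to the concept.
    """
    # Prototype: mean feature over the annotated reference patches
    proto = ref_feats[ref_mask].mean(axis=0)
    proto = proto / np.linalg.norm(proto)
    # Cosine similarity of every query patch to the prototype
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    sim = q @ proto
    return sim > threshold

# Toy example with synthetic stand-ins for dense features
rng = np.random.default_rng(0)
fg = rng.normal(loc=2.0, size=(10, 8))   # "foreground"-like features
bg = rng.normal(loc=-2.0, size=(10, 8))  # "background"-like features
ref_feats = np.vstack([fg[:5], bg[:5]])
ref_mask = np.array([True] * 5 + [False] * 5)
query_feats = np.vstack([fg[5:], bg[5:]])

pred = cosine_match(ref_feats, ref_mask, query_feats)
print(pred)  # first five patches match the concept, last five do not
```

In a real setting, `ref_feats` and `query_feats` would come from a frozen backbone (e.g. via the DINOv3 repository linked above), and the paper's method handles varying granularities rather than a single global prototype.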
April 1, 2026