gen2seg: Generative Models Enable Generalizable Instance Segmentation
May 21, 2025
Authors: Om Khangaonkar, Hamed Pirsiavash
cs.AI
Abstract
By pretraining to synthesize coherent images from perturbed inputs,
generative models inherently learn to understand object boundaries and scene
compositions. How can we repurpose these generative representations for
general-purpose perceptual organization? We finetune Stable Diffusion and MAE
(encoder+decoder) for category-agnostic instance segmentation using our
instance coloring loss exclusively on a narrow set of object types (indoor
furnishings and cars). Surprisingly, our models exhibit strong zero-shot
generalization, accurately segmenting objects of types and styles unseen in
finetuning (and in many cases, MAE's ImageNet-1K pretraining too). Our
best-performing models closely approach the heavily supervised SAM when
evaluated on unseen object types and styles, and outperform it when segmenting
fine structures and ambiguous boundaries. In contrast, existing promptable
segmentation architectures or discriminatively pretrained models fail to
generalize. This suggests that generative models learn an inherent grouping
mechanism that transfers across categories and domains, even without
internet-scale pretraining. Code, pretrained models, and demos are available on
our website.
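The abstract does not spell out the instance coloring loss, so below is a minimal PyTorch sketch of one plausible formulation, not the paper's exact objective: pixels of each instance are pulled toward a shared per-instance mean color, while the mean colors of different instances are pushed apart by a hinge. The `margin` value and the variance/contrast split are illustrative assumptions.

```python
import torch

def instance_coloring_loss(pred, masks, margin=0.5):
    """Hypothetical instance-coloring objective (illustrative, not the paper's loss).

    pred:  (3, H, W) predicted per-pixel colors.
    masks: list of (H, W) boolean instance masks.
    """
    means = []
    intra = pred.new_tensor(0.0)
    for m in masks:
        colors = pred[:, m]                        # (3, N_i): pixels of one instance
        mu = colors.mean(dim=1)                    # that instance's mean color
        intra = intra + ((colors - mu[:, None]) ** 2).mean()  # pull pixels to the mean
        means.append(mu)
    means = torch.stack(means)                     # (K, 3) instance mean colors
    inter = pred.new_tensor(0.0)
    if len(masks) > 1:
        dists = torch.cdist(means, means)          # (K, K) pairwise color distances
        off_diag = ~torch.eye(len(masks), dtype=torch.bool, device=pred.device)
        # Penalize pairs of instances whose colors are closer than `margin`.
        inter = torch.clamp(margin - dists[off_diag], min=0).mean()
    return intra / len(masks) + inter

# Toy usage: two instances splitting a 64x64 image into top and bottom halves.
pred = torch.rand(3, 64, 64, requires_grad=True)
masks = [torch.zeros(64, 64, dtype=torch.bool) for _ in range(2)]
masks[0][:32], masks[1][32:] = True, True
loss = instance_coloring_loss(pred, masks)
loss.backward()
```

Because the target colors here are the model's own per-instance means rather than fixed labels, an objective of this shape stays category-agnostic: it supervises grouping, not class identity, which is consistent with the abstract's claim of class-agnostic instance segmentation.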