gen2seg: Generative Models Enable Generalizable Instance Segmentation

May 21, 2025
Authors: Om Khangaonkar, Hamed Pirsiavash
cs.AI

Abstract

By pretraining to synthesize coherent images from perturbed inputs, generative models inherently learn to understand object boundaries and scene compositions. How can we repurpose these generative representations for general-purpose perceptual organization? We finetune Stable Diffusion and MAE (encoder+decoder) for category-agnostic instance segmentation using our instance coloring loss exclusively on a narrow set of object types (indoor furnishings and cars). Surprisingly, our models exhibit strong zero-shot generalization, accurately segmenting objects of types and styles unseen in finetuning (and in many cases, MAE's ImageNet-1K pretraining too). Our best-performing models closely approach the heavily supervised SAM when evaluated on unseen object types and styles, and outperform it when segmenting fine structures and ambiguous boundaries. In contrast, existing promptable segmentation architectures or discriminatively pretrained models fail to generalize. This suggests that generative models learn an inherent grouping mechanism that transfers across categories and domains, even without internet-scale pretraining. Code, pretrained models, and demos are available on our website.
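The abstract names an "instance coloring loss" but does not define it. Below is a minimal PyTorch sketch of one common form such a loss could take: a pull/push (contrastive embedding) formulation in which pixels of the same instance are driven toward a shared predicted color and different instances are driven toward distinct colors. The function name `instance_coloring_loss`, the `margin` parameter, and the pull/push weighting are illustrative assumptions, not the paper's formulation.

```python
import torch

def instance_coloring_loss(pred_colors, instance_masks, margin=1.0):
    """Hypothetical pull/push sketch (an assumption, not the paper's exact
    loss). Pixels of each instance are pulled toward that instance's mean
    predicted color; means of different instances are pushed at least
    `margin` apart, so each object ends up with a distinct, uniform color.

    pred_colors:    (C, H, W) per-pixel colors/embeddings predicted by the model
    instance_masks: iterable of (H, W) boolean masks, one per ground-truth instance
    """
    means, pull = [], pred_colors.new_zeros(())
    for mask in instance_masks:
        if not mask.any():                      # skip empty masks
            continue
        pixels = pred_colors[:, mask]           # (C, N_i) pixels of this instance
        mean = pixels.mean(dim=1)               # (C,) instance color centroid
        pull = pull + ((pixels - mean[:, None]) ** 2).mean()
        means.append(mean)
    pull = pull / max(len(means), 1)

    # Push term: penalize pairs of instance centroids closer than `margin`.
    push = pred_colors.new_zeros(())
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            dist = torch.norm(means[i] - means[j])
            push = push + torch.clamp(margin - dist, min=0) ** 2
    if len(means) > 1:
        push = push / (len(means) * (len(means) - 1) / 2)

    return pull + push
```

Under this kind of objective, instance masks could be recovered at inference by clustering the predicted per-pixel colors, for example with a simple distance threshold around each color mode.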
