Unsupervised Universal Image Segmentation
December 28, 2023
Authors: Dantong Niu, Xudong Wang, Xinyang Han, Long Lian, Roei Herzig, Trevor Darrell
cs.AI
Abstract
Several unsupervised image segmentation approaches have been proposed that
eliminate the need for dense manually-annotated segmentation masks; current
models separately handle either semantic segmentation (e.g., STEGO) or
class-agnostic instance segmentation (e.g., CutLER), but not both (i.e.,
panoptic segmentation). We propose an Unsupervised Universal Segmentation model
(U2Seg) adept at performing various image segmentation tasks -- instance,
semantic and panoptic -- using a novel unified framework. U2Seg generates
pseudo semantic labels for these segmentation tasks by leveraging
self-supervised models followed by clustering; each cluster represents
a different semantic and/or instance membership of pixels. We then self-train the
model on these pseudo semantic labels, yielding substantial performance gains
over specialized methods tailored to each task: a +2.6 AP^{box} boost
vs. CutLER in unsupervised instance segmentation on COCO and a +7.0 PixelAcc
increase (vs. STEGO) in unsupervised semantic segmentation on COCOStuff.
Moreover, our method establishes a new baseline for unsupervised panoptic
segmentation, which has not been previously explored. U2Seg is also a strong
pretrained model for few-shot segmentation, surpassing CutLER by +5.0
AP^{mask} when trained in a low-data regime, e.g., with only 1% of COCO
labels. We hope our simple yet effective method can inspire more research on
unsupervised universal image segmentation.
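
The pseudo-labeling idea in the abstract (self-supervised features followed by clustering) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each class-agnostic mask has already been summarized by one feature vector from some self-supervised backbone, and it swaps in a plain k-means written in NumPy with a deterministic farthest-point initialization. The names `kmeans`, `pseudo_label_masks`, and `mask_features`, the cluster count, and the synthetic data are all hypothetical.

```python
import numpy as np


def kmeans(features, k, n_iters=20):
    """Plain Lloyd's k-means with farthest-point initialisation.

    Returns (cluster_ids, centroids). Illustrative only; the clustering
    setup used by U2Seg is not specified in the abstract.
    """
    # Farthest-point init: start from the first row, then repeatedly add
    # the row that is farthest from all centroids chosen so far.
    chosen = [features[0]]
    for _ in range(1, k):
        d = np.min([((features - c) ** 2).sum(axis=1) for c in chosen], axis=0)
        chosen.append(features[d.argmax()])
    centroids = np.stack(chosen).astype(float)

    ids = np.zeros(len(features), dtype=int)
    for _ in range(n_iters):
        # Squared Euclidean distance from every feature to every centroid.
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        ids = dists.argmin(axis=1)
        for c in range(k):
            members = features[ids == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return ids, centroids


def pseudo_label_masks(mask_features, k):
    """Assign a pseudo semantic label (a cluster id) to each mask feature."""
    # L2-normalise so that clustering depends on direction, not magnitude.
    norms = np.linalg.norm(mask_features, axis=1, keepdims=True)
    normed = mask_features / np.clip(norms, 1e-8, None)
    ids, _ = kmeans(normed, k)
    return ids


# Synthetic stand-in for per-mask features (assumption: 10 masks, 4 dims,
# two well-separated groups that play the role of two semantic classes).
rng = np.random.default_rng(0)
group_a = np.tile([1.0, 0.0, 0.0, 0.0], (5, 1))
group_b = np.tile([0.0, 1.0, 0.0, 0.0], (5, 1))
mask_features = np.vstack([group_a, group_b]) + 0.05 * rng.standard_normal((10, 4))

labels = pseudo_label_masks(mask_features, k=2)
print(labels)  # one cluster id per mask
```

In this sketch the cluster ids play the role of pseudo semantic labels: attaching them to the corresponding class-agnostic masks would give class-aware instance annotations that a segmentation model could then be self-trained on.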