Scene-Centric Unsupervised Panoptic Segmentation

Abstract

Unsupervised panoptic segmentation aims to partition an image into semantically meaningful regions and distinct object instances without training on manually annotated data. In contrast to prior work on unsupervised panoptic scene understanding, we eliminate the need for object-centric training data, enabling the unsupervised understanding of complex scenes. To that end, we present the first unsupervised panoptic method that trains directly on scene-centric imagery. In particular, we propose an approach for obtaining high-resolution panoptic pseudo labels on complex scene-centric data by combining visual representations, depth, and motion cues. Combining pseudo-label training with a panoptic self-training strategy yields a novel approach that accurately predicts the panoptic segmentation of complex scenes without requiring any human annotations. Our approach significantly improves panoptic quality (PQ), e.g., surpassing the recent state of the art in unsupervised panoptic segmentation on Cityscapes by 9.4 percentage points in PQ.
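For readers unfamiliar with the metric quoted above, panoptic quality is commonly defined as the sum of IoUs over matched segment pairs divided by |TP| + 0.5|FP| + 0.5|FN|, where a predicted and a ground-truth segment match if their IoU exceeds 0.5. The following is a minimal single-class sketch of that definition, not the evaluation code used in the paper. The function names `segment_pixels` and `panoptic_quality` are hypothetical, label 0 is assumed to mean "no segment", and void regions as well as the assignment of unsupervised pseudo classes to ground-truth classes are ignored.

```python
from collections import defaultdict


def segment_pixels(label_map):
    """Map each non-zero segment id to the set of (row, col) pixels it covers."""
    segments = defaultdict(set)
    for r, row in enumerate(label_map):
        for c, seg_id in enumerate(row):
            if seg_id != 0:  # assumption: 0 means "no segment"
                segments[seg_id].add((r, c))
    return segments


def panoptic_quality(pred_map, gt_map):
    """Single-class PQ for two segment-id maps given as nested lists."""
    pred = segment_pixels(pred_map)
    gt = segment_pixels(gt_map)

    matched_pred = set()
    iou_sum = 0.0
    for gt_pixels in gt.values():
        for pred_id, pred_pixels in pred.items():
            if pred_id in matched_pred:
                continue
            iou = len(gt_pixels & pred_pixels) / len(gt_pixels | pred_pixels)
            # IoU > 0.5 makes the match unique, so the first hit is the only one.
            if iou > 0.5:
                matched_pred.add(pred_id)
                iou_sum += iou
                break

    tp = len(matched_pred)
    fp = len(pred) - tp  # predicted segments without a match
    fn = len(gt) - tp    # ground-truth segments without a match
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom > 0 else 0.0


if __name__ == "__main__":
    gt = [[1, 1, 1, 2, 2, 2]]
    pred = [[1, 1, 1, 2, 2, 0]]
    print(panoptic_quality(pred, gt))
```

In this toy example both segments are matched (IoUs of 1 and 2/3), so PQ is their mean. A reported gain in PQ points corresponds to an absolute increase in this value when it is expressed as a percentage.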
