Going Denser with Open-Vocabulary Part Segmentation

Abstract

Object detection has expanded from a limited set of categories to open vocabulary. Moving forward, a complete intelligent vision system must also understand finer-grained object descriptions, namely object parts. In this paper, we propose a detector that can predict both open-vocabulary objects and their part segmentation. This ability comes from two designs. First, we train the detector jointly on part-level, object-level, and image-level data to build multi-granularity alignment between language and image. Second, we parse a novel object into its parts through its dense semantic correspondence with a base object. These two designs enable the detector to benefit substantially from various data sources and foundation models. In open-vocabulary part segmentation experiments, our method outperforms the baseline by 3.3 to 7.3 mAP in cross-dataset generalization on PartImageNet, and improves the baseline by 7.3 novel AP50 in cross-category generalization on Pascal Part. Finally, we train a detector that generalizes to a wide range of part segmentation datasets while achieving better performance than dataset-specific training.
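
The second design, parsing a novel object through dense correspondence with a base object, can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the 2-D feature vectors, and the choice of cosine-similarity nearest-neighbor matching are all assumptions made for illustration. The idea shown is only that each location on a novel object borrows the part label of its most similar location on an annotated base object.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def transfer_part_labels(novel_feats, base_feats, base_labels):
    """Hypothetical helper: give each novel-object feature the part label
    of its nearest base-object feature under cosine similarity."""
    labels = []
    for feat in novel_feats:
        best = max(range(len(base_feats)),
                   key=lambda i: cosine(feat, base_feats[i]))
        labels.append(base_labels[best])
    return labels


# Toy data (assumed): two annotated locations on a base object,
# three unlabeled locations on a novel object.
base_feats = [[1.0, 0.0], [0.0, 1.0]]
base_labels = ["head", "leg"]
novel_feats = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]

print(transfer_part_labels(novel_feats, base_feats, base_labels))
# prints ['head', 'leg', 'head']
```

In practice the features would come from a pretrained vision model rather than hand-written vectors, and matching would run densely over feature maps; the sketch only fixes the label-transfer step.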
