Going Denser with Open-Vocabulary Part Segmentation

Abstract

Object detection has expanded from a limited number of categories to open vocabulary. Moving forward, a complete intelligent vision system must understand finer-grained object descriptions, such as object parts. In this paper, we propose a detector that can predict both open-vocabulary objects and their part segmentation. This ability comes from two designs. First, we train the detector jointly on part-level, object-level, and image-level data to build a multi-granularity alignment between language and image. Second, we parse a novel object into its parts through its dense semantic correspondence with a base object. These two designs enable the detector to benefit substantially from various data sources and foundation models. In open-vocabulary part segmentation experiments, our method outperforms the baseline by 3.3 to 7.3 mAP in cross-dataset generalization on PartImageNet, and improves the baseline by 7.3 novel AP50 in cross-category generalization on Pascal Part. Finally, we train a detector that generalizes to a wide range of part segmentation datasets while achieving better performance than dataset-specific training.
