Semantic-SAM: Segment and Recognize Anything at Any Granularity

Abstract

In this paper, we introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity. Our model offers two key advantages: semantic-awareness and granularity-abundance. To achieve semantic-awareness, we consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts, which allows the model to capture rich semantic information. For the multi-granularity capability, we propose a multi-choice learning scheme during training, in which each click generates masks at multiple levels that correspond to multiple ground-truth masks. Notably, this work is the first attempt to jointly train a model on SA-1B, generic segmentation, and part segmentation datasets. Experimental results and visualizations demonstrate that our model achieves both semantic-awareness and granularity-abundance. Furthermore, combining SA-1B training with other segmentation tasks, such as panoptic and part segmentation, improves performance. We will provide code and a demo for further exploration and evaluation.
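To make the multi-choice idea concrete, the following is a minimal sketch of how several masks predicted for one click could be paired with several ground-truth masks at different granularities. The abstract does not state the matching rule, so the IoU-based one-to-one assignment here is an assumption, and the names `iou`, `match_masks`, and `box` are hypothetical helpers introduced only for illustration. The brute-force search stands in for whatever assignment procedure the actual training code uses.

```python
from itertools import permutations


def iou(a, b):
    """Intersection-over-union of two masks given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_masks(pred_masks, gt_masks):
    """Assign each ground-truth mask to a distinct predicted mask, maximizing total IoU.

    Assumption: IoU is the matching score. Brute force over assignments is
    acceptable for the handful of masks produced by a single click.
    Returns a list of (pred_index, gt_index) pairs.
    """
    if len(gt_masks) > len(pred_masks):
        raise ValueError("need at least as many predictions as ground-truth masks")
    best_score, best_pairs = -1.0, []
    for pred_idx in permutations(range(len(pred_masks)), len(gt_masks)):
        score = sum(iou(pred_masks[p], gt_masks[g]) for g, p in enumerate(pred_idx))
        if score > best_score:
            best_score = score
            best_pairs = [(p, g) for g, p in enumerate(pred_idx)]
    return best_pairs


def box(r0, r1, c0, c1):
    """Rectangular toy mask covering rows r0..r1-1 and columns c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


# Toy click with two ground-truth granularities: a whole object and one of its parts.
gt = [box(0, 8, 0, 8), box(2, 4, 2, 4)]
# Three candidate masks emitted for the same click (one is spurious).
pred = [box(2, 4, 2, 5), box(0, 8, 0, 7), box(0, 1, 0, 1)]

pairs = match_masks(pred, gt)
print(pairs)
```

In this toy case the large prediction is paired with the whole-object mask and the small prediction with the part mask, while the spurious third prediction is left unmatched. In a training loop, only the matched pairs would contribute to the mask loss, so a single click is supervised at several granularities at once.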
