Semantic-SAM: Segment and Recognize Anything at Any Granularity

Abstract

In this paper, we introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity. Our model offers two key advantages: semantic-awareness and granularity-abundance. To achieve semantic-awareness, we consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts, which allows the model to capture rich semantic information. For the multi-granularity capability, we propose a multi-choice learning scheme during training, in which each click generates masks at multiple levels that correspond to multiple ground-truth masks. Notably, this work is the first attempt to jointly train a model on SA-1B, generic, and part segmentation datasets. Experimental results and visualizations demonstrate that our model achieves both semantic-awareness and granularity-abundance. Furthermore, combining SA-1B training with other segmentation tasks, such as panoptic and part segmentation, leads to performance improvements. We will provide code and a demo for further exploration and evaluation.
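
The multi-choice idea, where one click yields several masks that are paired with several ground-truth masks, can be illustrated with a small sketch. The abstract does not describe the matching procedure or the loss, so everything below is an assumption made for illustration: masks are plain sets of pixels, the pairing maximizes total IoU by brute force, and the names `iou`, `match_masks`, and `rect` are hypothetical helpers rather than part of Semantic-SAM.

```python
from itertools import permutations


def iou(a, b):
    """Intersection-over-union of two masks given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_masks(predicted, ground_truth):
    """Pair each ground-truth mask with a distinct predicted mask.

    Assumed criterion (not taken from the paper): maximize the summed IoU.
    Brute force over permutations, which is fine for a handful of masks per click.
    Requires len(predicted) >= len(ground_truth).
    """
    best_score, best_perm = -1.0, ()
    for perm in permutations(range(len(predicted)), len(ground_truth)):
        score = sum(iou(predicted[p], ground_truth[g]) for g, p in enumerate(perm))
        if score > best_score:
            best_score, best_perm = score, perm
    # Return (ground_truth_index, predicted_index) pairs and the total IoU.
    return list(enumerate(best_perm)), best_score


def rect(r0, c0, r1, c1):
    """Rectangular mask covering rows r0..r1-1 and columns c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


# Toy case: one click produces three candidate masks at different granularities;
# two ground-truth masks (a part and the whole object) contain the click.
predicted = [rect(0, 0, 8, 8), rect(2, 2, 4, 4), rect(0, 0, 2, 2)]
ground_truth = [rect(2, 2, 4, 5), rect(0, 0, 8, 8)]

pairs, total = match_masks(predicted, ground_truth)
print(pairs, round(total, 3))
```

In this toy case the part-level ground truth is paired with the small predicted mask and the object-level ground truth with the large one; the third prediction stays unmatched. A real training setup would compute a loss on the matched pairs, which this sketch leaves out.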
