Segment Anything in High Quality

Abstract

The recent Segment Anything Model (SAM) represents a big leap in scaling up segmentation models, enabling powerful zero-shot capabilities and flexible prompting. Despite being trained with 1.1 billion masks, SAM's mask prediction quality falls short in many cases, particularly for objects with intricate structures. We propose HQ-SAM, which equips SAM with the ability to accurately segment any object while maintaining SAM's original promptable design, efficiency, and zero-shot generalizability. Our careful design reuses and preserves the pre-trained model weights of SAM while introducing only minimal additional parameters and computation. We design a learnable High-Quality Output Token, which is injected into SAM's mask decoder and is responsible for predicting the high-quality mask. Instead of applying it only to mask-decoder features, we first fuse them with early and final ViT features for improved mask details. To train the introduced learnable parameters, we compose a dataset of 44K fine-grained masks from several sources. HQ-SAM is trained only on this dataset of 44K masks, which takes just 4 hours on 8 GPUs. We show the efficacy of HQ-SAM on a suite of 9 diverse segmentation datasets across different downstream tasks, 7 of which are evaluated in a zero-shot transfer protocol. Our code and models will be released at https://github.com/SysCV/SAM-HQ.
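The two design ideas in the abstract, fusing mask-decoder features with early and final ViT features and letting one extra learnable token predict the mask, can be illustrated with a toy sketch. This is not the released HQ-SAM code. The abstract does not specify the fusion operator or how the token produces a mask, so the element-wise sum and the per-pixel dot product below are assumptions, and every name (`fuse`, `predict_mask`, `hq_token`, the toy sizes) is hypothetical.

```python
import random

def fuse(*feature_maps):
    """Element-wise sum of same-shaped feature maps (assumed fusion operator).

    Each map is a flat list of pixels; each pixel is a list of channel values.
    """
    return [[sum(values) for values in zip(*pixels)] for pixels in zip(*feature_maps)]

def predict_mask(token, features, threshold=0.0):
    """Score each pixel by its dot product with the token (assumed mask head).

    A score above `threshold` marks the pixel as foreground.
    """
    logits = [sum(t * f for t, f in zip(token, pixel)) for pixel in features]
    return [logit > threshold for logit in logits]

rng = random.Random(0)
num_pixels, channels = 16, 4  # toy 4x4 feature map with 4 channels

def random_map():
    """Random stand-in for a feature map produced by a frozen network."""
    return [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(num_pixels)]

# Stand-ins for features from the frozen, pre-trained SAM (never updated here).
early_vit = random_map()
final_vit = random_map()
decoder_feats = random_map()

# The only new parameters in this toy: one token vector of `channels` values.
hq_token = [rng.uniform(-1, 1) for _ in range(channels)]

hq_features = fuse(early_vit, final_vit, decoder_feats)
hq_mask = predict_mask(hq_token, hq_features)
print(sum(hq_mask), "of", num_pixels, "pixels marked foreground")
```

In this sketch the pre-trained features are treated as fixed inputs, so the only trainable quantity is the token vector, which mirrors the abstract's claim that only minimal additional parameters are introduced.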
