TinySAM: Pushing the Envelope for Efficient Segment Anything Model

Abstract

The Segment Anything Model (SAM) has recently shown powerful segmentation capability and drawn great attention in computer vision. Many follow-up works have built applications on the pretrained SAM and achieved impressive performance on downstream vision tasks. However, SAM has a heavy architecture and requires massive computational capacity, which hinders its deployment on computation-constrained edge devices. To address this, we propose a framework for obtaining a tiny segment anything model (TinySAM) while maintaining strong zero-shot performance. We first propose a full-stage knowledge distillation method with an online hard prompt sampling strategy to distill a lightweight student model. We also adapt post-training quantization to the promptable segmentation task to further reduce the computational cost. Moreover, we propose a hierarchical segmenting-everything strategy that accelerates everything inference by 2× with almost no performance degradation. With all of these methods, TinySAM reduces computation by orders of magnitude and pushes the envelope for the efficient segment anything task. Extensive experiments on various zero-shot transfer tasks show that TinySAM performs significantly better than counterpart methods. Pre-trained models and code will be available at https://github.com/xinghaochen/TinySAM and https://gitee.com/mindspore/models/tree/master/research/cv/TinySAM.
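The abstract names post-training quantization but does not describe its details. As a rough illustration of the general idea only, the sketch below applies plain symmetric 8-bit quantization with a single per-tensor scale to a small list of weights. This is a generic textbook scheme, not the adapted method used for TinySAM, and the function names `quantize_symmetric` and `dequantize` and the sample weights are hypothetical.

```python
from typing import List, Tuple


def quantize_symmetric(weights: List[float], num_bits: int = 8) -> Tuple[List[int], float]:
    """Map float weights to signed integers using one per-tensor scale.

    Generic illustration only; not the quantization scheme of TinySAM.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    # Avoid division by zero when every weight is zero.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only to make the example concrete.
weights = [0.5, -1.27, 0.0, 0.3]
q, scale = quantize_symmetric(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Because the weights are stored as small integers plus one scale, memory and arithmetic cost drop, while the rounding error per weight stays within half a quantization step. No retraining is involved, which is what makes the approach "post-training".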
