EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Abstract

We present EfficientViT-SAM, a new family of accelerated Segment Anything Models (SAM). We retain SAM's lightweight prompt encoder and mask decoder while replacing its heavy image encoder with EfficientViT. Training proceeds in two stages: we first distill knowledge from the SAM-ViT-H image encoder into EfficientViT, then train end-to-end on the SA-1B dataset. Benefiting from EfficientViT's efficiency and capacity, EfficientViT-SAM delivers a 48.9x measured TensorRT speedup on an A100 GPU over SAM-ViT-H without sacrificing performance. Our code and pre-trained models are released at https://github.com/mit-han-lab/efficientvit.
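The encoder swap and the first training stage can be illustrated with a small, dependency-free sketch. It is a toy under stated assumptions, not the released implementation: the class `SamLikeModel`, the helpers `swap_image_encoder` and `distillation_loss`, and the stand-in encoder functions are all hypothetical names, and the mean-squared-error objective is an assumed choice, since the abstract does not state which distillation loss is used.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence

Embedding = List[float]


@dataclass(frozen=True)
class SamLikeModel:
    """Three-part layout named in the abstract (hypothetical class)."""
    image_encoder: Callable[[Sequence[float]], Embedding]
    prompt_encoder: Callable[[Sequence[float]], Embedding]
    mask_decoder: Callable[[Embedding, Embedding], List[int]]

    def predict(self, image: Sequence[float], prompt: Sequence[float]) -> List[int]:
        return self.mask_decoder(self.image_encoder(image), self.prompt_encoder(prompt))


def swap_image_encoder(model: SamLikeModel, new_encoder) -> SamLikeModel:
    # Keep the prompt encoder and mask decoder; replace only the image encoder.
    return replace(model, image_encoder=new_encoder)


def distillation_loss(student_emb: Embedding, teacher_emb: Embedding) -> float:
    # Mean squared error between embeddings (assumed loss, not from the abstract).
    if len(student_emb) != len(teacher_emb):
        raise ValueError("embeddings must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student_emb, teacher_emb)) / len(teacher_emb)


# Toy stand-ins for the real networks (purely illustrative).
def teacher_encoder(image):   # plays the role of the SAM-ViT-H image encoder
    return [2.0 * x for x in image]


def student_encoder(image):   # plays the role of EfficientViT
    return [2.0 * x + 0.5 for x in image]


def prompt_encoder(prompt):
    return list(prompt)


def mask_decoder(img_emb, prompt_emb):
    threshold = sum(prompt_emb)
    return [1 if v > threshold else 0 for v in img_emb]


teacher_model = SamLikeModel(teacher_encoder, prompt_encoder, mask_decoder)
student_model = swap_image_encoder(teacher_model, student_encoder)

image = [0.0, 1.0, 2.0]
# Stage 1: the student encoder is trained to match the teacher's embeddings.
loss = distillation_loss(student_encoder(image), teacher_encoder(image))
# Stage 2 (not shown): the assembled student model is trained end-to-end.
```

In this sketch the swapped model shares the original prompt encoder and mask decoder objects, so only the image encoder differs between teacher and student, which mirrors the design described above.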

