

EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

February 7, 2024
Authors: Zhuoyang Zhang, Han Cai, Song Han
cs.AI

Abstract

We present EfficientViT-SAM, a new family of accelerated segment anything models. We retain SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. For the training, we begin with the knowledge distillation from the SAM-ViT-H image encoder to EfficientViT. Subsequently, we conduct end-to-end training on the SA-1B dataset. Benefiting from EfficientViT's efficiency and capacity, EfficientViT-SAM delivers 48.9x measured TensorRT speedup on A100 GPU over SAM-ViT-H without sacrificing performance. Our code and pre-trained models are released at https://github.com/mit-han-lab/efficientvit.
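The abstract's first training stage — distilling the SAM-ViT-H image encoder into EfficientViT — can be sketched as matching the student's image embeddings to the teacher's. This is a minimal illustrative sketch in plain Python; the function name and the choice of a mean-squared-error objective are assumptions for exposition, not the paper's exact recipe (see the released code at https://github.com/mit-han-lab/efficientvit for the actual implementation).

```python
def mse_distill_loss(teacher_feats, student_feats):
    """Feature-level distillation loss: mean squared error between the
    teacher encoder's (SAM-ViT-H) and the student encoder's (EfficientViT)
    flattened image embeddings. Both inputs are flat lists of floats of
    equal length; real implementations operate on GPU tensors."""
    assert len(teacher_feats) == len(student_feats)
    n = len(teacher_feats)
    return sum((t - s) ** 2 for t, s in zip(teacher_feats, student_feats)) / n


# Toy usage: the student is penalized for deviating from the teacher.
loss = mse_distill_loss([1.0, 2.0], [1.0, 0.0])  # (0 + 4) / 2 = 2.0
```

After this distillation stage, the abstract notes the full model (frozen lightweight prompt encoder and mask decoder included) is trained end-to-end on SA-1B with the usual segmentation objectives.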