EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Abstract

We present EfficientViT-SAM, a new family of accelerated segment anything models. We retain SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. Training proceeds in two stages: we first distill knowledge from the SAM-ViT-H image encoder into EfficientViT, then train the full model end-to-end on the SA-1B dataset. Benefiting from EfficientViT's efficiency and capacity, EfficientViT-SAM delivers a 48.9x measured TensorRT speedup over SAM-ViT-H on an A100 GPU without sacrificing performance. Our code and pre-trained models are released at https://github.com/mit-han-lab/efficientvit.
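
The first training stage, distilling the teacher image encoder into the student, can be illustrated with a small sketch. The abstract does not state which distillation objective is used, so the mean squared error between teacher and student embeddings below is an assumption chosen for illustration. The function name `embedding_distillation_loss` and the nested-list embedding format are likewise hypothetical and are not taken from the released code.

```python
from typing import Sequence


def embedding_distillation_loss(
    student: Sequence[Sequence[float]],
    teacher: Sequence[Sequence[float]],
) -> float:
    """Mean squared error between student and teacher embeddings.

    Assumption: the distillation objective is a plain MSE over all
    embedding entries. Both inputs are lists of equally sized rows.
    """
    if len(student) != len(teacher):
        raise ValueError("student and teacher must have the same number of rows")

    total = 0.0
    count = 0
    for s_row, t_row in zip(student, teacher):
        if len(s_row) != len(t_row):
            raise ValueError("embedding rows must have the same length")
        for s, t in zip(s_row, t_row):
            total += (s - t) ** 2
            count += 1

    if count == 0:
        raise ValueError("embeddings must not be empty")
    return total / count


# Toy embeddings standing in for teacher (SAM-ViT-H) and student
# (EfficientViT) image-encoder outputs; the values are made up.
teacher_emb = [[0.0, 0.0], [1.0, 1.0]]
student_emb = [[1.0, 2.0], [1.0, 1.0]]
loss = embedding_distillation_loss(student_emb, teacher_emb)
```

In a real training loop this scalar would be minimized with respect to the student's parameters while the teacher stays frozen; the second stage would then switch to the segmentation objective on SA-1B.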
