Fast Segment Anything

Abstract

The recently proposed Segment Anything Model (SAM) has had a significant impact on many computer vision tasks. It is becoming a foundational step for many high-level tasks, such as image segmentation, image captioning, and image editing. However, its heavy computational cost prevents wider adoption in industrial scenarios. This cost comes mainly from running the Transformer architecture on high-resolution inputs. In this paper, we propose a faster alternative for this fundamental task with comparable performance. By reformulating the task as segment generation and prompting, we find that a regular CNN detector with an instance segmentation branch can also accomplish this task well. Specifically, we convert the task to the well-studied instance segmentation task and directly train an existing instance segmentation method using only 1/50 of the SA-1B dataset published by the SAM authors. With our method, we achieve performance comparable to SAM at 50 times higher run-time speed. We present experimental results that demonstrate its effectiveness. The code and demos will be released at https://github.com/CASIA-IVA-Lab/FastSAM.
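
The two-stage reformulation described above (generate all segments first, then apply a prompt) can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `select_mask_by_point`, the toy masks, and the rule of preferring the smallest mask that covers the prompted point are all assumptions made for illustration; in practice the masks would come from a CNN instance segmentation model rather than being hand-built.

```python
import numpy as np

def select_mask_by_point(masks, point):
    """Return the index of the smallest mask covering (row, col), or None.

    Hypothetical selection rule for illustration only: among all candidate
    masks that contain the prompted pixel, prefer the one with the least area.
    """
    row, col = point
    hits = [i for i, m in enumerate(masks) if m[row, col]]
    if not hits:
        return None
    return min(hits, key=lambda i: int(masks[i].sum()))

# Stage 1 stand-in: toy boolean masks in place of real detector output.
large = np.zeros((6, 6), dtype=bool)
large[0:5, 0:5] = True   # covers 25 pixels
small = np.zeros((6, 6), dtype=bool)
small[1:3, 1:3] = True   # covers 4 pixels, nested inside `large`
masks = [large, small]

# Stage 2: a point prompt selects one of the precomputed masks.
chosen = select_mask_by_point(masks, (1, 1))   # both masks cover this pixel
missed = select_mask_by_point(masks, (5, 5))   # no mask covers this pixel
```

The point of the split is that the expensive step (producing every mask) runs once per image, while each prompt only needs a cheap lookup over the precomputed masks.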
