Personalize Segment Anything Model with One Shot
May 4, 2023
作者: Renrui Zhang, Zhengkai Jiang, Ziyu Guo, Shilin Yan, Junting Pan, Hao Dong, Peng Gao, Hongsheng Li
cs.AI
Abstract
Driven by large-data pre-training, Segment Anything Model (SAM) has been
demonstrated as a powerful and promptable framework, revolutionizing
segmentation models. Despite its generality, customizing SAM for specific
visual concepts without manual prompting remains underexplored, e.g.,
automatically segmenting your pet dog in different images. In this paper, we
propose a training-free personalization approach for SAM, termed PerSAM.
Given only a single image with a reference mask, PerSAM first localizes the
target concept with a location prior, then segments it in other images or
videos via three techniques: target-guided attention, target-semantic
prompting, and cascaded post-refinement. In this way, we effectively adapt SAM
for private use without any training. To further alleviate mask ambiguity,
we present an efficient one-shot fine-tuning variant, PerSAM-F. Freezing the
entire SAM, we introduce two learnable weights for multi-scale masks, training
only 2 parameters within 10 seconds for improved performance. To demonstrate
our efficacy, we construct a new segmentation dataset, PerSeg, for
personalized evaluation, and also test our method on video object segmentation
with competitive performance. Besides, our approach can enhance DreamBooth to
personalize Stable Diffusion for text-to-image generation, discarding
background disturbance for better learning of the target's appearance. Code is
released at https://github.com/ZrrSkywalker/Personalize-SAM
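
To give a feel for the PerSAM-F idea, the minimal sketch below fits two scalar weights that blend three multi-scale mask predictions against a one-shot reference mask. All arrays here are toy stand-ins, not SAM's actual decoder outputs, and tying the third weight to 1 − w1 − w2 so that only two parameters are free is an assumption about the exact weighting scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for SAM's three multi-scale mask logits and the
# one-shot reference mask (shapes and values are illustrative only).
H = W = 8
masks = rng.normal(size=(3, H, W))       # three candidate scales
target = (masks[1] > 0).astype(float)    # pretend scale 2 fits the reference

def combine(masks, w1, w2):
    # Weighted sum of the three scales; the third weight is tied so the
    # weights sum to one -- only 2 free parameters are trained.
    return w1 * masks[0] + w2 * masks[1] + (1.0 - w1 - w2) * masks[2]

# Plain gradient descent on an MSE loss against the reference mask.
w1, w2, lr = 1 / 3, 1 / 3, 0.05
init_loss = np.mean((combine(masks, w1, w2) - target) ** 2)
for _ in range(200):
    err = combine(masks, w1, w2) - target
    g1 = 2.0 * np.mean(err * (masks[0] - masks[2]))  # d(loss)/d(w1)
    g2 = 2.0 * np.mean(err * (masks[1] - masks[2]))  # d(loss)/d(w2)
    w1 -= lr * g1
    w2 -= lr * g2

final_loss = np.mean((combine(masks, w1, w2) - target) ** 2)
print(final_loss < init_loss)  # True: fitting the 2 weights reduces the loss
```

Because the loss is a convex quadratic in the two weights, even this bare-bones descent converges in a fraction of a second, which is consistent with the abstract's claim of fine-tuning 2 parameters within roughly 10 seconds on real data.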