

Personalize Segment Anything Model with One Shot

May 4, 2023
Authors: Renrui Zhang, Zhengkai Jiang, Ziyu Guo, Shilin Yan, Junting Pan, Hao Dong, Peng Gao, Hongsheng Li
cs.AI

Abstract

Driven by large-data pre-training, the Segment Anything Model (SAM) has been demonstrated to be a powerful and promptable framework, revolutionizing segmentation models. Despite its generality, customizing SAM for specific visual concepts without manual prompting remains underexplored, e.g., automatically segmenting your pet dog in different images. In this paper, we propose a training-free personalization approach for SAM, termed PerSAM. Given only a single image with a reference mask, PerSAM first localizes the target concept via a location prior, then segments it within other images or videos using three techniques: target-guided attention, target-semantic prompting, and cascaded post-refinement. In this way, we effectively adapt SAM for private use without any training. To further alleviate mask ambiguity, we present an efficient one-shot fine-tuning variant, PerSAM-F. With the entire SAM frozen, we introduce two learnable weights for its multi-scale masks, training only 2 parameters within 10 seconds for improved performance. To demonstrate the efficacy of our approach, we construct a new segmentation dataset, PerSeg, for personalized evaluation, and test our methods on video object segmentation, where they achieve competitive performance. In addition, our approach can enhance DreamBooth to personalize Stable Diffusion for text-to-image generation, discarding background disturbance for better learning of the target's appearance. Code is released at https://github.com/ZrrSkywalker/Personalize-SAM.
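The training-free localization step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that the target embedding is the mean of the reference image's features inside the user-given mask, and that the location prior is the peak of this embedding's cosine-similarity map over a test image's features, which then serves as a positive point prompt for SAM. Feature extraction and SAM itself are out of scope here; the function name and array shapes are illustrative.

```python
# Hypothetical sketch of a similarity-based location prior (not the paper's exact code).
import numpy as np

def location_prior(ref_feats, ref_mask, test_feats):
    """ref_feats:  (H, W, C) features of the reference image.
    ref_mask:   (H, W) boolean mask of the target concept.
    test_feats: (H, W, C) features of a new test image.
    Returns the (row, col) of the most similar test location."""
    # Target embedding: average the reference features inside the mask.
    target = ref_feats[ref_mask].mean(axis=0)                # (C,)
    target = target / (np.linalg.norm(target) + 1e-8)
    # Cosine-similarity map over the test image's features.
    norms = np.linalg.norm(test_feats, axis=-1, keepdims=True) + 1e-8
    sim = (test_feats / norms) @ target                      # (H, W)
    # Peak location acts as a positive point prompt for SAM.
    return tuple(int(i) for i in np.unravel_index(np.argmax(sim), sim.shape))

# Toy check: querying the reference image itself re-localizes the masked patch.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 8, 16))
mask = np.zeros((8, 8), dtype=bool)
mask[2, 3] = True
print(location_prior(feats, mask, feats))  # → (2, 3)
```

In practice the peak coordinate (and, per the paper, further cues such as target-guided attention and target-semantic prompting) would be fed to SAM's prompt encoder rather than used directly.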
December 15, 2024