Matting Anything
June 8, 2023
Authors: Jiachen Li, Jitesh Jain, Humphrey Shi
cs.AI
Abstract
In this paper, we propose the Matting Anything Model (MAM), an efficient and
versatile framework for estimating the alpha matte of any instance in an image
with flexible and interactive visual or linguistic user prompt guidance. MAM
offers several significant advantages over previous specialized image matting
networks: (i) MAM is capable of dealing with various types of image matting,
including semantic, instance, and referring image matting with only a single
model; (ii) MAM leverages the feature maps from the Segment Anything Model
(SAM) and adopts a lightweight Mask-to-Matte (M2M) module to predict the alpha
matte through iterative refinement with only 2.7 million trainable
parameters; (iii) by incorporating SAM, MAM simplifies the user intervention
required for the interactive use of image matting from the trimap to the box,
point, or text prompt. We evaluate the performance of MAM on various image
matting benchmarks, and the experimental results demonstrate that MAM achieves
comparable performance to the state-of-the-art specialized image matting models
under different metrics on each benchmark. Overall, MAM shows superior
generalization ability and can effectively handle various image matting tasks
with fewer parameters, making it a practical solution for unified image
matting. Our code and models are open-sourced at
https://github.com/SHI-Labs/Matting-Anything.
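
The abstract describes the architecture only at a high level: a frozen SAM backbone whose feature maps feed a small Mask-to-Matte (M2M) head that iteratively refines SAM's coarse mask into an alpha matte. As a rough, hypothetical illustration of that design (not the authors' implementation; all layer names, channel sizes, and the refinement loop below are assumptions), such a head might be wired up as follows:

```python
# Minimal sketch of the pipeline described in the abstract, NOT the official MAM code.
# Assumptions: channel sizes, layer counts, and the residual refinement loop are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class M2M(nn.Module):
    """Lightweight mask-to-matte head: refines a coarse SAM mask into an alpha matte."""

    def __init__(self, feat_ch: int = 256, hidden: int = 32):
        super().__init__()
        # Input: SAM image features concatenated with the RGB image and the current alpha.
        self.net = nn.Sequential(
            nn.Conv2d(feat_ch + 3 + 1, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, 3, padding=1),
        )

    def forward(self, feats, image, mask_logits, num_iters: int = 3):
        # feats:       (B, feat_ch, h, w) image features from the SAM encoder
        # image:       (B, 3, H, W) input image
        # mask_logits: (B, 1, H, W) coarse mask logits from the SAM decoder (assumed
        #              already upsampled to image resolution)
        alpha = torch.sigmoid(mask_logits)
        feats = F.interpolate(feats, size=image.shape[-2:],
                              mode="bilinear", align_corners=False)
        # Iterative refinement: each pass predicts a residual correction to the matte.
        for _ in range(num_iters):
            x = torch.cat([feats, image, alpha], dim=1)
            alpha = (alpha + self.net(x)).clamp(0.0, 1.0)
        return alpha
```

Consistent with the small trainable-parameter count quoted above, only this head would be trained in such a setup, with the SAM backbone kept frozen and prompted by boxes, points, or text to select the target instance.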