Matting Anything
June 8, 2023
Authors: Jiachen Li, Jitesh Jain, Humphrey Shi
cs.AI
Abstract
In this paper, we propose the Matting Anything Model (MAM), an efficient and
versatile framework for estimating the alpha matte of any instance in an image
with flexible and interactive visual or linguistic user prompt guidance. MAM
offers several significant advantages over previous specialized image matting
networks: (i) MAM is capable of dealing with various types of image matting,
including semantic, instance, and referring image matting with only a single
model; (ii) MAM leverages the feature maps from the Segment Anything Model
(SAM) and adopts a lightweight Mask-to-Matte (M2M) module to predict the alpha
matte through iterative refinement, which has only 2.7 million trainable
parameters; and (iii) by incorporating SAM, MAM simplifies the user intervention
required for the interactive use of image matting from the trimap to the box,
point, or text prompt. We evaluate the performance of MAM on various image
matting benchmarks, and the experimental results demonstrate that MAM achieves
comparable performance to the state-of-the-art specialized image matting models
under different metrics on each benchmark. Overall, MAM shows superior
generalization ability and can effectively handle various image matting tasks
with fewer parameters, making it a practical solution for unified image
matting. Our code and models are open-sourced at
https://github.com/SHI-Labs/Matting-Anything.
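The abstract describes a lightweight Mask-to-Matte (M2M) module that iteratively refines a SAM segmentation mask into a soft alpha matte. As a rough intuition only (the real M2M is a learned network with 2.7M trainable parameters operating on SAM feature maps; the box-blur below is a hypothetical stand-in, not the paper's method), iterative mask-to-matte refinement can be sketched as:

```python
def iterative_m2m(mask, n_iters=3):
    """Toy stand-in for M2M: soften a hard 0/1 mask into an alpha matte
    by repeatedly averaging each pixel with its 3x3 neighborhood.
    `mask` is a list of lists of 0.0/1.0 values."""
    h, w = len(mask), len(mask[0])
    alpha = [[float(v) for v in row] for row in mask]
    for _ in range(n_iters):
        refined = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                total, count = 0.0, 0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < h and 0 <= nj < w:
                            total += alpha[ni][nj]
                            count += 1
                refined[i][j] = total / count  # boundary pixels become fractional
        alpha = refined
    return alpha

# A hard square mask gains soft (fractional) edges after refinement.
mask = [[1.0 if 2 <= i < 6 and 2 <= j < 6 else 0.0 for j in range(8)] for i in range(8)]
alpha = iterative_m2m(mask, n_iters=2)
```

Each iteration leaves confident interior and background values near 1 and 0 while the object boundary acquires intermediate alpha values, which is the qualitative effect the learned M2M refinement targets.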