
Towards Natural Image Matting in the Wild via Real-Scenario Prior

October 9, 2024
Authors: Ruihao Xia, Yu Liang, Peng-Tao Jiang, Hao Zhang, Qianru Sun, Yang Tang, Bo Li, Pan Zhou
cs.AI

Abstract

Recent approaches attempt to adapt powerful interactive segmentation models, such as SAM, to interactive matting and fine-tune them on synthetic matting datasets. However, models trained on synthetic data fail to generalize to complex and occluded scenes. We address this challenge by proposing a new matting dataset based on the COCO dataset, namely COCO-Matting. Specifically, the construction of COCO-Matting includes accessory fusion and mask-to-matte, which select real-world complex images from COCO and convert semantic segmentation masks to matting labels. The resulting COCO-Matting comprises an extensive collection of 38,251 human instance-level alpha mattes in complex natural scenarios. Furthermore, existing SAM-based matting methods extract intermediate features and masks from a frozen SAM and train only a lightweight matting decoder with end-to-end matting losses, which does not fully exploit the potential of the pre-trained SAM. Thus, we propose SEMat, which revamps the network architecture and training objectives. For network architecture, the proposed feature-aligned transformer learns to extract fine-grained edge and transparency features, and the proposed matte-aligned decoder aims to segment matting-specific objects and convert coarse masks into high-precision mattes. For training objectives, the proposed regularization and trimap losses aim to retain the prior from the pre-trained model and push the matting logits extracted from the mask decoder to contain trimap-based semantic information. Extensive experiments across seven diverse datasets demonstrate the superior performance of our method, proving its efficacy in interactive natural image matting. We open-source our code, models, and dataset at https://github.com/XiaRho/SEMat.
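
The abstract refers to trimap-based semantic information. A trimap partitions an image into definite foreground, definite background, and an unknown transition region. As a rough illustration only, the sketch below derives a trimap from an alpha matte by thresholding. The function name `alpha_to_trimap`, the tolerance `eps`, and the integer label encoding are assumptions made for this example and are not taken from the SEMat code; practical pipelines typically also widen the unknown band with morphological dilation, which is omitted here.

```python
from typing import List

# Hypothetical label encoding chosen for this sketch.
BACKGROUND, UNKNOWN, FOREGROUND = 0, 1, 2


def alpha_to_trimap(alpha: List[List[float]], eps: float = 1e-3) -> List[List[int]]:
    """Label each pixel of an alpha matte (values in [0, 1]).

    Pixels with alpha near 0 become BACKGROUND, pixels with alpha near 1
    become FOREGROUND, and everything in between becomes UNKNOWN.
    """
    trimap = []
    for row in alpha:
        labels = []
        for a in row:
            if a <= eps:
                labels.append(BACKGROUND)
            elif a >= 1.0 - eps:
                labels.append(FOREGROUND)
            else:
                labels.append(UNKNOWN)
        trimap.append(labels)
    return trimap


# Tiny 2x3 matte: left column background, middle semi-transparent, right foreground.
alpha = [
    [0.0, 0.2, 1.0],
    [0.0, 0.7, 1.0],
]
trimap = alpha_to_trimap(alpha)
```

Here the semi-transparent middle column is the only region labeled unknown, which is where a matting model has to estimate fractional opacity.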
