

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

December 25, 2023
Authors: Jiannan Wu, Yi Jiang, Bin Yan, Huchuan Lu, Zehuan Yuan, Ping Luo
cs.AI

Abstract

The reference-based object segmentation tasks, namely referring image segmentation (RIS), few-shot image segmentation (FSS), referring video object segmentation (RVOS), and video object segmentation (VOS), aim to segment a specific object by using either language descriptions or annotated masks as references. Despite significant progress in each respective field, current methods are designed for a single task and developed in different directions, which hinders multi-task capabilities across these tasks. In this work, we end the current fragmented situation and propose UniRef++ to unify the four reference-based object segmentation tasks with a single architecture. At the heart of our approach is the proposed UniFusion module, which performs multiway fusion to handle the different tasks according to their specified references. A unified Transformer architecture is then adopted to achieve instance-level segmentation. With this unified design, UniRef++ can be jointly trained on a broad range of benchmarks and can flexibly perform multiple tasks at run-time by specifying the corresponding references. We evaluate our unified models on various benchmarks. Extensive experimental results show that UniRef++ achieves state-of-the-art performance on RIS and RVOS, and performs competitively on FSS and VOS with a parameter-shared network. Moreover, we show that the proposed UniFusion module can be easily incorporated into the advanced foundation model SAM and obtain satisfactory results with parameter-efficient finetuning. Codes and models are available at https://github.com/FoundationVision/UniRef.
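To make the fusion idea more concrete, the sketch below shows one plausible way a UniFusion-style step could work: flattened visual features attend to reference tokens via cross-attention, where the reference tokens come either from a language encoder (RIS/RVOS) or from mask-pooled features of an annotated frame (FSS/VOS). This is a minimal illustrative sketch under our own assumptions, not the authors' implementation; the names MultiwayFusion and mask_pooled_reference, and all dimensions, are hypothetical (see the linked repository for the actual code).

```python
# Minimal sketch of a multiway cross-attention fusion step (illustrative only,
# not the released UniRef++ code).
import torch
import torch.nn as nn


class MultiwayFusion(nn.Module):
    """Fuses visual features with a task-specific reference via cross-attention."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
        # visual:    (B, N, C) flattened image/frame features
        # reference: (B, M, C) language tokens or mask-pooled feature tokens
        fused, _ = self.cross_attn(query=visual, key=reference, value=reference)
        # Residual connection keeps the original visual features intact.
        return self.norm(visual + fused)


def mask_pooled_reference(features: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # features: (B, C, H, W) reference-frame features; mask: (B, 1, H, W) binary mask.
    # Average the features inside the annotated mask to form a single reference token.
    weights = mask / mask.sum(dim=(2, 3), keepdim=True).clamp(min=1e-6)
    token = (features * weights).sum(dim=(2, 3))  # (B, C)
    return token.unsqueeze(1)                     # (B, 1, C)


if __name__ == "__main__":
    B, C, H, W = 2, 256, 32, 32
    fusion = MultiwayFusion(dim=C)
    visual = torch.randn(B, H * W, C)
    # Language reference (e.g., RIS): a sequence of text-encoder tokens.
    lang_ref = torch.randn(B, 20, C)
    # Mask reference (e.g., VOS): pooled from an annotated reference frame.
    mask_ref = mask_pooled_reference(torch.randn(B, C, H, W), torch.rand(B, 1, H, W).round())
    print(fusion(visual, lang_ref).shape, fusion(visual, mask_ref).shape)
```

Because both reference types are reduced to token sequences of the same width, a single fused feature map can feed one shared instance-segmentation Transformer, which is the property that lets the tasks share parameters and be trained jointly.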