UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
December 25, 2023
Authors: Jiannan Wu, Yi Jiang, Bin Yan, Huchuan Lu, Zehuan Yuan, Ping Luo
cs.AI
Abstract
The reference-based object segmentation tasks, namely referring image
segmentation (RIS), few-shot image segmentation (FSS), referring video object
segmentation (RVOS), and video object segmentation (VOS), aim to segment a
specific object by utilizing either language or annotated masks as references.
Despite significant progress in each field, current methods are designed and
developed independently for their specific tasks, which hinders the
development of multi-task capabilities across these tasks. In this work, we end
the current fragmented situation and propose UniRef++ to unify the four
reference-based object segmentation tasks with a single architecture. At the
heart of our approach is the proposed UniFusion module, which performs
multiway fusion to handle different tasks according to their specified
references. A unified Transformer architecture is then adopted for
achieving instance-level segmentation. With the unified designs, UniRef++ can
be jointly trained on a broad range of benchmarks and can flexibly complete
multiple tasks at run-time by specifying the corresponding references. We
evaluate our unified models on various benchmarks. Extensive experimental
results indicate that our proposed UniRef++ achieves state-of-the-art
performance on RIS and RVOS, and performs competitively on FSS and VOS with a
parameter-shared network. Moreover, we show that the proposed UniFusion
module can be easily incorporated into the advanced foundation model SAM,
obtaining satisfactory results with parameter-efficient finetuning. Code
and models are available at https://github.com/FoundationVision/UniRef.
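The core idea the abstract describes — one shared network, with the task selected at run-time purely by the type of reference supplied (language for RIS/RVOS, annotated masks for FSS/VOS) — can be illustrated with a minimal sketch. This is not the authors' code: the names (`Reference`, `unifusion`, `segment`) and the toy encoders are illustrative assumptions standing in for the learned fusion and Transformer components.

```python
# Illustrative sketch (assumed names, not the UniRef++ implementation):
# a single segmentation pipeline that routes either a language reference
# or a mask reference through a shared "multiway fusion" step.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Reference:
    """A task reference: either a language expression or an annotated mask."""
    text: Optional[str] = None        # language reference (RIS / RVOS)
    mask: Optional[List[int]] = None  # mask reference (FSS / VOS), flattened

def unifusion(image_feats: List[float], ref: Reference) -> List[float]:
    """Toy multiway fusion: encode the reference along the path matching
    its type, then inject it into the image features (elementwise add)."""
    if ref.text is not None:                       # language path
        ref_feats = [0.1 * len(ref.text)] * len(image_feats)
    elif ref.mask is not None:                     # mask path
        ref_feats = [float(m) for m in ref.mask]
    else:
        raise ValueError("a reference (text or mask) is required")
    return [f + r for f, r in zip(image_feats, ref_feats)]

def segment(image_feats: List[float], ref: Reference) -> List[int]:
    """Stand-in for the shared Transformer head: threshold fused features."""
    fused = unifusion(image_feats, ref)
    return [1 if f > 0.5 else 0 for f in fused]

# The same parameter-shared pipeline handles a language reference ...
print(segment([0.2, 0.6, 0.1], Reference(text="the red car")))  # → [1, 1, 1]
# ... and a mask reference, chosen at run-time by the caller.
print(segment([0.2, 0.6, 0.1], Reference(mask=[0, 1, 0])))      # → [0, 1, 0]
```

The point of the sketch is the dispatch structure: no task-specific model is instantiated, only task-specific reference encoding inside a shared fusion step, mirroring the abstract's claim that tasks are selected "at run-time by specifying the corresponding references."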