
Boosting Unsupervised Video Instance Segmentation with Automatic Quality-Guided Self-Training

December 7, 2025
作者: Kaixuan Lu, Mehmet Onurcan Kaya, Dim P. Papadopoulos
cs.AI

Abstract

Video Instance Segmentation (VIS) faces significant annotation challenges due to its dual requirement of pixel-level masks and temporally consistent labels. While recent unsupervised methods such as VideoCutLER eliminate the dependence on optical flow by training on synthetic data, they remain constrained by the synthetic-to-real domain gap. We present AutoQ-VIS, a novel unsupervised framework that bridges this gap through quality-guided self-training. Our approach establishes a closed-loop system between pseudo-label generation and automatic quality assessment, enabling progressive adaptation from synthetic to real videos. Experiments demonstrate state-of-the-art performance of 52.6 AP_{50} on the YouTubeVIS-2019 validation set, surpassing the previous best method VideoCutLER by 4.4%, while requiring no human annotations. This demonstrates the viability of quality-aware self-training for unsupervised VIS. We will release the code at https://github.com/wcbup/AutoQ-VIS.
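The closed loop described above (pseudo-label generation feeding an automatic quality assessor, which filters what the model retrains on) can be sketched in a few lines of Python. This is a minimal illustrative sketch, assuming a synthetic-pretrained segmenter and a scalar quality score per pseudo-label; all function names, the threshold, and the round count are hypothetical stand-ins, not AutoQ-VIS's actual interface.

```python
# Illustrative sketch of quality-guided self-training for unsupervised VIS.
# Every component here is a hypothetical placeholder: the real system would
# use a video instance segmenter and a learned quality-assessment head.
import random

def generate_pseudo_labels(model, videos):
    # Hypothetical: run the current model on unlabeled real videos and
    # collect its predicted instance masks as pseudo-labels.
    return [{"video": v, "masks": f"masks_from_{model}"} for v in videos]

def quality_score(pseudo_label):
    # Hypothetical stand-in for the automatic quality assessor; the real
    # assessor would rate mask quality without any human annotations.
    return random.random()

def retrain(model, kept_pseudo_labels):
    # Hypothetical: fine-tune the segmenter on the filtered pseudo-labels.
    return f"{model}+1round"

def self_train(initial_model, real_videos, rounds=3, threshold=0.7):
    model = initial_model
    for r in range(rounds):
        pseudo = generate_pseudo_labels(model, real_videos)
        # Closed loop: keep only pseudo-labels the assessor rates highly,
        # so the model progressively adapts from synthetic to real video.
        kept = [p for p in pseudo if quality_score(p) >= threshold]
        model = retrain(model, kept)
        print(f"round {r}: kept {len(kept)}/{len(pseudo)} pseudo-labels")
    return model

if __name__ == "__main__":
    self_train("synthetic_pretrained", [f"video_{i}" for i in range(10)])
```

Presumably the filter threshold governs the trade-off between pseudo-label quantity and quality across self-training rounds; the abstract does not specify how it is set.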