

Boosting Unsupervised Video Instance Segmentation with Automatic Quality-Guided Self-Training

December 7, 2025
Authors: Kaixuan Lu, Mehmet Onurcan Kaya, Dim P. Papadopoulos
cs.AI

Abstract

Video Instance Segmentation (VIS) faces significant annotation challenges due to its dual requirements of pixel-level masks and temporally consistent labels. While recent unsupervised methods like VideoCutLER eliminate optical-flow dependencies through synthetic data, they remain constrained by the synthetic-to-real domain gap. We present AutoQ-VIS, a novel unsupervised framework that bridges this gap through quality-guided self-training. Our approach establishes a closed-loop system between pseudo-label generation and automatic quality assessment, enabling progressive adaptation from synthetic to real videos. Experiments demonstrate state-of-the-art performance of 52.6 AP_{50} on the YouTubeVIS-2019 validation set, surpassing the previous state-of-the-art VideoCutLER by 4.4%, while requiring no human annotations. This demonstrates the viability of quality-aware self-training for unsupervised VIS. We will release the code at https://github.com/wcbup/AutoQ-VIS.
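To make the closed-loop idea concrete, below is a minimal sketch of one quality-guided self-training round: the model pseudo-labels real videos, an automatic quality estimate filters the results, and only high-quality pseudo-labels feed the next training round. This is an illustration under stated assumptions, not the paper's actual implementation; the names (`Model.predict`, `score_quality`, `QUALITY_THRESHOLD`, `trainer.fit`) are hypothetical.

```python
# Sketch of quality-guided self-training, assuming a VIS model exposing
# `predict` (per-instance mask tracks) and a learned quality head
# `score_quality`. All identifiers here are illustrative, not the
# AutoQ-VIS API.

from typing import Any, List, Tuple

QUALITY_THRESHOLD = 0.7  # hypothetical cutoff for accepting pseudo-labels


def self_training_round(model: Any, real_videos: List[Any], trainer: Any) -> int:
    """Run one round: pseudo-label real videos, keep high-quality masks, retrain.

    Returns the number of videos whose pseudo-labels were accepted.
    """
    pseudo_dataset: List[Tuple[Any, Any]] = []
    for video in real_videos:
        masks = model.predict(video)               # generate pseudo-labels
        score = model.score_quality(video, masks)  # automatic quality estimate
        if score >= QUALITY_THRESHOLD:
            pseudo_dataset.append((video, masks))  # keep only trusted labels
    # Retrain on the accepted pseudo-labels; repeating this loop drives the
    # progressive synthetic-to-real adaptation described in the abstract.
    trainer.fit(model, pseudo_dataset)
    return len(pseudo_dataset)
```

In practice such a loop would be iterated, with each round's retrained model producing better pseudo-labels for the next; whether AutoQ-VIS fixes or schedules the acceptance threshold across rounds is not specified in the abstract.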