Seg-ReSearch: Segmentation with Interleaved Reasoning and External Search
February 4, 2026
Authors: Tianming Liang, Qirui Du, Jian-Fang Hu, Haichao Jiang, Zicheng Lin, Wei-Shi Zheng
cs.AI
Abstract
Language-based image segmentation has long been a popular topic in computer vision. While recent advances in multimodal large language models (MLLMs) have endowed segmentation systems with reasoning capabilities, these efforts remain confined by the frozen internal knowledge of MLLMs, which limits their applicability to real-world scenarios that involve up-to-date information or domain-specific concepts. In this work, we propose Seg-ReSearch, a novel segmentation paradigm that overcomes the knowledge bottleneck of existing approaches. By interleaving reasoning with external search, Seg-ReSearch empowers segmentation systems to handle dynamic, open-world queries that extend beyond the frozen knowledge of MLLMs. To effectively train this capability, we introduce a hierarchical reward design that harmonizes initial guidance with progressive incentives, mitigating the dilemma between sparse outcome signals and rigid step-wise supervision. For evaluation, we construct OK-VOS, a challenging benchmark that explicitly requires outside knowledge for video object segmentation. Experiments on OK-VOS and two existing reasoning segmentation benchmarks demonstrate that Seg-ReSearch outperforms state-of-the-art approaches by a substantial margin. Code and data will be released at https://github.com/iSEE-Laboratory/Seg-ReSearch.