
ROSE: Retrieval-Oriented Segmentation Enhancement

April 15, 2026
Authors: Song Tang, Guangquan Jie, Henghui Ding, Yu-Gang Jiang
cs.AI

Abstract

Existing segmentation models based on multimodal large language models (MLLMs), such as LISA, often struggle with novel or emerging entities because they cannot incorporate up-to-date knowledge. To address this challenge, we introduce the Novel Emerging Segmentation Task (NEST), which focuses on segmenting (i) novel entities that MLLMs fail to recognize because they are absent from the training data, and (ii) emerging entities that exist within the model's knowledge but require up-to-date external information for accurate recognition. To support the study of NEST, we construct a NEST benchmark using an automated pipeline that generates news-related data samples for comprehensive evaluation. Additionally, we propose ROSE: Retrieval-Oriented Segmentation Enhancement, a plug-and-play framework designed to augment any MLLM-based segmentation model. ROSE comprises four key components. First, an Internet Retrieval-Augmented Generation module uses user-provided multimodal inputs to retrieve real-time web information. Then, a Textual Prompt Enhancer enriches the model with up-to-date information and rich background knowledge, improving its perception of emerging entities. Furthermore, a Visual Prompt Enhancer compensates for MLLMs' lack of exposure to novel entities by leveraging internet-sourced images. To maintain efficiency, a WebSense module decides, based on the user input, when to invoke retrieval. Experimental results demonstrate that ROSE significantly boosts performance on the NEST benchmark, outperforming a strong Gemini-2.0 Flash-based retrieval baseline by 19.2 in gIoU.
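
The control flow implied by the four components can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the abstract does not specify how retrieved text and images are combined with the user input, so the prompt template, the `Retrieved` and `SegRequest` types, and the three injected callables (`needs_web`, `retrieve`, `segment`) are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Retrieved:
    """Web evidence returned by a (hypothetical) retrieval step."""
    texts: List[str] = field(default_factory=list)
    image_urls: List[str] = field(default_factory=list)


@dataclass
class SegRequest:
    """Input handed to the underlying MLLM-based segmentation model."""
    image: str                     # path or identifier of the user image
    prompt: str                    # original or enriched text query
    reference_images: List[str] = field(default_factory=list)


def rose_style_pipeline(
    image: str,
    query: str,
    needs_web: Callable[[str, str], bool],      # stand-in for WebSense
    retrieve: Callable[[str, str], Retrieved],  # stand-in for Internet RAG
    segment: Callable[[SegRequest], str],       # any MLLM-based segmenter
) -> str:
    # Gate: skip retrieval when the input is judged answerable offline.
    if not needs_web(image, query):
        return segment(SegRequest(image=image, prompt=query))

    evidence = retrieve(image, query)

    # Textual enhancement (assumed form): prepend retrieved snippets
    # to the query as background context.
    prompt = query
    if evidence.texts:
        context = " ".join(evidence.texts)
        prompt = f"Background: {context}\nQuery: {query}"

    # Visual enhancement (assumed form): pass retrieved images along
    # as references for entities the model has never seen.
    request = SegRequest(
        image=image,
        prompt=prompt,
        reference_images=list(evidence.image_urls),
    )
    return segment(request)
```

Passing the gate, retriever, and segmenter in as callables mirrors the plug-and-play claim: the base segmentation model is left untouched and only its inputs change.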