ROSE: Retrieval-Oriented Segmentation Enhancement

April 15, 2026
Authors: Song Tang, Guangquan Jie, Henghui Ding, Yu-Gang Jiang
cs.AI

Abstract

Existing segmentation models based on multimodal large language models (MLLMs), such as LISA, often struggle with novel or emerging entities because they cannot incorporate up-to-date knowledge. To address this challenge, we introduce the Novel Emerging Segmentation Task (NEST), which focuses on segmenting (i) novel entities that MLLMs fail to recognize because they are absent from the training data, and (ii) emerging entities that exist within the model's knowledge but require up-to-date external information for accurate recognition. To support the study of NEST, we construct a NEST benchmark using an automated pipeline that generates news-related data samples for comprehensive evaluation. Additionally, we propose ROSE (Retrieval-Oriented Segmentation Enhancement), a plug-and-play framework designed to augment any MLLM-based segmentation model. ROSE comprises four key components. First, an Internet Retrieval-Augmented Generation module uses user-provided multimodal inputs to retrieve real-time web information. Second, a Textual Prompt Enhancer supplies the model with up-to-date information and rich background knowledge, improving its perception of emerging entities. Third, a Visual Prompt Enhancer compensates for MLLMs' lack of exposure to novel entities by leveraging internet-sourced images. Finally, to maintain efficiency, a WebSense module decides when to invoke retrieval based on the user input. Experimental results show that ROSE significantly boosts performance on the NEST benchmark, outperforming a strong Gemini-2.0 Flash-based retrieval baseline by 19.2 in gIoU.
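To make the control flow described in the abstract concrete, the following is a minimal sketch of how the four components might be wired around a base segmentation model. It is not the authors' implementation: every name here (`Retrieved`, `SegRequest`, `enhance_text`, `rose_pipeline`, and the three callables passed in) is hypothetical, and the retrieval gate, the retriever, and the segmentation model are left as caller-supplied functions because the abstract does not specify them.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Retrieved:
    """Hypothetical container for web results: text snippets and image references."""
    snippets: List[str] = field(default_factory=list)
    image_refs: List[str] = field(default_factory=list)


@dataclass
class SegRequest:
    """Hypothetical input handed to the base MLLM segmentation model."""
    image: str
    prompt: str
    reference_images: List[str] = field(default_factory=list)


def enhance_text(query: str, retrieved: Retrieved) -> str:
    """Stand-in for textual prompt enhancement: append retrieved context to the query."""
    if not retrieved.snippets:
        return query
    return f"{query}\nContext: {' '.join(retrieved.snippets)}"


def rose_pipeline(
    image: str,
    query: str,
    needs_retrieval: Callable[[str, str], bool],   # stand-in for the WebSense gate
    retrieve: Callable[[str, str], Retrieved],     # stand-in for internet retrieval
    segment: Callable[[SegRequest], Any],          # any MLLM-based segmentation model
) -> Any:
    """Run the base model directly, or with retrieved text and images when the gate fires."""
    if not needs_retrieval(image, query):
        # Skip retrieval entirely to keep the common case cheap.
        return segment(SegRequest(image=image, prompt=query))

    retrieved = retrieve(image, query)
    request = SegRequest(
        image=image,
        prompt=enhance_text(query, retrieved),          # textual prompt enhancement
        reference_images=list(retrieved.image_refs),    # visual prompt enhancement
    )
    return segment(request)
```

Passing the gate, retriever, and segmenter as plain callables mirrors the plug-and-play claim: in this sketch the base model is never modified, only the request it receives.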