InterLV-Search: Benchmarking Interleaved Multimodal Agentic Search
May 8, 2026
Authors: Bohan Hou, Jiuning Gu, Jiayan Guo, Ronghao Dang, Sicong Leng, Xin Li, Xuemeng Song, Jianfei Yang
cs.AI
Abstract
Existing benchmarks for multimodal agentic search evaluate multimodal search and visual browsing, but visual evidence is either confined to the input or treated as an answer endpoint rather than part of an interleaved search trajectory. We introduce InterLV-Search, a benchmark for Interleaved Language-Vision Agentic Search, in which textual and visual evidence is repeatedly used to condition later search. It contains 2,061 examples across three levels: active visual evidence seeking, controlled offline interleaved multimodal search, and open-web interleaved multimodal search. Going beyond existing benchmarks, it also includes multimodal multi-branch samples that require comparing multiple entities during the evidence search. We construct Level 1 and Level 2 with automated pipelines and Level 3 with a machine-led, human-supervised open-web pipeline. We further provide InterLV-Agent for standardized tool use, trajectory logging, and evaluation. Experiments on proprietary and open-source multimodal agents show that current systems remain far from solving interleaved multimodal search, with the best model scoring below 50% overall accuracy, highlighting challenges in visual evidence seeking, search control, and multimodal evidence integration. We release the benchmark data and evaluation code at https://github.com/hbhalpha/InterLV-Search-Bench
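The abstract does not specify InterLV-Agent's trajectory log format or its scoring rule. The following is a minimal sketch under stated assumptions. It uses a hypothetical per-step record (`Step`, `Trajectory`, and tool names such as `image_search` are illustrative, not taken from the released code) and exact-match scoring (an assumption; the benchmark may grade answers differently). It shows how a logged trajectory could be checked for interleaving, meaning visual evidence is retrieved and then further search follows it. It also shows how overall accuracy could be micro-averaged across the three levels alongside per-level accuracy.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Literal


# Hypothetical step record: one tool call and the kind of evidence it returned.
@dataclass
class Step:
    tool: str                              # hypothetical tool name, e.g. "image_search"
    query: str
    evidence_kind: Literal["text", "image"]


# Hypothetical trajectory record for one benchmark sample.
@dataclass
class Trajectory:
    sample_id: str
    level: int                             # 1, 2, or 3
    steps: list[Step] = field(default_factory=list)
    prediction: str = ""


def is_interleaved(traj: Trajectory) -> bool:
    """True if some step returns visual evidence and at least one further step follows it."""
    kinds = [s.evidence_kind for s in traj.steps]
    return "image" in kinds[:-1]


def accuracy(trajs: list[Trajectory], gold: dict[str, str]):
    """Case-insensitive exact-match accuracy: overall (micro-averaged) and per level."""
    per_level = defaultdict(lambda: [0, 0])  # level -> [correct, total]
    for t in trajs:
        ok = t.prediction.strip().lower() == gold[t.sample_id].strip().lower()
        per_level[t.level][0] += int(ok)
        per_level[t.level][1] += 1
    correct = sum(c for c, _ in per_level.values())
    total = sum(n for _, n in per_level.values())
    overall = correct / total if total else 0.0
    by_level = {lvl: c / n for lvl, (c, n) in sorted(per_level.items())}
    return overall, by_level


# Usage with made-up toy data (not benchmark samples).
gold = {"a": "Paris", "b": "1889"}
trajs = [
    Trajectory("a", 1, [Step("image_search", "photo of the tower", "image"),
                        Step("text_search", "city of the tower in the photo", "text")], "paris"),
    Trajectory("b", 2, [Step("text_search", "year the tower opened", "text")], "1890"),
]
overall, by_level = accuracy(trajs, gold)
print(overall, by_level, [is_interleaved(t) for t in trajs])
```

Micro-averaging weights every example equally, so levels with more samples dominate the overall score. A macro-average over levels would instead weight each level equally. Which convention the benchmark reports is not stated in the abstract.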