PhotoBench: Beyond Visual Matching Towards Personalized Intent-Driven Photo Retrieval

March 2, 2026
Authors: Tianyi Xu, Rong Shan, Junjie Wu, Jiadeng Huang, Teng Wang, Jiachen Zhu, Wenteng Chen, Minxin Tu, Quantao Dou, Zhaoxiang Wang, Changwang Zhang, Weinan Zhang, Jun Wang, Jianghao Lin
cs.AI

Abstract

Personal photo albums are not merely collections of static images but living, ecological archives defined by temporal continuity, social entanglement, and rich metadata, which makes personalized photo retrieval non-trivial. However, existing retrieval benchmarks rely heavily on context-isolated web snapshots and fail to capture the multi-source reasoning required to resolve authentic, intent-driven user queries. To bridge this gap, we introduce PhotoBench, the first benchmark constructed from authentic personal albums. It is designed to shift the paradigm from visual matching to personalized, multi-source, intent-driven reasoning. Using a rigorous multi-source profiling framework that integrates visual semantics, spatiotemporal metadata, social identity, and temporal events for each image, we synthesize complex intent-driven queries rooted in users' life trajectories. Extensive evaluation on PhotoBench exposes two critical limitations: the modality gap, where unified embedding models collapse on non-visual constraints, and the source fusion paradox, where agentic systems orchestrate tools poorly. These findings indicate that the next frontier in personal multimodal retrieval lies beyond unified embeddings and requires robust agentic reasoning systems capable of precise constraint satisfaction and multi-source fusion. PhotoBench is available.