PhotoBench: Beyond Visual Matching Towards Personalized Intent-Driven Photo Retrieval

March 2, 2026
Authors: Tianyi Xu, Rong Shan, Junjie Wu, Jiadeng Huang, Teng Wang, Jiachen Zhu, Wenteng Chen, Minxin Tu, Quantao Dou, Zhaoxiang Wang, Changwang Zhang, Weinan Zhang, Jun Wang, Jianghao Lin
cs.AI

Abstract

Personal photo albums are not merely collections of static images but living, ecological archives defined by temporal continuity, social entanglement, and rich metadata, which makes personalized photo retrieval non-trivial. However, existing retrieval benchmarks rely heavily on context-isolated web snapshots and fail to capture the multi-source reasoning required to resolve authentic, intent-driven user queries. To bridge this gap, we introduce PhotoBench, the first benchmark constructed from authentic personal albums. It is designed to shift the paradigm from visual matching to personalized, multi-source, intent-driven reasoning. Using a rigorous multi-source profiling framework that integrates visual semantics, spatio-temporal metadata, social identity, and temporal events for each image, we synthesize complex intent-driven queries rooted in users' life trajectories. Extensive evaluation on PhotoBench exposes two critical limitations: the modality gap, where unified embedding models collapse on non-visual constraints, and the source fusion paradox, where agentic systems orchestrate their tools poorly. These findings indicate that the next frontier in personal multimodal retrieval lies beyond unified embeddings and requires robust agentic reasoning systems capable of precise constraint satisfaction and multi-source fusion. The PhotoBench dataset is available.