Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search from Task-Centric Views

December 15, 2025
Authors: Tingyang Chen, Cong Fu, Jiahua Wu, Haotian Wu, Hua Fan, Xiangyu Ke, Yunjun Gao, Yabo Ni, Anxiang Zeng
cs.AI

Abstract

Vector Similarity Search (VSS) in high-dimensional spaces is rapidly emerging as core functionality in next-generation database systems for numerous data-intensive services -- from embedding lookups in large language models (LLMs), to semantic information retrieval and recommendation engines. Current benchmarks, however, evaluate VSS primarily on the recall-latency trade-off against a ground truth defined solely by distance metrics, neglecting how retrieval quality ultimately impacts downstream tasks. This disconnect can mislead both academic research and industrial practice. We present Iceberg, a holistic benchmark suite for end-to-end evaluation of VSS methods in realistic application contexts. From a task-centric view, Iceberg uncovers the Information Loss Funnel, which identifies three principal sources of end-to-end performance degradation: (1) Embedding Loss during feature extraction; (2) Metric Misuse, where distances poorly reflect task relevance; (3) Data Distribution Sensitivity, highlighting index robustness across skews and modalities. For a more comprehensive assessment, Iceberg spans eight diverse datasets across key domains such as image classification, face recognition, text retrieval, and recommendation systems. Each dataset, ranging from 1M to 100M vectors, includes rich, task-specific labels and evaluation metrics, enabling assessment of retrieval algorithms within the full application pipeline rather than in isolation. Iceberg benchmarks 13 state-of-the-art VSS methods and re-ranks them based on application-level metrics, revealing substantial deviations from traditional rankings derived purely from recall-latency evaluations. Building on these insights, we define a set of task-centric meta-features and derive an interpretable decision tree to guide practitioners in selecting and tuning VSS methods for their specific workloads.
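
To make the abstract's central distinction concrete, the following is a minimal, hypothetical Python sketch (not taken from the Iceberg benchmark or the paper's code) contrasting distance-based recall@k with a simple task-level proxy, here a majority vote over the labels of the retrieved neighbors. All function names, the label-voting proxy, and the data layout are assumptions for illustration only.

```python
# Illustrative sketch only: contrasts distance-based recall@k with a simple
# task-level metric (majority-vote label agreement of retrieved neighbors).
# Names and the voting proxy are hypothetical, not the Iceberg benchmark API.
import numpy as np

def recall_at_k(approx_ids: np.ndarray, exact_ids: np.ndarray, k: int) -> float:
    """Fraction of the exact k nearest neighbors that the ANN index returned."""
    hits = [len(set(a[:k]) & set(e[:k])) for a, e in zip(approx_ids, exact_ids)]
    return float(np.mean(hits)) / k

def task_accuracy_at_k(approx_ids: np.ndarray, base_labels: np.ndarray,
                       query_labels: np.ndarray, k: int) -> float:
    """Task-level proxy: does a majority vote over the labels of the retrieved
    neighbors match the query's label? Assumes non-negative integer labels."""
    correct = 0
    for ids, y in zip(approx_ids, query_labels):
        votes = base_labels[np.asarray(ids[:k])]
        predicted = np.bincount(votes).argmax()
        correct += int(predicted == y)
    return correct / len(query_labels)

# A VSS method can score high on recall_at_k yet low on task_accuracy_at_k when
# the distance metric correlates poorly with task relevance ("Metric Misuse"),
# which is why rankings based only on recall-latency curves can be misleading.
```

This is the kind of gap a task-centric evaluation surfaces: the first metric measures fidelity to a distance-defined ground truth, while the second measures the effect of retrieval quality on a downstream decision.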