LLM-guided Hierarchical Retrieval

October 15, 2025
Authors: Nilesh Gupta, Wei-Cheng Chang, Ngot Bui, Cho-Jui Hsieh, Inderjit S. Dhillon
cs.AI

Abstract

Modern IR systems are increasingly tasked with answering complex, multi-faceted queries that require deep reasoning rather than simple keyword or semantic matching. While LLM-based IR has shown great promise, the prevailing retrieve-then-rerank paradigm inherits the limitations of embedding-based retrieval; parametric generative approaches are difficult to update with new information; and long-context methods that place the entire corpus in context are computationally infeasible for large document collections. To address these challenges, we introduce LATTICE, a hierarchical retrieval framework that enables an LLM to reason over and navigate large corpora with logarithmic search complexity by imposing a semantic tree structure on the corpus. Our approach consists of two stages: (1) an offline phase that organizes the corpus into a semantic hierarchy via either a bottom-up agglomerative strategy or a top-down divisive strategy using multi-level summaries, and (2) an online traversal phase where a search LLM navigates this tree. A central challenge in such LLM-guided search is that the model's relevance judgments are noisy, context-dependent, and unaware of the hierarchy, making cross-branch and cross-level comparisons difficult. To overcome this, we propose a traversal algorithm that estimates calibrated latent relevance scores from local LLM outputs and aggregates them into a global path relevance metric. Our training-free framework achieves state-of-the-art zero-shot performance on the reasoning-intensive BRIGHT benchmark, demonstrating up to 9% improvement in Recall@100 and 5% in nDCG@10 over the next best zero-shot baseline. Furthermore, compared to the fine-tuned SOTA method DIVER-v2, LATTICE attains comparable results on BRIGHT subsets that use a static corpus for evaluation.
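To make the two-stage design concrete, below is a minimal, runnable sketch of the online traversal stage only. It is not the paper's implementation: `llm_score_children` is a hypothetical stub (a lexical-overlap proxy stands in for the search LLM's noisy local judgments), and a running mean of scores along the root-to-node path is used as one simple stand-in for the calibrated global path relevance metric described in the abstract.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    summary: str                      # multi-level summary (raw text at a leaf)
    children: List["Node"] = field(default_factory=list)
    doc_id: Optional[str] = None      # set only on leaf nodes

def llm_score_children(query: str, node: Node) -> List[float]:
    """Stand-in for the search LLM's local relevance judgments: one score in
    [0, 1] per child of `node`. A toy lexical-overlap proxy is used here so
    the sketch runs without a model."""
    q = set(query.lower().split())
    return [len(q & set(c.summary.lower().split())) / max(len(q), 1)
            for c in node.children]

def lattice_search(root: Node, query: str, k: int = 5, budget: int = 50) -> List[str]:
    """Best-first traversal of the semantic tree. A node's path relevance is
    the mean of the locally estimated scores along its root-to-node path --
    one simple way to make scores comparable across branches and depths."""
    # Max-heap via negated keys; the counter breaks ties deterministically.
    frontier = [(-1.0, 0, root, 1.0, 1)]  # (-path_rel, tiebreak, node, score_sum, depth)
    results: List[str] = []
    calls, tiebreak = 0, 0
    while frontier and len(results) < k and calls < budget:
        _, _, node, score_sum, depth = heapq.heappop(frontier)
        if node.doc_id is not None:       # leaf: emit the document
            results.append(node.doc_id)
            continue
        calls += 1                        # one LLM call per expanded internal node
        for child, s in zip(node.children, llm_score_children(query, node)):
            child_sum = score_sum + s
            path_rel = child_sum / (depth + 1)
            tiebreak += 1
            heapq.heappush(frontier, (-path_rel, tiebreak, child, child_sum, depth + 1))
    return results

# Toy corpus: two clusters with summaries, documents at the leaves.
root = Node("all documents", [
    Node("gardening plants and home care", [
        Node("pruning fruit trees", doc_id="d1"),
        Node("indoor herb gardens", doc_id="d2"),
    ]),
    Node("machine learning retrieval and data systems", [
        Node("distributed training of neural networks", doc_id="d3"),
        Node("retrieval augmented generation pipelines", doc_id="d4"),
    ]),
])
print(lattice_search(root, "retrieval augmented generation", k=2))  # ['d4', 'd3']
```

Normalizing the accumulated score by path depth lets shallow and deep candidates compete on a common scale, which is the role the calibrated global path relevance metric plays in the paper; the actual calibration of the latent scores from raw LLM outputs is more involved than this running mean.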