OmniRetrieval: Unified Retrieval Across Heterogeneous Knowledge Sources

Abstract

Real-world information needs require access to structurally diverse knowledge sources, from unstructured text and relational tables to knowledge graphs and property graphs. Existing retrievers, however, operate over one source at a time under a fixed query language, leaving the broader landscape of available knowledge fragmented behind incompatible interfaces. A natural attempt at unification would collapse these sources into a shared space, but doing so erases the structural affordances (such as schemas, ontologies, and compositional operators) that give each source its expressive power. Effective retrieval over diverse knowledge therefore requires not homogenization but an overarching layer that meets each source on its own terms. To this end, we present OmniRetrieval, a framework that takes any natural-language query, identifies the appropriate knowledge sources, and dispatches source-native queries to the corresponding execution engines. On an extensive benchmark spanning 13 datasets and 309 distinct knowledge bases across text, relational, and graph-structured sources, OmniRetrieval outperforms single-source baselines, demonstrating that it can serve as a general-purpose interface to heterogeneous sources while preserving the structural distinctions that make each source valuable.
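The route-then-dispatch idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword-overlap router, the hand-written entity linker `find_title`, the toy sources `films_db`, `film_kg`, and `wiki_text`, and all data are hypothetical stand-ins. What the sketch does try to show is the design point of the abstract: each source keeps its own native query form (parameterized SQL executed by a real SQLite engine, a triple pattern matched against a graph, a bag of terms for text search) instead of being collapsed into one shared representation.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable


@dataclass
class Source:
    """One knowledge source with its own query form and engine (toy, hypothetical)."""
    name: str
    keywords: set[str]                    # hypothetical routing signal
    to_native: Callable[[str], object]    # natural-language query -> native query
    execute: Callable[[object], list]     # runs the native query on the native engine


TITLES = {"Heat", "Alien"}  # hypothetical entity vocabulary


def find_title(query: str):
    # Hypothetical entity linker: first query word that names a known film.
    for w in query.split():
        w = w.strip("?.,")
        if w in TITLES:
            return w
    return None


def terms_for(text: str) -> set[str]:
    # Lowercased bag of words with trailing punctuation stripped.
    return {w.strip("?.,").lower() for w in text.split()}


# Relational source: a real in-memory SQLite engine executing parameterized SQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE films (title TEXT, year INTEGER)")
db.executemany("INSERT INTO films VALUES (?, ?)", [("Alien", 1979), ("Heat", 1995)])


def sql_for(query: str):
    return ("SELECT year FROM films WHERE title = ?", (find_title(query),))


def run_sql(native) -> list:
    sql, params = native
    return [row[0] for row in db.execute(sql, params)]


# Graph source: (subject, predicate, object) triples; None in a pattern is a variable.
TRIPLES = [("Heat", "directed_by", "Michael Mann"), ("Alien", "directed_by", "Ridley Scott")]


def pattern_for(query: str):
    return (find_title(query), "directed_by", None)


def match_triples(pattern) -> list:
    # Return the object slot of every triple consistent with the pattern.
    return [t[2] for t in TRIPLES
            if all(want is None or want == got for want, got in zip(pattern, t))]


# Text source: passages ranked by term overlap with the query.
PASSAGES = ["Heat is a 1995 crime film.", "Alien is a 1979 science fiction horror film."]


def keyword_search(terms: set[str]) -> list:
    scored = [(len(terms & terms_for(p)), p) for p in PASSAGES]
    return [p for score, p in sorted(scored, reverse=True) if score > 0]


SOURCES = [
    Source("films_db", {"year", "released", "when"}, sql_for, run_sql),
    Source("film_kg", {"directed", "director", "who"}, pattern_for, match_triples),
    Source("wiki_text", {"what", "describe", "about"}, terms_for, keyword_search),
]


def route(query: str, sources: list, k: int = 1) -> list:
    # Pick the k sources whose routing keywords overlap most with the query.
    terms = terms_for(query)
    return sorted(sources, key=lambda s: len(terms & s.keywords), reverse=True)[:k]


def omni_retrieve(query: str, sources: list = SOURCES, k: int = 1) -> dict:
    # Translate the query into each selected source's native form and run it there.
    return {src.name: src.execute(src.to_native(query)) for src in route(query, sources, k)}
```

For example, `omni_retrieve("When was Heat released?")` routes to `films_db` and returns `{'films_db': [1995]}`, while `omni_retrieve("Who directed Alien?")` routes to `film_kg` and returns `{'film_kg': ['Ridley Scott']}`. In a real system, the keyword router and template translators would be replaced by whatever source-selection and query-generation components the framework actually uses; the sketch only fixes the shape of the interface.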