OmniRetrieval: Unified Retrieval Across Heterogeneous Knowledge Sources

Abstract

Real-world information needs require access to structurally diverse knowledge sources, from unstructured text and relational tables to knowledge graphs and property graphs. Existing retrievers, however, operate over one source at a time under a single fixed query language, leaving the broader landscape of available knowledge fragmented behind incompatible interfaces. A natural attempt at unification would collapse these sources into a shared space, but doing so erases the structural affordances (such as schemas, ontologies, and compositional operators) that give each source its expressive power. Effective retrieval over diverse knowledge therefore requires not homogenization but an overarching layer that meets each source on its own terms. To this end, we present OmniRetrieval, a framework that takes any natural-language query, identifies the appropriate knowledge sources, and dispatches source-native queries to their native execution engines. On an extensive benchmark spanning 13 datasets and 309 distinct knowledge bases across text, relational, and graph-structured sources, OmniRetrieval outperforms single-source baselines, demonstrating that it can serve as a general-purpose interface to heterogeneous sources while preserving the structural distinctions that make each source valuable.
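
The identify-then-dispatch idea described in the abstract can be illustrated with a small sketch. This is not the OmniRetrieval implementation: the abstract does not say how sources are selected or how native queries are generated, so the keyword routing rules, the stubbed translators, the toy data, and every name below (`Source`, `retrieve`, `SOURCES`, and so on) are assumptions made purely for illustration. The point it tries to show is that each source keeps its own query form (a parameterized SQL statement, a triple pattern, a keyword list) and its own engine, with only a thin layer on top.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Toy knowledge sources (hypothetical data, for illustration only).
DOCS = ["Marie Curie won the Nobel Prize in Physics in 1903.",
        "The Eiffel Tower is located in Paris."]
TRIPLES = [("Marie Curie", "born_in", "Warsaw"),
           ("Paris", "capital_of", "France")]
DB = sqlite3.connect(":memory:")
DB.execute("CREATE TABLE city (name TEXT, population INTEGER)")
DB.executemany("INSERT INTO city VALUES (?, ?)",
               [("Paris", 2100000), ("Lyon", 520000)])

def last_word(query: str) -> str:
    """Crude entity extraction: the final token without punctuation."""
    return query.rstrip("?.! ").split()[-1]

@dataclass
class Source:
    name: str
    matches: Callable[[str], bool]    # assumed routing rule
    translate: Callable[[str], Any]   # natural language -> native query (stub)
    execute: Callable[[Any], List]    # the source's own engine

def text_keywords(query: str) -> List[str]:
    words = [w.strip("?.,!").lower() for w in query.split()]
    return [w for w in words if len(w) > 3]

def text_search(keywords: List[str]) -> List[str]:
    return [d for d in DOCS if all(k in d.lower() for k in keywords)]

def graph_match(pattern: tuple) -> List[tuple]:
    # None in the pattern acts as a wildcard position.
    return [t for t in TRIPLES
            if all(p is None or p == v for p, v in zip(pattern, t))]

SOURCES = [
    Source("relational",
           lambda q: "population" in q.lower(),
           lambda q: ("SELECT name, population FROM city WHERE name = ?",
                      (last_word(q),)),
           lambda native: DB.execute(*native).fetchall()),
    Source("graph",
           lambda q: "capital" in q.lower(),
           lambda q: (None, "capital_of", last_word(q)),
           graph_match),
]
# Unstructured text serves as the fallback when no structured source matches.
TEXT = Source("text", lambda q: True, text_keywords, text_search)

def retrieve(query: str) -> Dict[str, List]:
    """Pick the matching sources, then run a source-native query on each."""
    chosen = [s for s in SOURCES if s.matches(query)] or [TEXT]
    return {s.name: s.execute(s.translate(query)) for s in chosen}

if __name__ == "__main__":
    for q in ["What is the population of Lyon?",
              "Which city is the capital of France?",
              "Who won the Nobel Prize in Physics?"]:
        print(q, retrieve(q))
```

In this sketch the relational question reaches SQLite as SQL, the graph question reaches the triple matcher as a pattern, and everything else falls back to keyword search over text. A real system would presumably replace the keyword rules and stub translators with learned components, but the separation between routing, translation, and native execution is the part the abstract describes.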