Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models
November 29, 2025
Authors: Alla Chepurova, Aydar Bulatov, Yuri Kuratov, Mikhail Burtsev
cs.AI
Abstract
Knowledge graphs (KGs) provide structured, verifiable grounding for large language models (LLMs), but current LLM-based systems commonly use KGs as auxiliary structures for text retrieval, leaving their intrinsic quality underexplored. In this work, we propose Wikontic, a multi-stage pipeline that constructs KGs from open-domain text by extracting candidate triplets with qualifiers, enforcing Wikidata-based type and relation constraints, and normalizing entities to reduce duplication. The resulting KGs are compact, ontology-consistent, and well-connected; on MuSiQue, the correct answer entity appears in 96% of generated triplets. On HotpotQA, our triplets-only setup achieves 76.0 F1, and on MuSiQue 59.8 F1, matching or surpassing several retrieval-augmented generation baselines that still require textual context. In addition, Wikontic attains state-of-the-art information-retention performance on the MINE-1 benchmark (86%), outperforming prior KG construction methods. Wikontic is also efficient at build time: KG construction uses less than 1,000 output tokens, about 3× fewer than AriGraph and less than 1/20 of GraphRAG. The proposed pipeline enhances the quality of the generated KG and offers a scalable solution for leveraging structured knowledge in LLMs.
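The constraint-enforcement stage of such a pipeline can be sketched in miniature as below. The type table, relation constraints, and `validate_triplet` helper are illustrative assumptions standing in for Wikidata's instance-of (P31) types and property domain/range constraints; this is not the paper's actual implementation or the live Wikidata API.

```python
# Minimal sketch: filter candidate triplets against hand-built
# type and relation constraints (hypothetical subset of Wikidata).

# Hypothetical entity-type table (analogous to Wikidata P31, "instance of").
ENTITY_TYPES = {
    "Barack Obama": "human",
    "Honolulu": "city",
    "United States": "country",
}

# Hypothetical relation constraints: relation -> (subject type, object type),
# analogous to Wikidata property domain/range constraints.
RELATION_CONSTRAINTS = {
    "place of birth": ("human", "city"),
    "country": ("city", "country"),
}

def validate_triplet(subj, rel, obj):
    """Keep a candidate triplet only if the relation is known and both
    entities match its declared subject/object types."""
    constraint = RELATION_CONSTRAINTS.get(rel)
    if constraint is None:
        return False
    subj_type, obj_type = constraint
    return (ENTITY_TYPES.get(subj) == subj_type
            and ENTITY_TYPES.get(obj) == obj_type)

candidates = [
    ("Barack Obama", "place of birth", "Honolulu"),  # type-consistent
    ("Honolulu", "place of birth", "Barack Obama"),  # violates constraints
    ("Honolulu", "country", "United States"),        # type-consistent
]
kept = [t for t in candidates if validate_triplet(*t)]
```

In the sketch, only the two type-consistent candidates survive; the inverted "place of birth" triplet is rejected because a city cannot be the subject of that relation, which is the kind of ontology violation the pipeline is designed to filter out.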