Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models
November 29, 2025
Authors: Alla Chepurova, Aydar Bulatov, Yuri Kuratov, Mikhail Burtsev
cs.AI
Abstract
Knowledge graphs (KGs) provide structured, verifiable grounding for large language models (LLMs), but current LLM-based systems commonly use KGs as auxiliary structures for text retrieval, leaving their intrinsic quality underexplored. In this work, we propose Wikontic, a multi-stage pipeline that constructs KGs from open-domain text by extracting candidate triplets with qualifiers, enforcing Wikidata-based type and relation constraints, and normalizing entities to reduce duplication. The resulting KGs are compact, ontology-consistent, and well-connected; on MuSiQue, the correct answer entity appears in 96% of generated triplets. On HotpotQA, our triplets-only setup achieves 76.0 F1, and on MuSiQue 59.8 F1, matching or surpassing several retrieval-augmented generation baselines that still require textual context. In addition, Wikontic attains state-of-the-art information-retention performance on the MINE-1 benchmark (86%), outperforming prior KG construction methods. Wikontic is also efficient at build time: KG construction uses fewer than 1,000 output tokens, about one third of AriGraph's and less than one twentieth of GraphRAG's. The proposed pipeline enhances the quality of the generated KG and offers a scalable solution for leveraging structured knowledge in LLMs.
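To make the three-stage construction concrete, below is a minimal Python sketch of the pipeline shape the abstract describes: extract candidate triplets with qualifiers (stubbed here), enforce Wikidata-style type and relation constraints, and normalize entity mentions. This is an illustration, not the authors' implementation; the constraint, type, and alias tables are toy assumptions standing in for real Wikidata lookups.

    # Illustrative sketch of a Wikontic-style pipeline (not the paper's code).
    # Stage 1 (LLM triplet extraction) is stubbed; stages 2-3 are shown.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Triplet:
        subject: str
        relation: str
        obj: str
        qualifiers: tuple = ()  # e.g. (("academic degree", "BSc"),)

    # Hypothetical domain/range constraints per relation, in the spirit of
    # Wikidata property constraints.
    RELATION_CONSTRAINTS = {
        "educated at": ("human", "organization"),
        "capital of": ("city", "country"),
    }

    # Toy type assignments; a real pipeline would resolve these via Wikidata.
    ENTITY_TYPES = {
        "Neil Armstrong": "human",
        "Purdue University": "organization",
        "Paris": "city",
        "France": "country",
    }

    ALIASES = {"N. Armstrong": "Neil Armstrong"}  # toy normalization table

    def normalize(name: str) -> str:
        """Map a surface form to a canonical name to reduce duplication."""
        return ALIASES.get(name, name)

    def satisfies_constraints(t: Triplet) -> bool:
        """Keep a triplet only if its relation is known and its subject and
        object types match that relation's domain/range."""
        if t.relation not in RELATION_CONSTRAINTS:
            return False
        domain, rng = RELATION_CONSTRAINTS[t.relation]
        return (ENTITY_TYPES.get(t.subject) == domain
                and ENTITY_TYPES.get(t.obj) == rng)

    def build_kg(candidates):
        """Stages 2-3: normalize entities, then filter by constraints."""
        kg = set()
        for t in candidates:
            t = Triplet(normalize(t.subject), t.relation,
                        normalize(t.obj), t.qualifiers)
            if satisfies_constraints(t):
                kg.add(t)
        return kg

    if __name__ == "__main__":
        # Hand-written stand-in for stage-1 LLM output.
        candidates = [
            Triplet("N. Armstrong", "educated at", "Purdue University",
                    (("academic degree", "BSc"),)),
            Triplet("Paris", "capital of", "France"),
            Triplet("Paris", "invented", "France"),  # unknown relation: dropped
        ]
        for t in sorted(build_kg(candidates), key=lambda t: t.subject):
            print(t)

Dropping triplets whose relation or argument types violate the constraints is what keeps the resulting graph ontology-consistent, while normalizing aliases before filtering is what keeps it compact, matching the properties the abstract claims for the generated KGs.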