A Graph Perspective to Probe Structural Patterns of Knowledge in Large Language Models

May 25, 2025
作者: Utkarsh Sahu, Zhisheng Qi, Yongjia Lei, Ryan A. Rossi, Franck Dernoncourt, Nesreen K. Ahmed, Mahantesh M Halappanavar, Yao Ma, Yu Wang
cs.AI

Abstract

Large language models have been extensively studied as neural knowledge bases for their knowledge access, editability, reasoning, and explainability. However, few works focus on the structural patterns of their knowledge. Motivated by this gap, we investigate these structural patterns from a graph perspective. We quantify the knowledge of LLMs at both the triplet and entity levels, and analyze how it relates to graph structural properties such as node degree. Furthermore, we uncover knowledge homophily, where topologically close entities exhibit similar levels of knowledgeability, which motivates us to develop graph machine learning models to estimate an entity's knowledge based on its local neighbors. This model further enables valuable knowledge checking by selecting triplets less known to LLMs. Empirical results show that fine-tuning on these selected triplets leads to superior performance.
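
The abstract outlines two computational steps: probing the LLM on knowledge-graph triplets to obtain a per-entity knowledgeability score, and training a graph model that predicts this score from an entity's local neighborhood. The sketch below is a minimal, hypothetical illustration of such a pipeline, not the authors' implementation; the probing prompt, the `query_llm` callable, the use of PyTorch Geometric, and all hyperparameters are assumptions.

```python
# Hypothetical sketch of the pipeline described in the abstract; not the
# authors' code. Assumes a knowledge graph with entity features `x`,
# edges `edge_index`, and a user-supplied `query_llm(prompt) -> str`.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


def triplet_known(query_llm, head, relation, tail):
    """Crude triplet-level check: does the LLM recall the tail entity?"""
    answer = query_llm(f"Complete the fact: {head} {relation} ___")
    return tail.lower() in answer.lower()


def entity_knowledgeability(query_llm, triplets):
    """Entity-level score: fraction of the entity's triplets the LLM knows."""
    hits = sum(triplet_known(query_llm, h, r, t) for h, r, t in triplets)
    return hits / max(len(triplets), 1)


class KnowledgeabilityGNN(torch.nn.Module):
    """Two-layer GCN regressing a scalar knowledgeability score per entity,
    exploiting the homophily pattern (similar scores among close entities)."""

    def __init__(self, in_dim, hidden_dim=64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, 1)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index).squeeze(-1)


def train_step(model, optimizer, x, edge_index, y, mask):
    """One optimization step on entities whose scores were measured."""
    model.train()
    optimizer.zero_grad()
    loss = F.mse_loss(model(x, edge_index)[mask], y[mask])
    loss.backward()
    optimizer.step()
    return loss.item()
```

Under these assumptions, entities with low predicted scores would have their triplets routed to knowledge checking and, per the abstract, used to fine-tune the LLM.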
