
A Graph Perspective to Probe Structural Patterns of Knowledge in Large Language Models

May 25, 2025
作者: Utkarsh Sahu, Zhisheng Qi, Yongjia Lei, Ryan A. Rossi, Franck Dernoncourt, Nesreen K. Ahmed, Mahantesh M Halappanavar, Yao Ma, Yu Wang
cs.AI

Abstract

Large language models have been extensively studied as neural knowledge bases in terms of their knowledge access, editability, reasoning, and explainability. However, few works focus on the structural patterns of their knowledge. Motivated by this gap, we investigate these structural patterns from a graph perspective. We quantify the knowledge of LLMs at both the triplet and entity levels, and analyze how it relates to graph structural properties such as node degree. Furthermore, we uncover knowledge homophily, where topologically close entities exhibit similar levels of knowledgeability, which in turn motivates us to develop graph machine learning models that estimate an entity's knowledge level from its local neighbors. This model further enables valuable knowledge checking by selecting triplets less known to LLMs. Empirical results show that fine-tuning on the selected triplets leads to superior performance.
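The neighbor-based estimation idea described above can be sketched with a simple label-propagation baseline: under the homophily assumption, an entity whose knowledgeability has not been measured inherits a degree-normalized average of its neighbors' scores. This is a minimal illustrative sketch only; the paper's actual graph machine learning model is not specified in the abstract, and all function and variable names here are hypothetical.

```python
import numpy as np

def estimate_knowledgeability(adj, known_scores, mask, n_iters=20, alpha=0.8):
    """Estimate per-entity knowledgeability scores by propagation over a graph.

    adj          : (n, n) symmetric adjacency matrix of the knowledge graph
    known_scores : (n,) measured scores; entries where mask is False are ignored
    mask         : (n,) boolean, True where the score was actually measured
    alpha        : mixing weight between propagated and current estimates

    Measured entities keep their scores fixed; unmeasured entities are
    iteratively updated from their neighbors (the homophily assumption).
    """
    deg = adj.sum(axis=1, keepdims=True)
    norm_adj = adj / np.maximum(deg, 1)  # row-normalized adjacency
    # Initialize unmeasured entities at the mean of the measured ones.
    scores = np.where(mask, known_scores, known_scores[mask].mean())
    for _ in range(n_iters):
        propagated = norm_adj @ scores
        scores = np.where(mask, known_scores,
                          alpha * propagated + (1 - alpha) * scores)
    return scores

# Toy path graph 0 - 1 - 2: entity 1's knowledgeability is unmeasured.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
known = np.array([0.9, 0.0, 0.1])
mask = np.array([True, False, True])
scores = estimate_knowledgeability(adj, known, mask)
# Triplets incident to the lowest-scoring entities would then be selected
# as "less known" candidates for knowledge checking or fine-tuning.
```

In this toy example entity 1 converges to the average of its two measured neighbors; a learned graph neural network would replace the fixed propagation rule with trainable aggregation.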

