

Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval

February 19, 2025
作者: Aditya Sharma, Luis Lara, Amal Zouaq, Christopher J. Pal
cs.AI

Abstract

The ability to generate SPARQL queries from natural language questions is crucial for ensuring efficient and accurate retrieval of structured data from knowledge graphs (KG). While large language models (LLMs) have been widely adopted for SPARQL query generation, they are often susceptible to hallucinations and out-of-distribution errors when producing KG elements like Uniform Resource Identifiers (URIs) based on internal parametric knowledge. This often results in content that appears plausible but is factually incorrect, posing significant challenges for their use in real-world information retrieval (IR) applications. This has led to increased research aimed at detecting and mitigating such errors. In this paper, we introduce PGMR (Post-Generation Memory Retrieval), a modular framework that incorporates a non-parametric memory module to retrieve KG elements and enhance LLM-based SPARQL query generation. Our experimental results indicate that PGMR consistently delivers strong performance across diverse datasets, data distributions, and LLMs. Notably, PGMR significantly mitigates URI hallucinations, nearly eliminating the problem in several scenarios.
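The core idea described above can be sketched in a few lines: instead of letting the model emit URIs directly from parametric memory, the LLM produces an intermediate query with natural-language KG labels marked by placeholders, and a non-parametric retriever then swaps each label for the closest matching URI from the knowledge graph. The `[KG: ...]` placeholder syntax, the `KG_MEMORY` lookup, and the fuzzy matcher below are illustrative assumptions, not the paper's actual special tokens or retriever.

```python
import difflib

# Hypothetical non-parametric "memory": a label -> URI lookup built from the KG.
# PGMR's retriever operates over KG elements; this tiny dict stands in for it.
KG_MEMORY = {
    "Barack Obama": "http://www.wikidata.org/entity/Q76",
    "spouse": "http://www.wikidata.org/prop/direct/P26",
}

def retrieve_uri(label: str) -> str:
    """Return the URI of the memory entry closest to `label` (fuzzy match),
    so a slightly misspelled label still resolves to a real KG element."""
    best = difflib.get_close_matches(label, KG_MEMORY, n=1, cutoff=0.0)[0]
    return KG_MEMORY[best]

def post_generation_retrieval(intermediate_query: str) -> str:
    """Replace every [KG: label] placeholder with a retrieved URI."""
    out = intermediate_query
    while "[KG:" in out:
        start = out.index("[KG:")
        end = out.index("]", start)
        label = out[start + 4:end].strip()
        out = out[:start] + "<" + retrieve_uri(label) + ">" + out[end + 1:]
    return out

# The LLM drafts a query with labels instead of URIs (note the typo "Barak",
# which the retriever still resolves -- the URI itself cannot be hallucinated):
draft = "SELECT ?x WHERE { [KG: Barak Obama] [KG: spouse] ?x }"
print(post_generation_retrieval(draft))
```

Because every URI in the final query is drawn from the lookup rather than generated token-by-token, this design makes URI hallucination structurally impossible, which matches the near-elimination of URI hallucinations reported in the abstract.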

