Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval

Abstract

The ability to generate SPARQL queries from natural language questions is crucial for ensuring efficient and accurate retrieval of structured data from knowledge graphs (KG). While large language models (LLMs) have been widely adopted for SPARQL query generation, they are often susceptible to hallucinations and out-of-distribution errors when producing KG elements like Uniform Resource Identifiers (URIs) based on internal parametric knowledge. This often results in content that appears plausible but is factually incorrect, posing significant challenges for their use in real-world information retrieval (IR) applications. This has led to increased research aimed at detecting and mitigating such errors. In this paper, we introduce PGMR (Post-Generation Memory Retrieval), a modular framework that incorporates a non-parametric memory module to retrieve KG elements and enhance LLM-based SPARQL query generation. Our experimental results indicate that PGMR consistently delivers strong performance across diverse datasets, data distributions, and LLMs. Notably, PGMR significantly mitigates URI hallucinations, nearly eliminating the problem in several scenarios.
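The idea of retrieving KG elements after generation can be illustrated with a toy sketch. It is not the authors' implementation: the `<kg>...</kg>` placeholder syntax, the `MEMORY` dictionary, the function names, and the string-similarity lookup are all assumptions made for illustration. The example URIs are Wikidata identifiers. The sketch assumes the LLM emits a draft query containing natural-language labels rather than URIs, and that a separate lookup over an external store then fills in the URIs.

```python
import difflib
import re

# Hypothetical non-parametric memory: KG element labels mapped to URIs.
# A real system would index the whole knowledge graph; three entries suffice here.
MEMORY = {
    "Douglas Adams": "wd:Q42",
    "educated at": "wdt:P69",
    "place of birth": "wdt:P19",
}

# Assumed placeholder syntax: the LLM writes <kg>label</kg> where a URI belongs.
PLACEHOLDER = re.compile(r"<kg>(.*?)</kg>")


def retrieve_uri(label: str) -> str:
    """Return the URI whose stored label is most similar to `label`."""
    # cutoff=0.0 means the closest stored label is always returned,
    # so the result is guaranteed to be a URI that exists in MEMORY.
    best = difflib.get_close_matches(label, list(MEMORY), n=1, cutoff=0.0)
    return MEMORY[best[0]]


def ground_query(draft: str) -> str:
    """Replace every placeholder in the draft query with a retrieved URI."""
    return PLACEHOLDER.sub(lambda m: retrieve_uri(m.group(1).strip()), draft)


draft = "SELECT ?school WHERE { <kg>Douglas Adams</kg> <kg>educated at</kg> ?school }"
print(ground_query(draft))
```

Under these assumptions, every URI in the final query is copied from the memory store rather than produced by the model, so the model cannot invent a URI that does not exist. It can still select the wrong one if the label it generates is misleading.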
