Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base

Abstract

Large language models (LLMs) possess impressive linguistic capabilities but often fail to faithfully retain factual knowledge, leading to hallucinations and unreliable outputs. Understanding an LLM's knowledge deficiencies by exhaustively evaluating it against a full-scale knowledge base is computationally prohibitive, especially for closed-weight models. We propose stochastic error ascent (SEA), a scalable and efficient framework for discovering knowledge deficiencies (errors) in closed-weight LLMs under a strict query budget. Rather than naively probing all knowledge candidates, SEA formulates error discovery as a stochastic optimization process: it iteratively retrieves new high-error candidates by leveraging their semantic similarity to previously observed failures. To further enhance search efficiency and coverage, SEA employs hierarchical retrieval across document and paragraph levels, and constructs a relation directed acyclic graph (DAG) to model error propagation and identify systematic failure modes. Empirically, SEA uncovers 40.7x more knowledge errors than Automated Capability Discovery and 26.7% more than AutoBencher, while reducing the cost-per-error by 599x and 9x, respectively. Human evaluation confirms the high quality of the generated questions, while ablation and convergence analyses validate the contribution of each component in SEA. Further analysis of the discovered errors reveals correlated failure patterns across LLM families and recurring deficits, highlighting the need for better data coverage and targeted fine-tuning in future LLM development.
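
The core loop described above (spend a fixed query budget, and prefer candidates that are semantically close to failures already observed) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `cosine` and `error_ascent`, the toy embedding vectors, the `is_error` oracle that stands in for "query the LLM and grade its answer", and the random fallback used before any failure is known are all assumptions made for illustration. The sketch also omits the hierarchical document/paragraph retrieval and the relation DAG.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def error_ascent(embeddings, is_error, budget, batch_size=4, seed=0):
    """Spend at most `budget` oracle queries, preferring candidates near past failures.

    embeddings: dict of candidate id -> embedding vector (hypothetical toy data)
    is_error:   callable(candidate id) -> bool, a stand-in for querying the LLM
                and grading its answer (an assumption, not the paper's grader)
    Returns (list of failing candidate ids, number of queries spent).
    """
    rng = random.Random(seed)
    unseen = set(embeddings)
    errors = []
    queries = 0
    while unseen and queries < budget:
        k = min(batch_size, budget - queries, len(unseen))
        pool = sorted(unseen)  # fixed order so ties break deterministically
        if errors:
            # Rank unseen candidates by their best similarity to any known failure.
            def score(c):
                return max(cosine(embeddings[c], embeddings[e]) for e in errors)
            batch = sorted(pool, key=score, reverse=True)[:k]
        else:
            # No failure observed yet: explore at random.
            batch = rng.sample(pool, k)
        for c in batch:
            unseen.discard(c)
            queries += 1
            if is_error(c):
                errors.append(c)
    return errors, queries


if __name__ == "__main__":
    # Toy knowledge base: ids starting with "a" are facts the model gets wrong.
    toy = {"a0": (1.0, 0.0), "a1": (0.9, 0.1), "a2": (0.8, 0.2), "b0": (0.0, 1.0)}
    found, spent = error_ascent(toy, lambda c: c.startswith("a"), budget=3, batch_size=1)
    print(found, spent)
```

In this toy setting, once one failure is found the remaining queries go to its nearest neighbours, which is the intuition behind retrieving by similarity to past errors instead of probing every candidate.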
