Inside-Out: Hidden Factual Knowledge in LLMs

Abstract



This work presents a framework for assessing whether large language models (LLMs) encode more factual knowledge in their parameters than they express in their outputs. While a few studies hint at this possibility, none has clearly defined or demonstrated the phenomenon. We first propose a formal definition of knowledge, quantifying it for a given question as the fraction of correct-incorrect answer pairs in which the correct answer is ranked higher. This gives rise to external and internal knowledge, depending on the information used to score individual answer candidates: either the model's observable token-level probabilities or its intermediate computations. Hidden knowledge arises when internal knowledge exceeds external knowledge. We then present a case study applying this framework to three popular open-weight LLMs in a closed-book QA setup. Our results indicate that: (1) LLMs consistently encode more factual knowledge internally than they express externally, with an average gap of 40%. (2) Surprisingly, some knowledge is so deeply hidden that a model can internally know an answer perfectly yet fail to generate it even once, despite large-scale repeated sampling of 1,000 answers. This reveals fundamental limitations in the generation capabilities of LLMs, which (3) place a practical constraint on scaling test-time compute via repeated answer sampling in closed-book QA: significant performance improvements remain inaccessible because some answers are practically never sampled, even though, if they were, we would be guaranteed to rank them first.
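The pairwise definition of knowledge can be illustrated with a minimal sketch. It assumes that every candidate answer to a question already carries a scalar score: for external knowledge, something derived from the model's token-level probabilities; for internal knowledge, some score computed from intermediate representations (for example, a probe output, which is an assumption here). The function names, the choice to count ties as "not ranked higher", and the use of a plain difference as the hidden-knowledge gap are hypothetical illustrations, not the authors' implementation. The last helper uses an independence assumption to show why an answer with a tiny per-sample probability can go unseen across 1,000 samples.

```python
from itertools import product
from typing import Sequence


def knowledge_score(correct_scores: Sequence[float],
                    incorrect_scores: Sequence[float]) -> float:
    """Fraction of (correct, incorrect) candidate pairs where the correct one scores higher.

    Ties count as NOT ranked higher; this tie rule is an assumption of the sketch.
    """
    if not correct_scores or not incorrect_scores:
        raise ValueError("need at least one correct and one incorrect candidate")
    pairs = list(product(correct_scores, incorrect_scores))
    wins = sum(1 for c, i in pairs if c > i)
    return wins / len(pairs)


def hidden_knowledge_gap(internal: float, external: float) -> float:
    """Hypothetical per-question gap: positive when internal knowledge exceeds external."""
    return internal - external


def prob_never_sampled(p: float, n: int) -> float:
    """Probability that an answer with per-sample probability p never appears
    in n independent samples (independence is an assumption of this sketch)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return (1.0 - p) ** n


if __name__ == "__main__":
    # Hypothetical external scores (e.g., log-probabilities) for one question.
    external = knowledge_score([-2.0, -5.0], [-1.5, -4.0, -6.0])  # 3 of 6 pairs -> 0.5
    # Hypothetical internal scores for the same candidates.
    internal = knowledge_score([0.9, 0.7], [0.6, 0.2, 0.1])  # 6 of 6 pairs -> 1.0
    print(hidden_knowledge_gap(internal, external))  # 0.5
    print(round(prob_never_sampled(1e-4, 1000), 3))  # 0.905
```

In this toy case the internal scorer ranks every correct answer above every incorrect one, while the external scorer manages only half the pairs. That is the shape of "hidden knowledge" described above. Under the independence assumption, an answer that the model generates with probability 10^-4 per sample is still missing from all 1,000 samples about 90% of the time.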
