Epistemic Diversity and Knowledge Collapse in Large Language Models

Abstract

Large language models (LLMs) tend to generate lexically, semantically, and stylistically homogeneous texts. This poses a risk of knowledge collapse, where homogeneous LLMs mediate a shrinking in the range of accessible information over time. Existing work on homogenization is limited by a focus on closed-ended multiple-choice setups or fuzzy semantic features, and does not examine trends across time and cultural contexts. To overcome this, we present a new methodology to measure epistemic diversity, i.e., variation in real-world claims in LLM outputs, which we use to perform a broad empirical study of LLM knowledge collapse. We test 27 LLMs, 155 topics covering 12 countries, and 200 prompt variations sourced from real user chats. For the topics in our study, we show that while newer models tend to generate more diverse claims, nearly all models are less epistemically diverse than a basic web search. We find that model size has a negative impact on epistemic diversity, while retrieval-augmented generation (RAG) has a positive impact, though the improvement from RAG varies by cultural context. Finally, compared to a traditional knowledge source (Wikipedia), we find that country-specific claims reflect the English language more than the local one, highlighting a gap in epistemic representation.

