Epistemic Diversity and Knowledge Collapse in Large Language Models
October 5, 2025
Authors: Dustin Wright, Sarah Masud, Jared Moore, Srishti Yadav, Maria Antoniak, Chan Young Park, Isabelle Augenstein
cs.AI
Abstract
Large language models (LLMs) tend to generate lexically, semantically, and stylistically homogeneous texts. This poses a risk of knowledge collapse, where homogeneous LLMs mediate a shrinking range of accessible information over time. Existing work on homogenization is limited by a focus on closed-ended multiple-choice setups or fuzzy semantic features, and does not examine trends across time and cultural contexts. To overcome this, we present a new methodology for measuring epistemic diversity, i.e., variation in real-world claims in LLM outputs, which we use to perform a broad empirical study of LLM knowledge collapse. We test 27 LLMs, 155 topics covering 12 countries, and 200 prompt variations sourced from real user chats. For the topics in our study, we show that while newer models tend to generate more diverse claims, nearly all models are less epistemically diverse than a basic web search. We find that model size has a negative impact on epistemic diversity, while retrieval-augmented generation (RAG) has a positive impact, though the improvement from RAG varies by cultural context. Finally, compared to a traditional knowledge source (Wikipedia), we find that country-specific claims reflect the English language more than the local one, highlighting a gap in epistemic representation.
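The abstract's core quantity, epistemic diversity as variation in real-world claims across model outputs, can be illustrated with a toy metric. The sketch below is a hypothetical proxy, not the paper's actual measure: it scores a set of extracted claims by Shannon entropy over claim frequencies, so repeated identical claims score low and many distinct claims score high. The `claim_diversity` function and both example claim lists are illustrative assumptions.

```python
from collections import Counter
from math import log

def claim_diversity(claims):
    """Shannon entropy (bits) over claim frequencies.

    Higher values indicate more distinct claims appearing more evenly;
    a single repeated claim yields 0. This is a toy proxy for the
    epistemic-diversity measure described in the abstract.
    """
    counts = Counter(claims)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    return -sum(p * log(p, 2) for p in probs)

# A homogeneous model repeats one claim across samples...
homogeneous = ["X was founded in 1990"] * 4
# ...while a diverse model surfaces distinct claims.
diverse = [
    "X was founded in 1990",
    "X employs 200 people",
    "X is headquartered in Oslo",
    "X exports to 12 countries",
]

print(claim_diversity(homogeneous))  # → 0.0
print(claim_diversity(diverse))      # → 2.0 (4 equally frequent claims)
```

In this framing, the abstract's finding that larger models are less epistemically diverse would correspond to lower entropy over the claims they generate for the same topic.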