Epistemic Diversity and Knowledge Collapse in Large Language Models
October 5, 2025
Authors: Dustin Wright, Sarah Masud, Jared Moore, Srishti Yadav, Maria Antoniak, Chan Young Park, Isabelle Augenstein
cs.AI
Abstract
Large language models (LLMs) tend to generate lexically, semantically, and stylistically homogeneous texts. This poses a risk of knowledge collapse, where homogeneous LLMs mediate a shrinking range of accessible information over time. Existing work on homogenization is limited by a focus on closed-ended multiple-choice setups or fuzzy semantic features, and does not examine trends across time and cultural contexts. To overcome this, we present a new methodology for measuring epistemic diversity, i.e., variation in real-world claims in LLM outputs, which we use to perform a broad empirical study of LLM knowledge collapse. We test 27 LLMs, 155 topics covering 12 countries, and 200 prompt variations sourced from real user chats. For the topics in our study, we show that while newer models tend to generate more diverse claims, nearly all models are less epistemically diverse than a basic web search. We find that model size has a negative impact on epistemic diversity, while retrieval-augmented generation (RAG) has a positive impact, though the improvement from RAG varies by cultural context. Finally, compared to a traditional knowledge source (Wikipedia), we find that country-specific claims reflect the English language more than the local one, highlighting a gap in epistemic representation.
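The abstract's core quantity, epistemic diversity as variation in real-world claims across model outputs, can be illustrated with a toy metric. The sketch below is a hypothetical proxy, not the paper's actual measure: it scores a set of extracted claims by Shannon entropy over claim frequencies, so repeated identical claims score low and many distinct claims score high. The `claim_diversity` function and both example claim lists are illustrative assumptions.

```python
from collections import Counter
from math import log

def claim_diversity(claims):
    """Shannon entropy (bits) over claim frequencies.

    Higher values indicate more distinct claims appearing more evenly;
    a single repeated claim yields 0. This is a toy proxy for the
    epistemic-diversity measure described in the abstract.
    """
    counts = Counter(claims)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    return -sum(p * log(p, 2) for p in probs)

# A homogeneous model repeats one claim across samples...
homogeneous = ["X was founded in 1990"] * 4
# ...while a diverse model surfaces distinct claims.
diverse = [
    "X was founded in 1990",
    "X employs 200 people",
    "X is headquartered in Oslo",
    "X exports to 12 countries",
]

print(claim_diversity(homogeneous))  # → 0.0
print(claim_diversity(diverse))      # → 2.0 (4 equally frequent claims)
```

In this framing, the abstract's finding that larger models are less epistemically diverse would correspond to lower entropy over the claims they generate for the same topic.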