Geospatial Mechanistic Interpretability of Large Language Models

Abstract

Large Language Models (LLMs) have demonstrated unprecedented capabilities across a wide range of natural language processing tasks. Their ability to process and generate viable text and code has made them ubiquitous in many fields, while their deployment as knowledge bases and "reasoning" tools remains an area of ongoing research. In geography, a growing body of literature evaluates LLMs' geographical knowledge and their ability to perform spatial reasoning. However, very little is known about the internal functioning of these models, and especially about how they process geographical information. In this chapter, we establish a novel framework for the study of geospatial mechanistic interpretability: using spatial analysis to reverse engineer how LLMs handle geographical information. Our aim is to advance our understanding of the internal representations that these complex models generate while processing geographical information, or what one might call "how LLMs think about geographic information" if such phrasing were not an undue anthropomorphism. We first outline the use of probing in revealing internal structures within LLMs. We then introduce the field of mechanistic interpretability, discussing the superposition hypothesis and the role of sparse autoencoders in disentangling the polysemantic internal representations of LLMs into more interpretable, monosemantic features. In our experiments, we use spatial autocorrelation to show that features obtained for placenames display spatial patterns related to their geographic location and can thus be interpreted geospatially, providing insights into how these models process geographical information. We conclude by discussing how our framework can help shape the study and use of foundation models in geography.
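
To make the idea of measuring spatial autocorrelation on placename features concrete, the following is a minimal sketch rather than the procedure used in the chapter. It computes global Moran's I, one common spatial autocorrelation statistic, for a vector of per-placename values. Everything in it is an assumption made for illustration: the coordinates and "activations" are synthetic, the helper names `knn_weights` and `morans_i` are hypothetical, and the k-nearest-neighbour weights with Euclidean distance are just one possible choice of spatial weights.

```python
import numpy as np

def knn_weights(coords, k=4):
    """Row-standardised k-nearest-neighbour spatial weights (illustrative choice)."""
    n = len(coords)
    # Pairwise Euclidean distances; real placename data may call for great-circle distance.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbour
    nearest = np.argsort(d, axis=1)[:, :k]
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return w / w.sum(axis=1, keepdims=True)

def morans_i(values, w):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z the centred values."""
    z = values - values.mean()
    return (len(values) / w.sum()) * (z @ w @ z) / (z @ z)

# Synthetic stand-ins for placename coordinates and feature activations (assumptions).
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
w = knn_weights(coords)

# A hypothetical feature whose activation grows along the first coordinate,
# and a hypothetical feature with no spatial structure at all.
spatial_feature = coords[:, 0] + 0.1 * rng.normal(size=200)
random_feature = rng.normal(size=200)

i_spatial = morans_i(spatial_feature, w)
i_random = morans_i(random_feature, w)
print(f"Moran's I, spatially structured feature: {i_spatial:.3f}")
print(f"Moran's I, unstructured feature: {i_random:.3f}")
```

In this sketch, a value of Moran's I well above zero indicates that nearby placenames tend to have similar activations, whereas a value near zero indicates no detectable spatial pattern. An actual analysis would also need a significance test, for example a permutation test, which is omitted here.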