Geospatial Mechanistic Interpretability of Large Language Models
May 6, 2025
Authors: Stef De Sabbata, Stefano Mizzaro, Kevin Roitero
cs.AI
Abstract
Large Language Models (LLMs) have demonstrated unprecedented capabilities
across various natural language processing tasks. Their ability to process and
generate viable text and code has made them ubiquitous in many fields, while
their deployment as knowledge bases and "reasoning" tools remains an area of
ongoing research. In geography, a growing body of literature has been focusing
on evaluating LLMs' geographical knowledge and their ability to perform spatial
reasoning. However, very little is still known about the internal functioning
of these models, especially about how they process geographical information.
In this chapter, we establish a novel framework for the study of geospatial
mechanistic interpretability: using spatial analysis to reverse-engineer how
LLMs handle geographical information. Our aim is to advance our understanding
of the internal representations that these complex models generate while
processing geographical information, or what one might call "how LLMs think about
geographic information" if such phrasing were not an undue anthropomorphism.
We first outline the use of probing in revealing internal structures within
LLMs. We then introduce the field of mechanistic interpretability, discussing
the superposition hypothesis and the role of sparse autoencoders in
disentangling polysemantic internal representations of LLMs into more
interpretable, monosemantic features. In our experiments, we use spatial
autocorrelation to show how features obtained for placenames display spatial
patterns related to their geographic location and can thus be interpreted
geospatially, providing insights into how these models process geographical
information. We conclude by discussing how our framework can help shape the
study and use of foundation models in geography.
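The abstract's central experiment asks whether a feature's activations across placenames are spatially autocorrelated. The sketch below shows one way this could be done. It assumes global Moran's I with row-standardised k-nearest-neighbour weights on great-circle distance, plus a permutation test. The chapter's actual weighting scheme and significance test are not stated here. The function names, k=3, and the placenames with their activation values are hypothetical and chosen only for illustration.

```python
import math
import random


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def knn_weights(coords, k):
    """Row-standardised k-nearest-neighbour weights (assumes k < len(coords)).

    Returns one {neighbour_index: weight} dict per place; each row sums to 1.
    """
    weights = []
    for i, (lat_i, lon_i) in enumerate(coords):
        by_distance = sorted(
            (haversine_km(lat_i, lon_i, lat_j, lon_j), j)
            for j, (lat_j, lon_j) in enumerate(coords)
            if j != i
        )
        weights.append({j: 1.0 / k for _, j in by_distance[:k]})
    return weights


def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    denominator = sum(zi * zi for zi in z)
    if denominator == 0:
        raise ValueError("Moran's I is undefined for a constant feature")
    s0 = sum(sum(row.values()) for row in weights)  # total weight S0
    numerator = sum(
        w_ij * z[i] * z[j] for i, row in enumerate(weights) for j, w_ij in row.items()
    )
    return (n / s0) * (numerator / denominator)


def permutation_test(values, weights, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive autocorrelation via random relabelling."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Made-up activations of one hypothetical sparse-autoencoder feature for eight
# placenames: illustrative toy values only, not results from the chapter.
places = {
    "London": ((51.5, -0.1), 0.90),
    "Paris": ((48.9, 2.4), 0.80),
    "Brussels": ((50.8, 4.4), 0.85),
    "Amsterdam": ((52.4, 4.9), 0.95),
    "Tokyo": ((35.7, 139.7), 0.10),
    "Osaka": ((34.7, 135.5), 0.05),
    "Seoul": ((37.6, 127.0), 0.15),
    "Busan": ((35.2, 129.1), 0.00),
}
coords = [c for c, _ in places.values()]
activations = [a for _, a in places.values()]
w = knn_weights(coords, k=3)
observed, p_value = permutation_test(activations, w)
print(f"Moran's I = {observed:.3f}, pseudo p = {p_value:.3f}")
```

The synthetic values form two internally homogeneous clusters, one in Europe and one in East Asia. As a result, Moran's I comes out at about 0.974, which indicates strong positive spatial autocorrelation. A real analysis would instead use activations extracted from a model and a trained sparse autoencoder. For the statistics, it might rely on an established implementation such as PySAL's `esda` package rather than this hand-rolled version.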