

The Geometry of Tokens in Internal Representations of Large Language Models

January 17, 2025
Authors: Karthik Viswanathan, Yuri Gardinazzi, Giada Panerai, Alberto Cazzaniga, Matteo Biagetti
cs.AI

Abstract

We investigate the relationship between the geometry of token embeddings and their role in the next token prediction within transformer models. An important aspect of this connection uses the notion of empirical measure, which encodes the distribution of token point clouds across transformer layers and drives the evolution of token representations in the mean-field interacting picture. We use metrics such as intrinsic dimension, neighborhood overlap, and cosine similarity to observationally probe these empirical measures across layers. To validate our approach, we compare these metrics to a dataset where the tokens are shuffled, which disrupts the syntactic and semantic structure. Our findings reveal a correlation between the geometric properties of token embeddings and the cross-entropy loss of next token predictions, implying that prompts with higher loss values have tokens represented in higher-dimensional spaces.
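
The layer-wise probing described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes `hidden_states` is a list of NumPy arrays, one per layer, each of shape `(num_tokens, hidden_dim)` (e.g. obtained from a Hugging Face model called with `output_hidden_states=True`), and it uses a TwoNN-style estimate for the intrinsic dimension and a shared-k-nearest-neighbor fraction for the neighborhood overlap.

```python
# Minimal sketch (not the authors' code): per-layer token-geometry metrics
# named in the abstract -- intrinsic dimension, neighborhood overlap, and
# cosine similarity -- computed for one prompt's hidden states.
import numpy as np


def pairwise_distances(x):
    """Euclidean distance matrix between token vectors (num_tokens x num_tokens)."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.sqrt(np.maximum(d2, 0.0))


def intrinsic_dimension_twonn(x):
    """Rough TwoNN-style intrinsic dimension estimate from the ratio of each
    token's second- to first-nearest-neighbor distance (a sketch, not a
    faithful reproduction of any particular implementation)."""
    d = pairwise_distances(x)
    np.fill_diagonal(d, np.inf)
    r = np.sort(d, axis=1)
    mu = r[:, 1] / np.maximum(r[:, 0], 1e-12)  # second-to-first NN ratio
    mu = mu[mu > 1.0]                          # drop degenerate pairs
    return len(mu) / np.sum(np.log(mu))


def neighborhood_overlap(x_prev, x_next, k=8):
    """Average fraction of k-nearest neighbors shared between two layers."""
    def knn(x):
        d = pairwise_distances(x)
        np.fill_diagonal(d, np.inf)
        return np.argsort(d, axis=1)[:, :k]
    a, b = knn(x_prev), knn(x_next)
    return float(np.mean([len(set(a[i]) & set(b[i])) / k for i in range(len(a))]))


def mean_cosine_similarity(x):
    """Average pairwise cosine similarity of token vectors within one layer."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = xn @ xn.T
    iu = np.triu_indices(len(x), k=1)
    return float(sims[iu].mean())


def layerwise_profile(hidden_states, k=8):
    """Compute the three metrics layer by layer for one prompt."""
    profile = []
    for ell, h in enumerate(hidden_states):
        entry = {
            "layer": ell,
            "intrinsic_dim": intrinsic_dimension_twonn(h),
            "cosine_sim": mean_cosine_similarity(h),
        }
        if ell > 0:
            entry["overlap_with_prev"] = neighborhood_overlap(hidden_states[ell - 1], h, k)
        profile.append(entry)
    return profile
```

The shuffled-token control mentioned in the abstract would then amount to recomputing the same profile on hidden states obtained from a randomly permuted token sequence and comparing the resulting curves across layers.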
