Convergent Evolution: How Different Language Models Learn Similar Number Representations
April 22, 2026
Authors: Deqing Fu, Tianyi Zhou, Mikhail Belkin, Vatsal Sharan, Robin Jia
cs.AI
Abstract
Language models trained on natural text learn to represent numbers using periodic features with dominant periods at T=2, 5, 10. In this paper, we identify a two-tiered hierarchy of these features: while Transformers, Linear RNNs, LSTMs, and classical word embeddings trained in different ways all learn features that have period-T spikes in the Fourier domain, only some learn geometrically separable features that can be used to linearly classify a number mod-T. To explain this incongruity, we prove that Fourier domain sparsity is necessary but not sufficient for mod-T geometric separability. Empirically, we investigate when model training yields geometrically separable features, finding that the data, architecture, optimizer, and tokenizer all play key roles. In particular, we identify two different routes through which models can acquire geometrically separable features: they can learn them from complementary co-occurrence signals in general language data, including text-number co-occurrence and cross-number interaction, or from multi-token (but not single-token) addition problems. Overall, our results highlight the phenomenon of convergent evolution in feature learning: A diverse range of models learn similar features from different training signals.
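The distinction between Fourier-domain sparsity and geometric separability can be made concrete with a toy example. The sketch below (an illustrative construction, not the paper's experimental setup) builds two sets of number features that both have a Fourier spike at period T=10, yet only one of them keeps the residue classes mod T geometrically distinct: a one-dimensional cosine feature collapses residues 1 and 9 onto the same value, while a two-dimensional (cos, sin) embedding places each residue at its own point on a circle.

```python
import numpy as np

T = 10
n = np.arange(200)

# Separable case: cosine and sine components at period T together embed
# each number on a circle, so the T residue classes occupy T distinct points.
sep = np.stack([np.cos(2 * np.pi * n / T), np.sin(2 * np.pi * n / T)], axis=1)

# Sparse-but-not-separable case: the cosine component alone. Its Fourier
# spectrum still spikes at period T, but cos is even, so e.g.
# cos(2*pi*1/10) == cos(2*pi*9/10): residues 1 and 9 collapse.
nonsep = np.cos(2 * np.pi * n / T)[:, None]

# The spectrum of the 1-D feature is sparse with a spike at period T:
spectrum = np.abs(np.fft.rfft(nonsep[:, 0]))
peak_freq = np.argmax(spectrum[1:]) + 1  # dominant nonzero frequency bin
assert np.isclose(len(n) / peak_freq, T)  # frequency bin k has period len(n)/k

# Yet residues 1 and 9 are indistinguishable in the 1-D features,
# so no classifier (linear or otherwise) can separate them:
assert np.allclose(nonsep[n % T == 1], nonsep[n % T == 9])

# In the 2-D (cos, sin) features they are distinct points on the circle:
assert not np.allclose(sep[n % T == 1], sep[n % T == 9])
```

This mirrors the paper's necessity-but-not-sufficiency claim in miniature: a spike at period T in the Fourier domain does not by itself guarantee that a linear readout of n mod T exists.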