

Convergent Evolution: How Different Language Models Learn Similar Number Representations

April 22, 2026
Authors: Deqing Fu, Tianyi Zhou, Mikhail Belkin, Vatsal Sharan, Robin Jia
cs.AI

Abstract

Language models trained on natural text learn to represent numbers using periodic features with dominant periods at T=2, 5, 10. In this paper, we identify a two-tiered hierarchy of these features: while Transformers, Linear RNNs, LSTMs, and classical word embeddings trained in different ways all learn features that have period-T spikes in the Fourier domain, only some learn geometrically separable features that can be used to linearly classify a number mod-T. To explain this incongruity, we prove that Fourier domain sparsity is necessary but not sufficient for mod-T geometric separability. Empirically, we investigate when model training yields geometrically separable features, finding that the data, architecture, optimizer, and tokenizer all play key roles. In particular, we identify two different routes through which models can acquire geometrically separable features: they can learn them from complementary co-occurrence signals in general language data, including text-number co-occurrence and cross-number interaction, or from multi-token (but not single-token) addition problems. Overall, our results highlight the phenomenon of convergent evolution in feature learning: A diverse range of models learn similar features from different training signals.
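The two tiers described above can be illustrated on synthetic data. The sketch below (illustrative only; the feature construction is an assumption, not the paper's actual model embeddings) builds number "embeddings" from periodic features with the reported dominant periods T = 2, 5, 10, then checks both properties: the spectrum of each feature spikes at frequency 1/T, and a linear probe recovers n mod 2 exactly because the period-2 component is itself linear in the features.

```python
import numpy as np

# Hypothetical embeddings for the numbers 0..99, built from periodic
# features with periods T = 2, 5, 10 (mimicking the structure reported
# for language models; real model features are higher-dimensional and noisy).
n = np.arange(100)
periods = [2, 5, 10]
feats = np.stack(
    [f(2 * np.pi * n / T) for T in periods for f in (np.cos, np.sin)],
    axis=1,
)  # shape (100, 6)

# Tier 1 -- Fourier-domain sparsity: the period-2 cosine column has a
# single spike at frequency 1/2, i.e. rfft bin 50 of 100 samples.
spectrum = np.abs(np.fft.rfft(feats[:, 0]))
assert spectrum.argmax() == 50

# Tier 2 -- geometric separability: a linear probe classifies n mod 2,
# since cos(pi * n) = (-1)^n is one of the feature columns.
labels = n % 2
w, *_ = np.linalg.lstsq(feats, 2 * labels - 1, rcond=None)
pred = (feats @ w > 0).astype(int)
print((pred == labels).mean())  # 1.0
```

The paper's point is that the converse fails: Fourier sparsity alone does not guarantee such a probe exists, which is why only some of the trained models reach the second tier.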