Emergent properties with repeated examples
October 9, 2024
Authors: François Charton, Julia Kempe
cs.AI
Abstract
We study the performance of transformers as a function of the number of repetitions of training examples, using algorithmically generated datasets. On three problems of mathematics: the greatest common divisor, modular multiplication, and matrix eigenvalues, we show that for a fixed number of training steps, models trained on smaller sets of repeated examples outperform models trained on larger sets of single-use examples. We also demonstrate that two-set training - repeated use of a small random subset of examples, alongside normal sampling on the rest of the training set - provides faster learning and better performance. This highlights that the benefits of repetition can outweigh those of data diversity. These datasets and problems provide a controlled setting to shed light on the still poorly understood interplay between generalization and memorization in deep learning.
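As a rough illustration of the two-set training procedure described in the abstract, the sketch below mixes a small fixed subset of repeated examples with freshly sampled single-use examples, using the GCD task as the data source. The names and parameters (make_gcd_example, p_repeat, repeated_set_size) are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of "two-set training": with probability p_repeat, draw an
# example from a small fixed subset (reused throughout training); otherwise draw
# a fresh, single-use example. Names and defaults are assumptions, not the
# paper's actual setup.
import math
import random

def make_gcd_example(rng, max_int=1_000_000):
    """Generate one (operands, answer) pair for the GCD task."""
    a = rng.randint(1, max_int)
    b = rng.randint(1, max_int)
    return (a, b), math.gcd(a, b)

def two_set_sampler(repeated_set_size=1_000, p_repeat=0.25, seed=0):
    """Yield an endless stream of examples mixing the repeated subset with fresh draws."""
    rng = random.Random(seed)
    repeated_set = [make_gcd_example(rng) for _ in range(repeated_set_size)]
    while True:
        if rng.random() < p_repeat:
            yield rng.choice(repeated_set)   # example seen many times
        else:
            yield make_gcd_example(rng)      # example seen (almost surely) once

# Usage: draw one small batch from the mixed stream.
sampler = two_set_sampler()
batch = [next(sampler) for _ in range(8)]
print(batch)
```

Varying repeated_set_size and p_repeat in a setup like this is one way to probe the trade-off between repetition and data diversity that the abstract describes.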