Emergent properties with repeated examples
October 9, 2024
Authors: François Charton, Julia Kempe
cs.AI
Abstract
We study the performance of transformers as a function of the number of repetitions of training examples, using algorithmically generated datasets. On three problems of mathematics: the greatest common divisor, modular multiplication, and matrix eigenvalues, we show that for a fixed number of training steps, models trained on smaller sets of repeated examples outperform models trained on larger sets of single-use examples. We also demonstrate that two-set training - repeated use of a small random subset of examples, alongside normal sampling on the rest of the training set - provides faster learning and better performance. This highlights that the benefits of repetition can outweigh those of data diversity. These datasets and problems provide a controlled setting to shed light on the still poorly understood interplay between generalization and memorization in deep learning.
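As a rough illustration of the two-set training procedure described in the abstract, the sketch below mixes a small fixed subset of repeated examples with freshly sampled single-use examples, using the GCD task as the data source. The names and parameters (make_gcd_example, p_repeat, repeated_set_size) are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of "two-set training": with probability p_repeat, draw an
# example from a small fixed subset (reused throughout training); otherwise draw
# a fresh, single-use example. Names and defaults are assumptions, not the
# paper's actual setup.
import math
import random

def make_gcd_example(rng, max_int=1_000_000):
    """Generate one (operands, answer) pair for the GCD task."""
    a = rng.randint(1, max_int)
    b = rng.randint(1, max_int)
    return (a, b), math.gcd(a, b)

def two_set_sampler(repeated_set_size=1_000, p_repeat=0.25, seed=0):
    """Yield an endless stream of examples mixing the repeated subset with fresh draws."""
    rng = random.Random(seed)
    repeated_set = [make_gcd_example(rng) for _ in range(repeated_set_size)]
    while True:
        if rng.random() < p_repeat:
            yield rng.choice(repeated_set)   # example seen many times
        else:
            yield make_gcd_example(rng)      # example seen (almost surely) once

# Usage: draw one small batch from the mixed stream.
sampler = two_set_sampler()
batch = [next(sampler) for _ in range(8)]
print(batch)
```

Varying repeated_set_size and p_repeat in a setup like this is one way to probe the trade-off between repetition and data diversity that the abstract describes.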