Emergent properties with repeated examples

Abstract

We study the performance of transformers as a function of the number of repetitions of training examples, using algorithmically generated datasets. We consider three mathematical problems: the greatest common divisor, modular multiplication, and matrix eigenvalues. On all three, we show that for a fixed number of training steps, models trained on smaller sets of repeated examples outperform models trained on larger sets of single-use examples. We also demonstrate that two-set training provides faster learning and better performance. In two-set training, a small random subset of examples is used repeatedly, alongside normal sampling on the rest of the training set. This highlights that the benefits of repetition can outweigh those of data diversity. These datasets and problems provide a controlled setting to shed light on the still poorly understood interplay between generalization and memorization in deep learning.
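Two-set training, as summarized above, can be pictured as a batch sampler. The following is a minimal sketch under stated assumptions, not the authors' implementation. The GCD generator `make_gcd_example` and the sampler `two_set_batches` are hypothetical. So are the parameters `repeated_size` (size of the small repeated subset) and `p_repeat` (probability that a batch slot is drawn from that subset). Freshly generated examples stand in for "the rest of the training set".

```python
import math
import random


def make_gcd_example(rng, max_value=10**6):
    """Hypothetical generator: one GCD training pair ((a, b), gcd(a, b))."""
    a = rng.randint(1, max_value)
    b = rng.randint(1, max_value)
    return (a, b), math.gcd(a, b)


def two_set_batches(num_steps, batch_size, repeated_size, p_repeat, seed=0):
    """Yield batches that mix a small, fixed repeated subset with fresh examples.

    Assumption: each batch slot independently comes from the repeated subset
    with probability p_repeat; otherwise a new example is generated on the fly.
    repeated_size and p_repeat are illustrative, not values from the paper.
    """
    rng = random.Random(seed)
    # The small random subset that is reused throughout training.
    repeated = [make_gcd_example(rng) for _ in range(repeated_size)]
    for _ in range(num_steps):
        batch = []
        for _ in range(batch_size):
            if rng.random() < p_repeat:
                # Draw (with replacement) from the repeated subset.
                batch.append(rng.choice(repeated))
            else:
                # Otherwise sample a fresh, most likely single-use, example.
                batch.append(make_gcd_example(rng))
        yield batch


batches = list(two_set_batches(num_steps=200, batch_size=64,
                               repeated_size=100, p_repeat=0.25, seed=1))
print(len(batches), len(batches[0]))
```

Setting `p_repeat=0` recovers ordinary sampling of single-use examples. Setting `p_repeat=1` trains only on the small set. The same sampler would apply to modular multiplication or eigenvalue problems by swapping in a different generator.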
