Emergent properties with repeated examples
October 9, 2024
Authors: François Charton, Julia Kempe
cs.AI
Abstract
We study the performance of transformers as a function of the number of
repetitions of training examples with algorithmically generated datasets. On
three problems of mathematics: the greatest common divisor, modular
multiplication, and matrix eigenvalues, we show that for a fixed number of
training steps, models trained on smaller sets of repeated examples outperform
models trained on larger sets of single-use examples. We also demonstrate that
two-set training - repeated use of a small random subset of examples, along
with normal sampling on the rest of the training set - provides faster learning
and better performance. This highlights that the benefits of repetition can
outweigh those of data diversity. These datasets and problems provide a
controlled setting to shed light on the still poorly understood interplay
between generalization and memorization in deep learning.
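
The two-set training idea described in the abstract can be illustrated with a minimal sampling sketch. This is not the authors' implementation: the function names, the `subset_size` and `p_repeat` parameters, and the use of a fixed per-example mixing probability are all assumptions made for illustration, with the greatest common divisor task as the algorithmically generated data.

```python
import math
import random


def make_gcd_example(rng, max_value=1_000_000):
    """Generate one (a, b, gcd(a, b)) example. Hypothetical data format."""
    a = rng.randint(1, max_value)
    b = rng.randint(1, max_value)
    return (a, b, math.gcd(a, b))


def two_set_batches(train_set, subset_size, p_repeat, batch_size,
                    num_batches, seed=0):
    """Yield batches mixing a small, frequently repeated subset with the rest.

    Assumption: each example in a batch comes from the repeated subset with
    probability p_repeat, and from the remaining examples otherwise.
    """
    if not 0 < subset_size < len(train_set):
        raise ValueError("subset_size must be between 1 and len(train_set) - 1")
    rng = random.Random(seed)
    indices = list(range(len(train_set)))
    rng.shuffle(indices)
    repeated = indices[:subset_size]   # small random subset, reused often
    rest = indices[subset_size:]       # sampled normally
    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            pool = repeated if rng.random() < p_repeat else rest
            batch.append(train_set[rng.choice(pool)])
        yield batch
```

With, say, `subset_size=50` and `p_repeat=0.25` on a training set of 10,000 examples, each example in the small subset is seen far more often than any example in the rest, while the total number of training steps stays fixed. These values are placeholders, not settings taken from the paper.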